Blog post: Tuning Meteor Mongo Livedata for Scalability 

New performance tuning parameters have shipped in 1.3


One of the most frequent questions I am asked about Meteor is “how does it scale?” The answer is always the same: “it depends on your data access patterns.” In this article, I will explain the various modes Meteor has for running realtime mongo queries, how to pick which is right for your data access pattern, and the new parameters available in Meteor 1.3 to tune the system for maximum performance. 

( sidebar )
TL;DR — I just want to know the new options
  • Use the  disableOplog: true option to collection.find() (http://docs.meteor.com/#/full/find) to opt-out of oplog tailing on a per-query basis.
  • Use the pollingInterval  and pollingThrottle options to the same function to adjust how often the poll-and-diff driver queries the database.
( / sidebar )

How does Meteor turn mongo into a realtime database?

Before getting to how to tune the data system, it is important to understand how Meteor queries MongoDB and provides a realtime publish/subscribe API to clients on top of a a database that doesn’t have any built-in realtime features.

There are two main modes of operation: poll-and-diff and oplog-tailing. Poll and diff is the simplest, can be used on any mongo database, and works well for single-server operation. Oplog tailing is more complicated, and provides real-time updates across multiple servers.

Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database . Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag (the default is 10 seconds, see below for more on this parameter) for external writes. This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos). 

This approach is simple and and delivers easy to understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N^2) with users. Meteor automatically de-duplicates identical queries, though, so if each user does the same query the results can be shared. 

Oplog tailing
Oplog tailing — first introduced in Meteor 0.7 — works by reading the mongo database’s replication log that it uses to synchronize secondary databases (the ‘oplog’). This allows Meteor to deliver realtime updates across multiple hosts and scale horizontally.

The oplog is global per database cluster and requires admin permissions to access, so many users never set it up and rely on poll-and-diff for realtime updates. To enable oplog tailing, pass the MONGO_OPLOG_URL environment variable to the meteor process. When this environment variable is passed, Meteor defaults all queries to use oplog tailing. Before Meteor 1.3, this was all or nothing — new in 1.3 is the disableOplog option to collection.find()  that allows tuning this on a per-query basis.

Since each meteor process must read the whole database log, it is desirable to have fewer larger servers instead of more smaller servers when using oplog tailing. This minimizes the amount of duplicated work and saves CPU and and network bandwidth with mongo.

Additionally, the oplog driver sometimes needs to re-fetch items from the database. These are documents that are not currently in the results, but could be as a result of an incoming modification. When it does so, it does a fetch on the document by _id, which is guaranteed to be indexed, but still can result in large loads on the database in some cases. Complex queries, especially those with limit or sort parameters may be more efficient with poll-and-diff. See below on how to disable oplog tailing on a per-query basis.

Tuning for your data access patterns

As you can see, each method has its tradeoffs. In Meteor 1.3, you can switch between them on a per-query basis. So how do you decide which one to use for a particular query?

The first major choice you have is do you want to use oplog tailing for any of your queries. If you are on a shared Mongo database and do not have admin access,  oplog tailing is not an option. If you have a database with a very high write rate enabling oplog tailing might not be a good idea — whether or not you query a particular collection, Meteor must read the whole oplog if you have even one oplog driven query.

As a general guideline, it’s usually best to start with oplog tailing if at all possible, and selectively disable it on problematic queries. To enable oplog tailing in production, pass the MONGO_OPLOG_URL environment variable to Meteor at process start time (See https://themeteorchef.com/snippets/setting-up-mongodb-oplog-tailing/ for more details on setting up your oplog).

If you can’t turn on oplog tailing for whatever reason, you can skip the next section and go straight to tuning poll-and-diff.

Tuning oplog tailing
There isn’t actually much tuning to be done about how the oplog driver operates. Oplog tailing works in most use cases, but there are a few different ways it can go wrong when scaling. One of the most common tasks for making a Meteor app scale is to disable oplog tailing on particularly troublesome queries.

To disable oplog tailing on a per-query basis, add the option disableOplog: true to a
 collection.find() call in a publish function on the server (or any other  server side  observe call).

There are a few different ways oplog tailing can end up as a scaling bottleneck:
  1. Large bulk updates
  1. High write rate on individual documents (“hot spots”)
  1. Queries that cause the oplog driver to fetch many documents by _id