MongoDB - resizing the oplog of replica sets

03.07.2016

MongoDB replica sets provide redundancy for MongoDB. To replicate data, MongoDB uses the so-called oplog (operations log), the collection oplog.rs in the local database. The oplog is a capped collection, so it cannot grow beyond its configured size. This limits the number of entries it can hold: once it is full, the oldest entries are removed to make room for new ones. If an application fills the oplog within 35 minutes, a member that fails (for example in a disaster event) falls out of sync as soon as it has been offline for 35 minutes, because the operations it missed have already been removed from the oplog. If this happens, all you can do is resynchronize the member by hand, e.g. by copying the data directory from another member, or by triggering a full resync later, once the system load has dropped to an acceptable level.
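Why a lagging member becomes unrecoverable can be illustrated with a toy model of a capped oplog. This is a minimal JavaScript sketch under simplifying assumptions: the class CappedOplog and its methods are hypothetical, sizes are abstract units, one operation is written per minute, and a member is treated as able to catch up only while the last operation it applied is still in the oplog. It is not how MongoDB stores oplog.rs internally.

```javascript
// Hypothetical toy model of a capped oplog; not MongoDB's implementation.
class CappedOplog {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.entries = []; // ordered oldest first
    this.used = 0;
  }

  append(ts, size) {
    this.entries.push({ ts, size });
    this.used += size;
    // Like a capped collection: drop the oldest entries once the limit is exceeded.
    while (this.used > this.maxSize) {
      this.used -= this.entries.shift().size;
    }
  }

  oldestTs() {
    return this.entries.length > 0 ? this.entries[0].ts : null;
  }

  // Assumption of this model: a lagging member can catch up only while the
  // last operation it applied is still present in the oplog.
  canCatchUp(lastAppliedTs) {
    return this.entries.length > 0 && lastAppliedTs >= this.oldestTs();
  }
}

// One unit-sized operation per minute fills a 35-unit oplog in 35 minutes.
const oplog = new CappedOplog(35);
for (let minute = 1; minute <= 120; minute++) {
  oplog.append(minute, 1);
}

console.log(oplog.oldestTs());      // 86
console.log(oplog.canCatchUp(86));  // true: last applied minute 86, offline for 34 minutes
console.log(oplog.canCatchUp(85));  // false: offline for 35 minutes, minute 85 is gone
```

At minute 120 the oplog only holds minutes 86 to 120, so a member that stopped replicating 35 or more minutes earlier can no longer find its position in it.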

To avoid this situation, check the current size of the oplog in the mongo shell with the command:

db.getReplicationInfo()

The output shows the configured size of the oplog, how much of it is used, and the time span between its oldest and newest entries:

db:SECONDARY> db.getReplicationInfo()
{
"logSizeMB" : 12288,
"usedMB" : 12268.66,
"timeDiff" : 49881,
"timeDiffHours" : 13.86,
"tFirst" : "Mon May 30 2016 20:50:05 GMT+0200 (CEST)",
"tLast" : "Tue May 31 2016 10:41:26 GMT+0200 (CEST)",
"now" : "Tue May 31 2016 10:41:26 GMT+0200 (CEST)"
}
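As a cross-check of the sample, the window and the average oplog write rate can be recomputed from tFirst, tLast and usedMB. A minimal JavaScript sketch, assuming the two timestamps are rewritten as ISO 8601 strings with the same +02:00 offset; the helper name windowSeconds is hypothetical:

```javascript
// Values from the sample output above; tFirst/tLast rewritten as ISO 8601 strings.
const sample = {
  logSizeMB: 12288,
  usedMB: 12268.66,
  tFirst: "2016-05-30T20:50:05+02:00",
  tLast: "2016-05-31T10:41:26+02:00",
};

// Hypothetical helper: time span between oldest and newest oplog entry, in seconds.
function windowSeconds(info) {
  return (Date.parse(info.tLast) - Date.parse(info.tFirst)) / 1000;
}

const seconds = windowSeconds(sample);                  // 49881, matches "timeDiff"
const hours = Math.round((seconds / 3600) * 100) / 100; // 13.86, matches "timeDiffHours"
// Average oplog volume written per hour over that window (roughly 885 MB/h).
const avgMBPerHour = sample.usedMB / (seconds / 3600);

console.log(seconds, hours, avgMBPerHour.toFixed(1));
```

The average rate is only meaningful for a steady load; it says nothing about peaks within the day.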

The sample shows that the oplog of this replica set covers nearly 14 hours of operations. If a member goes down at 10 pm, it can still catch up on its own as long as it is back before about noon the next day (22:00 plus 13.86 hours is roughly 11:50).
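The recovery deadline is simply the failure time plus the oplog window. A minimal sketch, assuming the write rate stays constant while the member is down (a load peak during the outage would shrink the window and move the deadline earlier); the failure time and the function name catchUpDeadline are hypothetical:

```javascript
// "timeDiff" from db.getReplicationInfo() in the sample above.
const WINDOW_SECONDS = 49881;

// Hypothetical helper: latest moment a failed member must resume replication,
// assuming the oplog window does not change during the outage.
function catchUpDeadline(failureIso, windowSeconds) {
  return new Date(Date.parse(failureIso) + windowSeconds * 1000);
}

// Hypothetical failure at 22:00 CEST on 31 May 2016.
const deadline = catchUpDeadline("2016-05-31T22:00:00+02:00", WINDOW_SECONDS);
// 22:00 CEST + 13 h 51 min 21 s = 11:51:21 CEST the next day.
console.log(deadline.toISOString()); // 2016-06-01T09:51:21.000Z
```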

If your application has a steady load and 14 hours of recovery time are sufficient for your use case, you are done here. If, however, your application writes different amounts of data at different times of day, the time span covered by the oplog changes accordingly: under maximum load, the oplog from the example above covers only 3 hours. It is therefore advisable to monitor the oplog window throughout the day. If the time covered by the oplog becomes too short for your requirements, you need to increase the oplog's size. This is a manual process that is described in the official documentation. Alternatively, you can use a script based on that tutorial and run it on one server after another (round robin) until every member has been resized.
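The load-dependent sizing can be put into numbers: if 12288 MB cover only 3 hours at peak load, the oplog receives about 4096 MB per hour at peak, and the size needed for a target window follows from that rate. A minimal sketch under stated assumptions: the 8-hour target, the 1.25 headroom factor and the sampled window values are hypothetical, as are all function names. In practice the samples would be timeDiffHours values collected periodically via db.getReplicationInfo().

```javascript
const LOG_SIZE_MB = 12288;      // "logSizeMB" from the sample
const WINDOW_AT_PEAK_HOURS = 3; // window under maximum load (from the text)
const TARGET_HOURS = 8;         // hypothetical requirement

// Oplog volume written per hour at peak load.
function peakRateMBPerHour(logSizeMB, windowHoursAtPeak) {
  return logSizeMB / windowHoursAtPeak;
}

// Oplog size so that even the peak-load window reaches the target;
// headroom > 1 adds a safety margin for load growth.
function requiredOplogMB(rateMBPerHour, targetHours, headroom = 1) {
  return Math.ceil(rateMBPerHour * targetHours * headroom);
}

// Smallest window among periodically collected timeDiffHours samples.
function smallestWindow(samplesHours) {
  return Math.min(...samplesHours);
}

const rate = peakRateMBPerHour(LOG_SIZE_MB, WINDOW_AT_PEAK_HOURS); // 4096 MB/h
const needed = requiredOplogMB(rate, TARGET_HOURS);                // 32768 MB

const hypotheticalSamples = [13.86, 7.5, 3.0]; // illustrative values, not measurements
if (smallestWindow(hypotheticalSamples) < TARGET_HOURS) {
  console.log(`oplog too small at peak, resize to at least ${needed} MB`);
}
```

Sizing against the smallest observed window rather than the average keeps the peak-load case covered.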

The script combines the steps from the tutorial and requires an explicit ‘y’ followed by the Return key before each step, so that you can verify the step and weigh its possible consequences. Any other input stops the script. Be aware that it includes neither recovery nor resume functionality, so cleanup, recovery and restart have to be done by hand.
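The confirm-every-step behaviour can be sketched as a small control loop. This is not the author's script: the step texts only paraphrase the manual tutorial of the MongoDB 3.x era (which, as far as I know, also recommends resizing the secondaries first and stepping the primary down with rs.stepDown() before resizing it last), the function name runConfirmed is hypothetical, and nothing here talks to a mongod. The prompt and the step execution are injected so the flow can be tried with canned answers.

```javascript
// Paraphrased tutorial steps; shown as text only, nothing is executed against MongoDB.
const STEPS = [
  "Restart this member as a standalone mongod on a different port, without --replSet",
  "In the local database, save the newest oplog entry: db.temp.save(db.oplog.rs.find({}, {ts: 1, h: 1}).sort({$natural: -1}).limit(1).next())",
  "Drop the old oplog: db.oplog.rs.drop()",
  'Create a larger capped oplog: db.runCommand({create: "oplog.rs", capped: true, size: <new size in bytes>})',
  "Restore the saved entry: db.oplog.rs.save(db.temp.findOne())",
  "Restart the member with its original replica set options",
];

// Runs a step only after the exact answer 'y' (the newline from Return is
// stripped); any other answer aborts. There is no rollback and no resume.
function runConfirmed(steps, ask, run) {
  const completed = [];
  for (const step of steps) {
    const answer = ask(`${step}\nContinue? [y/N] `).replace(/\r?\n$/, "");
    if (answer !== "y") {
      return { completed, abortedAt: step };
    }
    run(step);
    completed.push(step);
  }
  return { completed, abortedAt: null };
}

// Demo with canned answers instead of a terminal: the third answer aborts.
const answers = ["y\n", "y\n", "n\n"];
const result = runConfirmed(STEPS, () => answers.shift(), (s) => console.log(`running: ${s}`));
console.log(result.completed.length, result.abortedAt);
```

Returning the completed steps and the step where execution stopped makes it clear where manual cleanup has to start.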

This article is a translation of my original German article posted here.