Storage Engines and Concurrency in MongoDB
Created on: Sep 5, 2024
MongoDB supports multiple storage engines, allowing you to choose the one that best fits your needs for performance, data durability, scalability, and encryption. The three main storage engines in common use are:
- WiredTiger Storage Engine (Default)
- Encrypted Storage Engine
- In-Memory Storage Engine
Let's explore the features of each.
1. WiredTiger Storage Engine (Default)
- It is the default storage engine. You can check which engine a MongoDB instance is using with the command below.
db.serverStatus().storageEngine
-
Document-Level Concurrency: WiredTiger uses document-level concurrency control for write operations, so multiple clients can modify different documents of a collection at the same time. WiredTiger uses optimistic concurrency control, which assumes that conflicts are rare and checks for them only at the commit phase of a transaction.
WiredTiger uses only intent locks at the global, database, and collection levels. When the storage engine detects a conflict between two operations, one of them incurs a write conflict and MongoDB transparently retries that operation.
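To make the "check at commit, retry on conflict" idea concrete, here is a minimal sketch in plain JavaScript. It is a toy model and not WiredTiger's implementation: the ToyStore class, the version counter, and the updateWithRetry helper are all hypothetical names invented for this illustration.

```javascript
// Toy in-memory store: each document carries a version counter.
// Illustration only; this is not how WiredTiger stores data.
class ToyStore {
  constructor() {
    this.docs = new Map(); // _id -> { value, version }
  }

  insert(id, value) {
    this.docs.set(id, { value, version: 0 });
  }

  // Returns a copy of the value plus the version it was read at.
  read(id) {
    const { value, version } = this.docs.get(id);
    return { value: { ...value }, version };
  }

  // The commit succeeds only if nobody else committed since our read.
  commit(id, newValue, readVersion) {
    const current = this.docs.get(id);
    if (current.version !== readVersion) {
      return false; // write conflict detected at commit time
    }
    this.docs.set(id, { value: newValue, version: readVersion + 1 });
    return true;
  }
}

// Retry loop standing in for MongoDB's transparent retry.
// onAttempt is a hook used here only to inject a competing write.
function updateWithRetry(store, id, updateFn, maxAttempts = 5, onAttempt = () => {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { value, version } = store.read(id);
    onAttempt(attempt);
    if (store.commit(id, updateFn(value), version)) {
      return attempt; // number of attempts it took
    }
  }
  throw new Error("too many write conflicts");
}

const store = new ToyStore();
store.insert(1, { name: "Alice", balance: 100 });

// A competing writer sneaks in during the first attempt only.
const attempts = updateWithRetry(
  store,
  1,
  (doc) => ({ ...doc, balance: doc.balance + 50 }),
  5,
  (attempt) => {
    if (attempt === 1) {
      const other = store.read(1);
      store.commit(1, { ...other.value, balance: 500 }, other.version);
    }
  }
);

console.log(attempts, store.read(1).value.balance); // 2 550
```

The first attempt loses to the competing write, so the update is recomputed on top of the new balance and succeeds on the second attempt.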
There are four lock modes:
- Shared (S) lock
- Exclusive (X) lock
- Intent Shared (IS) lock
- Intent Exclusive (IX) lock
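How these four modes interact is easiest to see as a compatibility table. The sketch below encodes the standard multi-granularity locking matrix in plain JavaScript. The COMPATIBLE table and the canGrant function are names made up for this example, not part of any MongoDB API.

```javascript
// COMPATIBLE[held] lists the modes another client may be granted
// while a lock in mode `held` is in place on the same resource.
const COMPATIBLE = {
  IS: ["IS", "IX", "S"],
  IX: ["IS", "IX"],
  S: ["IS", "S"],
  X: [],
};

// A request is granted only if it is compatible with every held lock.
function canGrant(heldModes, requested) {
  return heldModes.every((held) => COMPATIBLE[held].includes(requested));
}

console.log(canGrant(["IX"], "IX"));    // true: intent locks coexist
console.log(canGrant(["S", "S"], "S")); // true: readers share
console.log(canGrant(["S"], "X"));      // false: a writer waits for readers
console.log(canGrant(["IX"], "S"));     // false
```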
Let's take an example to understand how concurrency works. Suppose we have a collection account, and four operations are issued simultaneously by different clients.
db.account.insertMany([ { _id: 1, name: "Alice", balance: 100 }, { _id: 2, name: "Bob", balance: 150 } ]);
- Client A: Update balance in the document with _id: 1
- Client B: Update balance in the document with _id: 2 to 200
- Client C: Read balance in the document with _id: 1
- Client D: Update balance in the document with _id: 2 to 300
There is no deterministic order for which client acquires the lock first.
Take Client A first. Client A acquires an Intent Exclusive (IX) lock on the account collection to indicate that it will update a document within this collection. It then acquires an exclusive (X) lock on the document with _id: 1, granting it exclusive access to modify the document. Since only one client is updating _id: 1, there is no conflict.
Now suppose Client B acquires an IX lock on the account collection and an X lock on the document with _id: 2, locking that document for exclusive access. Client D, which also wants to update _id: 2, has to wait until the document is unlocked. After Client B updates the balance, the document is unlocked.
Similarly, suppose Client C gets its lock: it acquires an IS lock on the collection and an S lock on the document with _id: 1, then performs its read. If Client A still holds the X lock on that document, Client C waits for it to be released first. Note that multiple clients can hold S locks simultaneously, but no client can acquire an X lock while an S lock is present. Once the lock on _id: 2 is free, Client D can perform its operation.
Treat this as a simplified lock-based picture: as noted earlier, WiredTiger itself resolves document-level conflicts optimistically and retries the operation on a write conflict.
Note:
- Multiple IX locks can coexist on the same collection since they only signal the intent to modify different documents within the collection.
- An exclusive (X) lock on one document does not interfere with an X lock on any other document, whether in the same collection or a different one.
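The account scenario can be replayed with a toy lock table to see which requests go through and which must wait. This is a sketch of the conceptual model only. The LockTable class and resource names such as "account/2" are hypothetical and do not correspond to MongoDB internals.

```javascript
// Same compatibility rules as standard multi-granularity locking.
const COMPAT = { IS: ["IS", "IX", "S"], IX: ["IS", "IX"], S: ["IS", "S"], X: [] };

class LockTable {
  constructor() {
    this.held = new Map(); // resource -> [{ client, mode }]
  }

  // Grants the lock and returns true if it is compatible with every lock
  // held by other clients; returns false when the client would have to wait.
  tryAcquire(client, resource, mode) {
    const holders = this.held.get(resource) || [];
    const ok = holders
      .filter((h) => h.client !== client)
      .every((h) => COMPAT[h.mode].includes(mode));
    if (ok) this.held.set(resource, [...holders, { client, mode }]);
    return ok;
  }

  releaseAll(client) {
    for (const [resource, holders] of this.held) {
      this.held.set(resource, holders.filter((h) => h.client !== client));
    }
  }
}

const locks = new LockTable();
const results = [];
// Client A updates _id: 1, Client B updates _id: 2.
results.push(locks.tryAcquire("A", "account", "IX"));   // granted
results.push(locks.tryAcquire("A", "account/1", "X"));  // granted
results.push(locks.tryAcquire("B", "account", "IX"));   // granted: IX coexists with IX
results.push(locks.tryAcquire("B", "account/2", "X"));  // granted: different document
// Client D also targets _id: 2 and has to wait for Client B.
results.push(locks.tryAcquire("D", "account", "IX"));   // granted
results.push(locks.tryAcquire("D", "account/2", "X"));  // refused while B holds X
locks.releaseAll("B");
results.push(locks.tryAcquire("D", "account/2", "X"));  // granted after B is done

console.log(results); // [ true, true, true, true, true, false, true ]
```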
-
Data Compression: This technique is used to reduce the size of stored data on disk, thus optimizing storage and potentially improving read performance by reducing the amount of data that needs to be transferred from disk to memory. With WiredTiger, MongoDB supports compression for all collections and indexes.
By default, WiredTiger uses block compression with the snappy compression library for all collections and prefix compression for all indexes.
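Prefix compression is the less familiar of the two techniques, so a small illustration may help. The sketch below shows the general idea on sorted index keys: store only what differs from the previous key. It is a toy encoding with made-up function names, not WiredTiger's on-disk format.

```javascript
// Each entry records how many leading characters it shares with the
// previous key, plus the remaining suffix.
function prefixCompress(sortedKeys) {
  let previous = "";
  return sortedKeys.map((key) => {
    let shared = 0;
    while (
      shared < previous.length &&
      shared < key.length &&
      previous[shared] === key[shared]
    ) {
      shared++;
    }
    previous = key;
    return { shared, suffix: key.slice(shared) };
  });
}

// Rebuilds the full keys by reusing the shared prefix of the previous key.
function prefixDecompress(entries) {
  let previous = "";
  return entries.map(({ shared, suffix }) => {
    previous = previous.slice(0, shared) + suffix;
    return previous;
  });
}

const keys = ["user:1001", "user:1002", "user:1010", "user:2000"];
const packed = prefixCompress(keys);
console.log(packed);
console.log(prefixDecompress(packed));
```

For these keys, the second entry keeps only the final character "2" because the first eight characters repeat, which is why indexes with long common prefixes tend to compress well.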
-
Cache Management: With WiredTiger, MongoDB utilizes both the WiredTiger internal cache and the filesystem cache. The default WiredTiger internal cache size is the larger of either:
1. 50% of (RAM - 1 GB), or
2. 256 MB
The main purpose of caching is to store frequently accessed data in memory, making reads and writes faster and more efficient.
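The default cache size rule above is simple arithmetic, and a quick calculation shows how it plays out on different machines. The function name below is hypothetical. Only the formula comes from the text.

```javascript
const GB = 1024 ** 3;
const MB = 1024 ** 2;

// Larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultWiredTigerCacheBytes(totalRamBytes) {
  return Math.max(0.5 * (totalRamBytes - GB), 256 * MB);
}

console.log(defaultWiredTigerCacheBytes(4 * GB) / GB);    // 1.5
console.log(defaultWiredTigerCacheBytes(1.25 * GB) / MB); // 256
```

On a machine with 4 GB of RAM the first term wins (1.5 GB). On a machine with 1.25 GB of RAM the first term is only 128 MB, so the 256 MB floor applies.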
-
Journal and checkpoints: WiredTiger uses a write-ahead log (i.e. journal) in combination with checkpoints to ensure data durability. The Write-Ahead Log (WAL) is a method used by databases to ensure durability and consistency by logging changes before they are applied to the main data files.
WiredTiger uses MultiVersion Concurrency Control (MVCC). At the start of an operation, WiredTiger provides a point-in-time snapshot of the data to the operation.
A snapshot presents a consistent, point-in-time view of the in-memory data. When writing to disk, WiredTiger writes all the data in a snapshot to disk in a consistent way across all data files. The now-durable data acts as a checkpoint in the data files.
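The interplay between the journal and checkpoints can be sketched with a toy engine: log every change before applying it, and on recovery start from the last checkpoint and replay whatever was journaled after it. The ToyEngine class and its method names are invented for this illustration and greatly simplify what WiredTiger actually does.

```javascript
class ToyEngine {
  constructor() {
    this.data = new Map();        // in-memory data
    this.journal = [];            // changes logged since the last checkpoint
    this.checkpoint = new Map();  // last durable copy of the data
  }

  put(key, value) {
    this.journal.push({ key, value }); // 1. write to the log first
    this.data.set(key, value);         // 2. then apply the change
  }

  takeCheckpoint() {
    this.checkpoint = new Map(this.data);
    this.journal = []; // entries covered by the checkpoint are no longer needed
  }

  // Simulated crash: in-memory data is lost, checkpoint and journal survive.
  crashAndRecover() {
    this.data = new Map(this.checkpoint);
    for (const { key, value } of this.journal) {
      this.data.set(key, value); // replay changes made after the checkpoint
    }
  }
}

const engine = new ToyEngine();
engine.put("alice", 100);
engine.takeCheckpoint();
engine.put("bob", 150);
engine.put("alice", 120);
engine.crashAndRecover();

console.log([...engine.data.entries()]); // [ [ 'alice', 120 ], [ 'bob', 150 ] ]
```

Both writes made after the checkpoint survive the simulated crash because they were journaled before being applied.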
2. Encrypted Storage Engine
The Encrypted Storage Engine is an enhancement of WiredTiger that supports AES256-CBC (256-bit Advanced Encryption Standard in Cipher Block Chaining mode) via OpenSSL. It is available only in MongoDB Enterprise.
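To get a feel for what AES256-CBC does to a document's bytes, here is a small round trip using Node.js's built-in crypto module. This only illustrates the cipher mode. It says nothing about how MongoDB manages keys or lays out encrypted files, and the randomly generated key below is an assumption made for the demo.

```javascript
const crypto = require("crypto");

// Hypothetical 256-bit key generated on the fly for this demo.
const key = crypto.randomBytes(32);

// Encrypts a UTF-8 string; a fresh 16-byte IV is generated per call.
function encrypt(plaintext) {
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv("aes-256-cbc", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return { iv, data };
}

// Reverses encrypt() using the same key and the stored IV.
function decrypt({ iv, data }) {
  const decipher = crypto.createDecipheriv("aes-256-cbc", key, iv);
  return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
}

const secret = JSON.stringify({ _id: 1, name: "Alice", balance: 100 });
const encrypted = encrypt(secret);

console.log(encrypted.data.toString("hex")); // unreadable without the key
console.log(decrypt(encrypted) === secret);  // true
```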
3. In-Memory Storage Engine
The In-Memory storage engine is designed for applications that require ultra-low latency and can afford to keep the entire dataset in memory. It eliminates the need for disk I/O by keeping all data in RAM.