WiredTiger Internals
The storage engine is the heart of a database, responsible for managing how data is stored on disk and handled in memory. Since version 3.2, WiredTiger has been MongoDB’s default storage engine.
🏗️ 1. Core Architecture
WiredTiger’s architecture is designed for high-concurrency workloads and efficient disk I/O. Unlike its predecessor, MMAPv1, which at best locked at the collection level and stored data uncompressed, WiredTiger offers document-level concurrency, on-disk compression, and its own managed in-memory cache.
Document-Level Concurrency
One of the most significant advantages of WiredTiger is document-level concurrency control for write operations.
- Optimistic Concurrency Control: WiredTiger assumes that most operations will not conflict. It allows multiple clients to update different documents in the same collection simultaneously.
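- Conflict Handling: If two operations attempt to modify the same document at the same time, WiredTiger lets one succeed and reports a write conflict to the other. For single-document operations, MongoDB then retries the conflicting operation transparently.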
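The conflict-and-retry behaviour can be illustrated with a toy model. This is a minimal sketch, not WiredTiger's implementation: WiredTiger uses multi-version concurrency control with per-document update chains, whereas the hypothetical `VersionedStore`, `WriteConflict`, and `update_with_retry` below stand in with a simple version counter that is validated at commit time.

```python
import threading


class WriteConflict(Exception):
    """Raised when a document changed after the writer read it (hypothetical)."""


class VersionedStore:
    """Toy store: each document is a (version, value) pair. Not WiredTiger's real design."""

    def __init__(self):
        self._docs = {}                # doc_id -> (version, value)
        self._lock = threading.Lock()  # guards only the short validate-and-install step

    def read(self, doc_id):
        # Readers never block; they see the latest committed version.
        return self._docs.get(doc_id, (0, None))

    def commit(self, doc_id, expected_version, new_value):
        # Optimistic check: succeed only if no one committed since our read.
        with self._lock:
            current_version, _ = self._docs.get(doc_id, (0, None))
            if current_version != expected_version:
                raise WriteConflict(doc_id)
            self._docs[doc_id] = (current_version + 1, new_value)


def update_with_retry(store, doc_id, fn, max_retries=10):
    """Read-modify-write loop: on a conflict, re-read the document and reapply fn."""
    for _ in range(max_retries):
        version, value = store.read(doc_id)
        try:
            store.commit(doc_id, version, fn(value))
            return
        except WriteConflict:
            continue
    raise RuntimeError(f"gave up on {doc_id!r} after {max_retries} conflicts")


store = VersionedStore()
store.commit("a", 0, 1)              # create document "a" (version 1)
v1, x1 = store.read("a")             # writer 1 reads version 1
v2, x2 = store.read("a")             # writer 2 reads the same version
store.commit("a", v1, x1 + 10)       # writer 1 commits first (version 2)
try:
    store.commit("a", v2, x2 + 100)  # writer 2's read is now stale
except WriteConflict:
    update_with_retry(store, "a", lambda x: x + 100)  # re-read, reapply
print(store.read("a"))  # (3, 111)
```

Writers to different documents never touch each other's version counters, which is why updates to different documents in the same collection can proceed in parallel.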
Checkpoints and Data Consistency
Checkpoints provide a consistent view of the data on disk. By default, MongoDB writes a checkpoint every 60 seconds. (Versions before 3.4 also triggered a checkpoint once 2 GB of journal data had been written, whichever came first.)
- Persistence: In the event of a crash, MongoDB uses the last successful checkpoint as the starting point for recovery.
- Journaling: To avoid losing writes made since the last checkpoint, MongoDB records them in a write-ahead log called the journal and replays it during recovery.
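The recovery order (restore the last checkpoint, then replay the journal) can be sketched with a toy engine. The `ToyEngine` class is hypothetical and only models the ordering: real WiredTiger journals in batches, flushes on its own schedule, and manages journal retention differently.

```python
import copy


class ToyEngine:
    """Hypothetical model of checkpoint + write-ahead journal recovery."""

    def __init__(self):
        self.memory = {}      # in-memory state (lost on a crash)
        self.checkpoint = {}  # last durable snapshot on "disk"
        self.journal = []     # write-ahead log of changes since that snapshot

    def write(self, key, value):
        self.journal.append((key, value))  # 1. log the change first
        self.memory[key] = value           # 2. then apply it in memory

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.memory)  # persist a consistent snapshot
        self.journal.clear()  # entries before the snapshot are no longer needed

    def crash_and_recover(self):
        self.memory = copy.deepcopy(self.checkpoint)  # start from the last checkpoint
        for key, value in self.journal:               # replay writes made after it
            self.memory[key] = value


engine = ToyEngine()
engine.write("x", 1)
engine.take_checkpoint()
engine.write("y", 2)         # after the checkpoint: only in the journal
engine.crash_and_recover()
print(engine.memory)  # {'x': 1, 'y': 2}
```

Without the journal, the write to `"y"` would be lost because recovery could only restore the checkpoint.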
🚀 2. The WiredTiger Cache
The WiredTiger cache is the most critical memory area for MongoDB performance. It holds recently used data and indexes: collection data is kept uncompressed, while indexes retain their prefix compression in memory.
Cache Sizing
By default, MongoDB allocates a significant portion of system RAM to the WiredTiger cache:
- Calculation: the larger of 50% of (Total RAM - 1 GB) or 256 MB.
- Example: On a system with 16GB RAM, the cache will be approximately 7.5GB.
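The default sizing rule, including the 256 MB floor MongoDB applies on very small machines, can be checked with a short calculation. The function name `default_cache_size_gb` is hypothetical, and the sketch ignores explicit overrides such as `storage.wiredTiger.engineConfig.cacheSizeGB`.

```python
def default_cache_size_gb(total_ram_gb: float) -> float:
    """Default WiredTiger cache: the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB)."""
    return max(0.5 * (total_ram_gb - 1), 0.25)


for ram in (1, 4, 16, 64):
    print(f"{ram:>3} GB RAM -> {default_cache_size_gb(ram):.2f} GB cache")
```

For 16 GB this gives 7.5 GB, matching the example above; for 1 GB the floor takes over and the cache is 0.25 GB.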
Monitoring Cache Usage
You can monitor the cache health using the Mongo Shell:
```javascript
// Check WiredTiger cache status
db.serverStatus().wiredTiger.cache
```
⚡ 3. Compression and I/O
WiredTiger significantly reduces disk footprint through built-in compression.
| Data Type | Default Compression | Alternatives |
|---|---|---|
| Collections | Snappy (block compression) | zlib, zstd (MongoDB 4.2+), none |
| Indexes | Prefix compression | Disabled (`prefixCompression: false`) |
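A collection's block compressor can be chosen when the collection is created by passing a WiredTiger `configString` through the `storageEngine` option of the `create` command. The sketch below only builds that option dictionary; the helper name `collection_compression_options` is hypothetical, and actually creating the collection needs a running server (zstd requires MongoDB 4.2 or later). The server-wide default is set separately with `storage.wiredTiger.collectionConfig.blockCompressor`.

```python
VALID_COMPRESSORS = {"none", "snappy", "zlib", "zstd"}


def collection_compression_options(compressor: str = "snappy") -> dict:
    """Build the `storageEngine` option for MongoDB's `create` command (hypothetical helper)."""
    if compressor not in VALID_COMPRESSORS:
        raise ValueError(f"unsupported block compressor: {compressor}")
    return {"storageEngine": {"wiredTiger": {"configString": f"block_compressor={compressor}"}}}


opts = collection_compression_options("zstd")
print(opts)
# With a running server, PyMongo forwards extra keyword arguments to `create`:
#   MongoClient()["university"].create_collection("logs", **opts)
```

zstd and zlib usually compress more tightly than Snappy at the cost of extra CPU, so the choice is a trade-off between disk footprint and throughput.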
PyMongo Example: Monitoring Storage Stats
You can use Pymongo to programmatically check the storage statistics of a collection.
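```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client["university"]

stats = db.command("collStats", "students")  # sizes are reported in bytes
storage_mb = stats["storageSize"] / 1024 / 1024  # on-disk size, after compression
logical_mb = stats["size"] / 1024 / 1024         # uncompressed size of the documents

print(f"Total Storage Size (Compressed): {storage_mb:.2f} MB")
print(f"Logical Data Size (Uncompressed): {logical_mb:.2f} MB")
if stats["storageSize"]:
    print(f"Compression ratio: {stats['size'] / stats['storageSize']:.2f}x")
```
💡 Best Practices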
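- Dedicated Servers: Avoid running MongoDB alongside other RAM-intensive applications (such as an API or web server). They compete for memory with the WiredTiger cache and with the filesystem cache that MongoDB also relies on. If co-location is unavoidable, cap the cache explicitly with `storage.wiredTiger.engineConfig.cacheSizeGB`.
- Use SSDs: Although the cache serves reads from memory, WiredTiger’s checkpointing and eviction processes are I/O-intensive, so SSDs are strongly recommended for production workloads.
- Monitor Eviction Rates: High eviction rates, especially eviction performed by application threads (which stalls their operations), indicate that the working set does not fit in the cache. Either add RAM or reduce the amount of data your queries scan, for example with better indexes.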
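A small helper can turn the `wiredTiger.cache` section of `serverStatus` into a quick pressure summary. The field names are the ones MongoDB reports in that section; the helper name `cache_pressure` and the 80% and 5% thresholds are assumptions, chosen to mirror WiredTiger's default eviction target and dirty eviction target. The sample numbers are illustrative, not measurements. Note that "pages evicted by application threads" is a cumulative counter, so compare two samples taken some time apart to get a rate.

```python
def cache_pressure(cache: dict) -> dict:
    """Summarise serverStatus()['wiredTiger']['cache'] (hypothetical helper)."""
    max_bytes = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"] / max_bytes
    dirty = cache["tracked dirty bytes in the cache"] / max_bytes
    return {
        "used_pct": round(100 * used, 1),
        "dirty_pct": round(100 * dirty, 1),
        # Assumed thresholds, modelled on WiredTiger's default eviction targets
        "over_eviction_target": used > 0.80,
        "over_dirty_target": dirty > 0.05,
        # Cumulative since startup: diff two samples to get an eviction rate
        "app_thread_evictions": cache.get("pages evicted by application threads", 0),
    }


# Illustrative numbers only, not real measurements:
sample = {
    "maximum bytes configured": 8 * 1024**3,
    "bytes currently in the cache": 7 * 1024**3,
    "tracked dirty bytes in the cache": 256 * 1024**2,
    "pages evicted by application threads": 1200,
}
print(cache_pressure(sample))
# Against a live server (requires PyMongo and a running mongod):
#   cache_pressure(MongoClient().admin.command("serverStatus")["wiredTiger"]["cache"])
```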