WiredTiger Internals

The storage engine is the heart of a database, responsible for managing how data is stored on disk and handled in memory. Since version 3.2, WiredTiger has been MongoDB’s default storage engine.

🏗️ 1. Core Architecture

WiredTiger’s architecture is designed for high-concurrency workloads and efficient disk I/O. Unlike its predecessor (MMAPv1), WiredTiger provides granular control over data persistence and memory management.

Document-Level Concurrency

One of the most significant advantages of WiredTiger is its document-level concurrency control.

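  • Optimistic Concurrency Control: For most reads and writes, WiredTiger assumes that operations will not conflict and takes only intent locks at the global, database, and collection levels. Multiple clients can therefore update different documents in the same collection simultaneously.
  • Conflict Handling: If two operations attempt to modify the same document at the same time, one succeeds and the other incurs a write conflict, which MongoDB transparently retries.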
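The optimistic read, commit-if-unchanged, retry-on-conflict cycle described above can be sketched as a toy model. This is an illustration of the general technique under simplifying assumptions, not WiredTiger's actual implementation. The `Document`, `WriteConflict`, and `increment` names are hypothetical, and a per-document version counter stands in for WiredTiger's internal transaction visibility checks.

```python
import threading


class WriteConflict(Exception):
    """Raised when a document changed between our read and our commit."""


class Document:
    """Hypothetical single document with a version counter (toy model, not WiredTiger code)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value
        self.version = 0
        self._lock = threading.Lock()  # held only for the tiny snapshot/commit steps

    def snapshot(self) -> tuple[int, int]:
        with self._lock:
            return self.version, self.value

    def commit(self, expected_version: int, new_value: int) -> None:
        with self._lock:
            if self.version != expected_version:
                raise WriteConflict()  # someone else committed first
            self.value = new_value
            self.version += 1


def increment(doc: Document, max_retries: int = 10_000) -> int:
    """Optimistically increment doc.value; return how many retries were needed."""
    for attempt in range(max_retries):
        version, value = doc.snapshot()       # optimistic read, no long-held lock
        try:
            doc.commit(version, value + 1)    # succeeds only if the version is unchanged
            return attempt
        except WriteConflict:
            continue                          # lost the race: re-read and retry
    raise RuntimeError("gave up after repeated write conflicts")
```

Writers touching *different* `Document` objects never conflict. Only writers racing on the same document pay for a retry, which mirrors why document-level concurrency scales well when updates are spread across many documents.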

Checkpoints and Data Consistency

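Checkpoints provide a consistent snapshot of the data on disk. Starting with MongoDB 3.6, WiredTiger creates a checkpoint every 60 seconds. Earlier versions created one every 60 seconds or whenever 2 GB of journal data had been written, whichever came first.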

  • Persistence: In the event of a crash, MongoDB uses the last successful checkpoint as the starting point for recovery.
  • Journaling: To avoid losing writes made between checkpoints, MongoDB also records them in a write-ahead log called the journal. During recovery, it replays the journal on top of the last checkpoint.
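Recovery from a checkpoint plus a write-ahead log can be sketched as follows: start from the checkpointed state, then redo every logged write that is newer than the checkpoint, in log order. This is a hypothetical key/value toy model. `JournalRecord`, `recover`, and the `lsn` (log sequence number) field are illustrative names, and WiredTiger's real journal format and recovery logic are considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalRecord:
    lsn: int    # hypothetical log sequence number: position in the write-ahead log
    key: str
    value: str


def recover(checkpoint: dict[str, str], checkpoint_lsn: int,
            journal: list[JournalRecord]) -> dict[str, str]:
    """Rebuild state after a crash: checkpoint first, then redo newer journal records."""
    state = dict(checkpoint)  # copy, so the checkpoint itself is left untouched
    for record in sorted(journal, key=lambda r: r.lsn):
        if record.lsn > checkpoint_lsn:  # older records are already in the checkpoint
            state[record.key] = record.value
    return state
```

For example, with a checkpoint taken at LSN 2, only the records at LSN 3 and later are replayed. Writes that were acknowledged after the checkpoint but before the crash are therefore not lost.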

🚀 2. The WiredTiger Cache

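The WiredTiger internal cache is the most critical memory area for MongoDB performance. It holds recently used data and indexes in an in-memory format. Collection data in the cache is uncompressed, unlike its compressed on-disk form, while indexes can still benefit from prefix compression.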

Cache Sizing

By default, MongoDB allocates a significant portion of system RAM to the WiredTiger cache:

  • Calculation: the larger of 50% of (total RAM - 1 GB) or 256 MB.
  • Example: On a system with 16 GB of RAM, the cache defaults to 0.5 × (16 - 1) = 7.5 GB.
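The default sizing rule above is simple enough to express directly. `default_cache_size_gb` is a hypothetical helper for back-of-the-envelope planning. It does not query MongoDB, and it ignores container memory limits and any explicit configuration.

```python
def default_cache_size_gb(total_ram_gb: float) -> float:
    """Default WiredTiger cache: the larger of 50% of (RAM - 1 GB) or 256 MB (0.25 GB)."""
    return max(0.5 * (total_ram_gb - 1), 0.25)


for ram in (1, 4, 16, 64):
    print(f"{ram:>3} GB RAM -> {default_cache_size_gb(ram):.2f} GB cache")
```

To override the default, set `storage.wiredTiger.engineConfig.cacheSizeGB` in the mongod configuration file, or pass `--wiredTigerCacheSizeGB` on the command line.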

Monitoring Cache Usage

You can monitor cache health from the MongoDB shell (mongosh):

// Check WiredTiger Cache status
db.serverStatus().wiredTiger.cache
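The raw cache document is long. A small helper can reduce it to how full the cache is and how much of it is dirty, meaning modified but not yet written to disk. This is a sketch. `cache_usage` is a hypothetical name, and it reads the `serverStatus` fields `"maximum bytes configured"`, `"bytes currently in the cache"`, and `"tracked dirty bytes in the cache"`. Verify that these exact keys are present on your MongoDB version.

```python
def cache_usage(cache: dict) -> dict[str, float]:
    """Summarise a serverStatus()['wiredTiger']['cache'] sub-document as percentages."""
    max_bytes = cache["maximum bytes configured"]
    used = cache["bytes currently in the cache"]
    dirty = cache["tracked dirty bytes in the cache"]
    return {
        "used_pct": 100.0 * used / max_bytes,    # how full the cache is
        "dirty_pct": 100.0 * dirty / max_bytes,  # modified data awaiting write-out
    }


# Against a live server (not executed here):
# from pymongo import MongoClient
# cache = MongoClient("mongodb://localhost:27017/").admin.command("serverStatus")["wiredTiger"]["cache"]
# print(cache_usage(cache))
```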

⚡ 3. Compression and I/O

WiredTiger significantly reduces disk footprint through built-in compression.

| Data Type   | Default Compression | Alternatives |
|-------------|---------------------|--------------|
| Collections | Snappy              | zlib, zstd   |
| Indexes     | Prefix compression  | None         |
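Index prefix compression exploits the fact that index keys are stored in sorted order, so neighbouring keys often share a long prefix. The idea can be sketched by storing each key as "length of prefix shared with the previous key" plus the remaining suffix. This is a hypothetical illustration of the technique. `prefix_compress` and `prefix_decompress` are illustrative names, and WiredTiger's actual on-page encoding differs.

```python
import os


def prefix_compress(sorted_keys: list[str]) -> list[tuple[int, str]]:
    """Encode each key as (chars shared with the previous key, remaining suffix)."""
    encoded, previous = [], ""
    for key in sorted_keys:
        shared = len(os.path.commonprefix([previous, key]))
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decompress(encoded: list[tuple[int, str]]) -> list[str]:
    """Rebuild the original keys by re-attaching each shared prefix."""
    keys, previous = [], ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


keys = ["student:1001", "student:1002", "student:1010", "teacher:7"]
print(prefix_compress(keys))
```

The defaults in the table can be changed server-wide with the mongod settings `storage.wiredTiger.collectionConfig.blockCompressor` and `storage.wiredTiger.indexConfig.prefixCompression`.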

PyMongo Example: Monitoring Storage Stats

You can use PyMongo to check a collection's storage statistics programmatically.

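from pymongo import MongoClient

# Connect to a local mongod and select the "university" database
client = MongoClient('mongodb://localhost:27017/')
db = client['university']

# Run the collStats command against the "students" collection
stats = db.command("collStats", "students")

# storageSize: bytes allocated on disk for the (compressed) collection data
# size: uncompressed size of all documents in the collection
print(f"Total Storage Size (Compressed): {stats['storageSize'] / 1024 / 1024:.2f} MB")
print(f"Logical Data Size (Uncompressed): {stats['size'] / 1024 / 1024:.2f} MB")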
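The two numbers printed above are most useful as a ratio, which gives a rough indication of how much the block compressor is saving. `compression_summary` is a hypothetical helper that works on any dict with `size` and `storageSize` keys, such as the `stats` result above. Because `storageSize` can include allocated but unused space, treat the ratio as approximate. Note that on MongoDB 6.2 and later, the `collStats` command is deprecated in favour of the `$collStats` aggregation stage.

```python
def compression_summary(stats: dict) -> str:
    """Compare logical (uncompressed) size with on-disk storage size from collStats output."""
    mb = 1024 * 1024
    logical = stats["size"]         # uncompressed size of all documents
    on_disk = stats["storageSize"]  # bytes allocated on disk (compressed, may include free space)
    ratio = logical / on_disk if on_disk else float("inf")
    return f"{logical / mb:.2f} MB logical, {on_disk / mb:.2f} MB on disk, ratio {ratio:.2f}x"
```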

💡 Best Practices

  1. Dedicated Servers: Avoid running MongoDB alongside other RAM-intensive applications (like an API or web server). They compete for the same RAM that MongoDB relies on for the WiredTiger cache and the operating system's filesystem cache.
  2. Use SSDs: While the cache is fast, WiredTiger's checkpointing and eviction processes are I/O intensive. SSDs are strongly recommended for production workloads.
  3. Monitor Eviction Rates: High eviction rates (where WiredTiger has to force data out of the cache) can indicate that you need more RAM or that your queries are scanning too much data.
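Eviction counters in `serverStatus().wiredTiger.cache` are cumulative since startup, so a rate requires two samples taken a known interval apart. The helper below is a sketch. `eviction_rates` is a hypothetical name, and the default counter names (`"modified pages evicted"`, `"unmodified pages evicted"`, `"pages evicted by application threads"`) are assumptions that should be checked against your server's output. A rising count of pages evicted by application threads is worth watching in particular, because it means client operations are being drafted into eviction work.

```python
DEFAULT_KEYS = (
    "modified pages evicted",                # assumed counter name; verify on your version
    "unmodified pages evicted",              # assumed counter name; verify on your version
    "pages evicted by application threads",  # assumed counter name; verify on your version
)


def eviction_rates(before: dict, after: dict, seconds: float,
                   keys: tuple[str, ...] = DEFAULT_KEYS) -> dict[str, float]:
    """Per-second rates for cumulative eviction counters between two cache samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {k: (after[k] - before[k]) / seconds for k in keys}
```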