Advanced Schema Design Patterns
Beyond the basics of embedding and referencing, several industry-standard patterns help solve complex modeling challenges in MongoDB.
🎨 1. The Polymorphic Pattern
The Polymorphic Pattern is used when documents in a collection have similar but slightly different structures. This is common in Content Management Systems (CMS) or Asset Management.
Use Case
A “Product” catalog where different categories have different attributes (e.g., a “Shirt” has a size, while a “Laptop” has a CPU model).
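In application code, the shared type field lets one collection serve every variant. A minimal, server-free sketch (the document shapes match the catalog sample below; the `describe` helper and its output format are illustrative, not part of any library):

```python
# Sample documents shaped like the polymorphic catalog below; in a real app
# they would come back from a PyMongo cursor such as products.find({}).
docs = [
    {"type": "clothing", "name": "T-Shirt",
     "attributes": {"size": "L", "color": "Red"}},
    {"type": "electronics", "name": "Laptop",
     "attributes": {"cpu": "i7", "ram": "16GB"}},
]

def describe(doc):
    """Dispatch on the shared 'type' discriminator field."""
    attrs = doc["attributes"]
    if doc["type"] == "clothing":
        return f"{doc['name']} (size {attrs['size']}, {attrs['color']})"
    if doc["type"] == "electronics":
        return f"{doc['name']} ({attrs['cpu']}, {attrs['ram']} RAM)"
    return doc["name"]

summaries = [describe(d) for d in docs]
```

Because every variant lives in one collection, a single query can fetch a mixed result set and the application branches on type per document.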
```javascript
// Product Collection: Polymorphic Structure
[
  {
    "type": "clothing",
    "name": "T-Shirt",
    "attributes": { "size": "L", "color": "Red" }
  },
  {
    "type": "electronics",
    "name": "Laptop",
    "attributes": { "cpu": "i7", "ram": "16GB" }
  }
]
```

🏷️ 2. The Attribute Pattern
The Attribute Pattern is ideal for situations where you have many similar fields and want to index them efficiently without knowing all possible keys beforehand.
Use Case
Searchable product specifications. Instead of having fields like cpu, ram, and gpu, you store them as key-value pairs in an array.
```javascript
// Without Attribute Pattern (Poor Indexing)
{
  "specs": {
    "cpu": "i7",
    "ram": "16GB",
    "gpu": "RTX 3080"
  }
}

// With Attribute Pattern (Easy Indexing)
{
  "specs": [
    { "k": "cpu", "v": "i7" },
    { "k": "ram", "v": "16GB" },
    { "k": "gpu", "v": "RTX 3080" }
  ]
}
```

Why?
You can create a single compound multikey index on specs.k and specs.v to support arbitrary searches across all specifications, without adding a new index for every field.
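A sketch of what this looks like in practice. The index creation and $elemMatch query are shown as comments (they need a live server and an assumed products collection); the runnable part is a local stand-in for the $elemMatch filter:

```python
# With a live connection you would create one compound multikey index and
# then query any spec through $elemMatch (collection name is illustrative):
#   products.create_index([("specs.k", 1), ("specs.v", 1)])
#   products.find({"specs": {"$elemMatch": {"k": "cpu", "v": "i7"}}})

product = {
    "name": "Gaming PC",
    "specs": [
        {"k": "cpu", "v": "i7"},
        {"k": "ram", "v": "16GB"},
        {"k": "gpu", "v": "RTX 3080"},
    ],
}

def elem_match(doc, key, value):
    """Local equivalent of the $elemMatch filter: does any array entry match?"""
    return any(s["k"] == key and s["v"] == value for s in doc["specs"])
```

The same two-field index serves queries on cpu, ram, gpu, and any spec key added in the future.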
🪣 3. The Bucket Pattern
The Bucket Pattern is standard for Time-Series data or high-frequency event logging. It reduces the number of documents and the index size by “bucketing” related data into a single document.
Use Case: IoT Sensor Data
Instead of creating a separate document for every one-second sensor reading, you create one document per hour containing an array of up to 3,600 readings.
```javascript
// Sensor Collection: Hourly Bucket
{
  "sensor_id": 101,
  "day": "2023-10-01",
  "hour": 14,
  "readings": [
    { "time": "14:00:01", "temp": 22.5 },
    { "time": "14:00:02", "temp": 22.6 }
    // ... up to 3,600 readings
  ],
  "avg_temp": 22.8  // Pre-aggregated data
}
```

Advantages
- Reduced Index Size: 3,600 documents (and their index entries) become one.
- Improved Performance: Reading an entire hour of data requires only one index seek.
- Easy Pre-aggregation: You can update avg_temp or max_temp during the insert for fast reporting.
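The pre-aggregation bookkeeping can be sketched in plain Python. This is an in-memory stand-in (the `add_reading` helper is hypothetical); in MongoDB the same steps happen atomically in a single $push/$inc upsert, as the PyMongo section below shows:

```python
# One hourly bucket document, mirroring the shape above.
bucket = {"sensor_id": 101, "hour": 14, "readings": [], "count": 0, "sum": 0.0}

def add_reading(doc, time_str, temp):
    """Hypothetical helper: append a reading and keep running stats."""
    doc["readings"].append({"time": time_str, "temp": temp})
    doc["count"] += 1
    doc["sum"] += temp

for t, v in [("14:00:01", 22.5), ("14:00:02", 22.6), ("14:00:03", 23.0)]:
    add_reading(bucket, t, v)

# The average never requires re-scanning the readings array.
avg_temp = round(bucket["sum"] / bucket["count"], 2)
```

Keeping count and sum alongside the array is what makes the average a constant-time computation at report time.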
🧪 4. Implementing the Bucket Pattern in PyMongo

```python
from pymongo import MongoClient
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017")
db = client.iot_db
collection = db.sensor_buckets

# Upsert: add a reading to an existing bucket or create a new one
reading = {"t": datetime.now(timezone.utc), "v": 24.1}
result = collection.update_one(
    {
        "sensor_id": 101,
        "bucket_start": "2023-10-01T14:00",  # Unique key for the bucket
    },
    {
        "$push": {"readings": reading},
        "$inc": {"count": 1, "sum": 24.1},  # Keep running stats for a cheap average
        "$setOnInsert": {"sensor_id": 101, "bucket_start": "2023-10-01T14:00"},
    },
    upsert=True,
)
```

Summary
Advanced patterns like Polymorphic, Attribute, and Bucket allow you to handle diversity, searchability, and high-frequency data at scale. Choose your pattern based on the specific bottleneck you are trying to solve.