Advanced Schema Design Patterns
Beyond the basics of embedding and referencing, several industry-standard patterns help solve complex modeling challenges in MongoDB.
🎨 1. The Polymorphic Pattern
The Polymorphic Pattern is used when documents in a collection have similar but slightly different structures. This is common in Content Management Systems (CMS) or Asset Management.
Use Case
A “Product” catalog where different categories have different attributes (e.g., a “Shirt” has a size, while a “Laptop” has a CPU model).
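In application code, the shared type field lets one collection serve every variant. A minimal, server-free sketch (the document shapes match the catalog sample below; the `describe` helper and its output format are illustrative, not part of any library):

```python
# Sample documents shaped like the polymorphic catalog below; in a real app
# they would come back from a PyMongo cursor such as products.find({}).
docs = [
    {"type": "clothing", "name": "T-Shirt",
     "attributes": {"size": "L", "color": "Red"}},
    {"type": "electronics", "name": "Laptop",
     "attributes": {"cpu": "i7", "ram": "16GB"}},
]

def describe(doc):
    """Dispatch on the shared 'type' discriminator field."""
    attrs = doc["attributes"]
    if doc["type"] == "clothing":
        return f"{doc['name']} (size {attrs['size']}, {attrs['color']})"
    if doc["type"] == "electronics":
        return f"{doc['name']} ({attrs['cpu']}, {attrs['ram']} RAM)"
    return doc["name"]

summaries = [describe(d) for d in docs]
```

Because every variant lives in one collection, a single query can fetch a mixed result set and the application branches on type per document.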
```javascript
// Product Collection: Polymorphic Structure
[
  {
    "type": "clothing",
    "name": "T-Shirt",
    "attributes": { "size": "L", "color": "Red" }
  },
  {
    "type": "electronics",
    "name": "Laptop",
    "attributes": { "cpu": "i7", "ram": "16GB" }
  }
]
```

🏷️ 2. The Attribute Pattern
The Attribute Pattern is ideal for situations where you have many similar fields and want to index them efficiently without knowing all possible keys beforehand.
Use Case
Searchable product specifications. Instead of having fields like cpu, ram, and gpu, you store them as key-value pairs in an array.
```javascript
// Without Attribute Pattern (Poor Indexing)
{
  "specs": {
    "cpu": "i7",
    "ram": "16GB",
    "gpu": "RTX 3080"
  }
}

// With Attribute Pattern (Easy Indexing)
{
  "specs": [
    { "k": "cpu", "v": "i7" },
    { "k": "ram", "v": "16GB" },
    { "k": "gpu", "v": "RTX 3080" }
  ]
}
```

Why?
You can create a single compound multikey index on specs.k and specs.v to support arbitrary searches across all specifications, without adding a new index for every field.
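A sketch of what this looks like in practice. The index creation and $elemMatch query are shown as comments (they need a live server and an assumed products collection); the runnable part is a local stand-in for the $elemMatch filter:

```python
# With a live connection you would create one compound multikey index and
# then query any spec through $elemMatch (collection name is illustrative):
#   products.create_index([("specs.k", 1), ("specs.v", 1)])
#   products.find({"specs": {"$elemMatch": {"k": "cpu", "v": "i7"}}})

product = {
    "name": "Gaming PC",
    "specs": [
        {"k": "cpu", "v": "i7"},
        {"k": "ram", "v": "16GB"},
        {"k": "gpu", "v": "RTX 3080"},
    ],
}

def elem_match(doc, key, value):
    """Local equivalent of the $elemMatch filter: does any array entry match?"""
    return any(s["k"] == key and s["v"] == value for s in doc["specs"])
```

The same two-field index serves queries on cpu, ram, gpu, and any spec key added in the future.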
🪣 3. The Bucket Pattern
The Bucket Pattern is standard for Time-Series data or high-frequency event logging. It reduces the number of documents and the index size by “bucketing” related data into a single document.
Use Case: IoT Sensor Data
Instead of creating a separate document for every one-second sensor reading, you create one document per hour containing an array of up to 3,600 readings.
```javascript
// Sensor Collection: Hourly Bucket
{
  "sensor_id": 101,
  "day": "2023-10-01",
  "hour": 14,
  "readings": [
    { "time": "14:00:01", "temp": 22.5 },
    { "time": "14:00:02", "temp": 22.6 }
    // ... up to 3,600 readings
  ],
  "avg_temp": 22.8  // Pre-aggregated data
}
```

Advantages
- Reduced Index Size: 3,600 documents (and their index entries) become one.
- Improved Performance: Reading an entire hour of data requires only one index seek.
- Easy Pre-aggregation: You can update avg_temp or max_temp during the insert for fast reporting.
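The pre-aggregation bookkeeping can be sketched in plain Python. This is an in-memory stand-in (the `add_reading` helper is hypothetical); in MongoDB the same steps happen atomically in a single $push/$inc upsert, as the PyMongo section below shows:

```python
# One hourly bucket document, mirroring the shape above.
bucket = {"sensor_id": 101, "hour": 14, "readings": [], "count": 0, "sum": 0.0}

def add_reading(doc, time_str, temp):
    """Hypothetical helper: append a reading and keep running stats."""
    doc["readings"].append({"time": time_str, "temp": temp})
    doc["count"] += 1
    doc["sum"] += temp

for t, v in [("14:00:01", 22.5), ("14:00:02", 22.6), ("14:00:03", 23.0)]:
    add_reading(bucket, t, v)

# The average never requires re-scanning the readings array.
avg_temp = round(bucket["sum"] / bucket["count"], 2)
```

Keeping count and sum alongside the array is what makes the average a constant-time computation at report time.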
🧪 4. Implementing the Bucket Pattern in PyMongo

```python
from pymongo import MongoClient
from datetime import datetime, timezone

client = MongoClient("mongodb://localhost:27017")
db = client.iot_db
collection = db.sensor_buckets

# Upsert: add a reading to an existing bucket or create a new one
reading = {"t": datetime.now(timezone.utc), "v": 24.1}
result = collection.update_one(
    {
        "sensor_id": 101,
        "bucket_start": "2023-10-01T14:00",  # Unique key for the bucket
    },
    {
        "$push": {"readings": reading},
        "$inc": {"count": 1, "sum": 24.1},  # Keep running stats for a cheap average
        "$setOnInsert": {"sensor_id": 101, "bucket_start": "2023-10-01T14:00"},
    },
    upsert=True,
)
```

Summary
Advanced patterns like Polymorphic, Attribute, and Bucket allow you to handle diversity, searchability, and high-frequency data at scale. Choose your pattern based on the specific bottleneck you are trying to solve.