Feature Stores (Enterprise MLOps)
🏗️ Feature Stores: The Enterprise Backbone
In a large organization, data scientists often rewrite the same feature logic (e.g., “avg_spend_30d”). A Feature Store ensures everyone uses the same logic and that data is consistent between training and inference.
🟢 Level 1: Core Components
A Feature Store (like Feast or Hopsworks) has two sides:
1. The Offline Store
- Storage: Data lake or warehouse (Parquet files, BigQuery, Snowflake).
- Use Case: Training models on large historical datasets.
- Key Feature: Point-in-Time Correctness (prevents “data leakage” by joining each training row only to feature values that were available at or before that row's prediction time).
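The point-in-time rule can be illustrated without any feature store library. This is a minimal sketch, assuming a single user, integer timestamps, and a hypothetical `feature_history` list; a real offline store performs the same "latest value at or before" lookup as a join across many entities.

```python
from bisect import bisect_right

# Hypothetical feature history for one user: (event_time, avg_spend_30d),
# sorted by event_time. Times are plain integers (e.g. day numbers).
feature_history = [(1, 10.0), (5, 25.0), (9, 40.0)]


def point_in_time_lookup(history, prediction_time):
    """Return the latest feature value recorded at or before prediction_time."""
    times = [t for t, _ in history]
    idx = bisect_right(times, prediction_time)
    if idx == 0:
        return None  # no feature existed yet; never peek into the future
    return history[idx - 1][1]


# Training labels observed at different times for the same user.
label_times = [0, 5, 7, 12]
training_rows = [(t, point_in_time_lookup(feature_history, t)) for t in label_times]
```

The row at time 7 receives the value written at time 5, not the later value from time 9, which is the leakage the join is meant to prevent.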
2. The Online Store
- Storage: Low-latency DB (Redis, DynamoDB, Cassandra).
- Use Case: Real-time inference.
- Goal: Fetch a user’s latest features in < 10ms.
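To make the contrast with the offline store concrete, here is a toy sketch of the online side. It assumes an in-memory dictionary as a stand-in for Redis or DynamoDB, and the class name `InMemoryOnlineStore` is hypothetical. The point is the access pattern (one key lookup per entity, latest values only), not the latency of a real deployment.

```python
class InMemoryOnlineStore:
    """Toy stand-in for a low-latency DB: keeps only the latest row per entity."""

    def __init__(self):
        self._rows = {}

    def write(self, user_id, features):
        # Overwrite: the online store serves the newest values, not history.
        self._rows[user_id] = dict(features)

    def read(self, user_id, feature_names):
        # Missing entities or features come back as None rather than raising.
        row = self._rows.get(user_id, {})
        return {name: row.get(name) for name in feature_names}


store = InMemoryOnlineStore()
store.write(42, {"avg_spend": 18.5, "last_login": 1700000000})
store.write(42, {"avg_spend": 21.0, "last_login": 1700003600})

features = store.read(42, ["avg_spend", "last_login"])
```

After the second write, a read for user 42 returns only the newer values; the historical rows live in the offline store.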
🟡 Level 2: The Feature Registry
3. Feature Definitions (Infrastructure as Code)
Features are defined declaratively in version-controlled files, as Python objects (as in the Feast example here) or YAML, depending on the store.
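from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Define a User entity, joined on the user_id column
user = Entity(name="user", join_keys=["user_id"])

# Offline source (S3 or DB); the path and timestamp column are placeholders
source = FileSource(
    path="data/user_stats.parquet",
    timestamp_field="event_timestamp",
)

# Define a Feature View (schema plus the source it reads from)
user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    schema=[
        Field(name="avg_spend", dtype=Float32),  # averages are fractional
        Field(name="last_login", dtype=Int64),   # e.g. a Unix timestamp
    ],
    source=source,
)
🔴 Level 3: Advanced Governance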
4. Feature Lineage
Lineage tracks which models consume which features. If a source database column changes, the Feature Store can identify which features, and therefore which models, are affected.
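A lineage lookup is essentially a two-hop graph traversal: column to features, then features to models. The sketch below assumes hypothetical metadata dictionaries (`feature_sources`, `model_features`) with illustrative names; a real registry would populate these from feature definitions and model training records.

```python
# Hypothetical lineage metadata; all names are illustrative.
feature_sources = {
    "avg_spend": ["orders.amount"],
    "last_login": ["sessions.started_at"],
}
model_features = {
    "churn_model": ["avg_spend", "last_login"],
    "fraud_model": ["avg_spend"],
    "reengagement_model": ["last_login"],
}


def impacted_models(changed_column):
    """Return the models that depend, via any feature, on changed_column."""
    affected_features = {
        f for f, cols in feature_sources.items() if changed_column in cols
    }
    return sorted(
        m for m, feats in model_features.items()
        if affected_features.intersection(feats)
    )


at_risk = impacted_models("orders.amount")
```

Here a change to `orders.amount` flags the two models that read `avg_spend` and leaves the third untouched.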
5. Cost Management
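Computing features for 1 billion users every hour is expensive. Feature stores help control that cost with:
- Materialization: Precomputing feature values on a schedule and loading them into the store, so they are computed once rather than on every request.
- TTL (Time to Live): Expiring stale feature values from the online store so that it does not grow without bound or serve outdated data.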
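Both ideas can be sketched together in a few lines. This is an illustrative toy, assuming a hypothetical `TTLOnlineStore` class and explicit integer timestamps passed in by the caller; production stores typically rely on the database's own expiry mechanism instead of checking on read.

```python
class TTLOnlineStore:
    """Toy online store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._rows = {}  # user_id -> (written_at, features)

    def materialize(self, precomputed, now):
        # Load values computed once by a batch job instead of per request.
        for user_id, features in precomputed.items():
            self._rows[user_id] = (now, features)

    def read(self, user_id, now):
        entry = self._rows.get(user_id)
        if entry is None:
            return None
        written_at, features = entry
        if now - written_at > self.ttl_seconds:
            del self._rows[user_id]  # stale: evict and report a miss
            return None
        return features


store = TTLOnlineStore(ttl_seconds=3600)
store.materialize({1: {"avg_spend": 12.0}}, now=0)
fresh = store.read(1, now=1800)   # within the TTL window
stale = store.read(1, now=7200)   # past the TTL window
```

The first read is served from the materialized value; the second falls outside the one-hour window, so the entry is evicted and the caller sees a miss.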