Feature Stores (Enterprise MLOps)

🏗️ Feature Stores: The Enterprise Backbone

In a large organization, data scientists often rewrite the same feature logic (e.g., “avg_spend_30d”) in several places. A Feature Store centralizes that logic so everyone reuses one definition, and it serves the same feature values to training and inference, which avoids training-serving skew.


🟢 Level 1: Core Components

A Feature Store (like Feast or Hopsworks) has two sides:

1. The Offline Store

  • Storage: Data lake or warehouse (Parquet files, BigQuery, Snowflake).
  • Use Case: Training models on large historical datasets.
  • Key Feature: Point-in-Time Correctness (prevents “data leakage” by joining each training row only with feature values that were available at or before that row’s prediction time).
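
The point-in-time join described above can be sketched with pandas. This is a minimal illustration, not how any particular feature store implements it; the table contents, the column names, and the helper point_in_time_join are all made up for the example.

```python
import pandas as pd

# Hypothetical feature values, each stamped with the time it became available.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
    "avg_spend_30d": [20.0, 35.0, 50.0],
})

# Hypothetical training labels, each stamped with its prediction time.
labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "prediction_time": pd.to_datetime(["2024-01-05", "2024-01-15", "2024-01-03"]),
    "label": [0, 1, 0],
})


def point_in_time_join(labels: pd.DataFrame, features: pd.DataFrame) -> pd.DataFrame:
    """Attach to each label the latest feature row at or before its prediction time."""
    # merge_asof requires both frames to be sorted by their time keys.
    labels_sorted = labels.sort_values("prediction_time")
    features_sorted = features.sort_values("event_timestamp")
    return pd.merge_asof(
        labels_sorted,
        features_sorted,
        left_on="prediction_time",
        right_on="event_timestamp",
        by="user_id",            # match rows per user
        direction="backward",    # never look into the future
    )


training_df = point_in_time_join(labels, features)
print(training_df[["user_id", "prediction_time", "avg_spend_30d", "label"]])
```

User 2's only feature value is dated after the prediction time, so that training row gets a missing value instead of leaking future information.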

2. The Online Store

  • Storage: Low-latency DB (Redis, DynamoDB, Cassandra).
  • Use Case: Real-time inference.
  • Goal: Fetch a user’s latest features in < 10ms.
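
As a rough mental model, the online store behaves like a key-value map that keeps only the latest feature values per entity. The sketch below uses a plain dictionary in place of Redis or DynamoDB; the class InMemoryOnlineStore and its methods are hypothetical names for illustration only.

```python
from typing import Dict, List, Optional


class InMemoryOnlineStore:
    """Toy stand-in for a low-latency DB: one row of latest values per user."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, float]] = {}

    def write(self, user_id: str, features: Dict[str, float]) -> None:
        # Newer values overwrite older ones; no history is kept online.
        self._rows.setdefault(user_id, {}).update(features)

    def read(self, user_id: str, feature_names: List[str]) -> List[Optional[float]]:
        # A single key lookup; unknown users or features come back as None.
        row = self._rows.get(user_id, {})
        return [row.get(name) for name in feature_names]


store = InMemoryOnlineStore()
store.write("u1", {"avg_spend_30d": 20.0})
store.write("u1", {"avg_spend_30d": 35.0})  # overwrites the earlier value
feature_vector = store.read("u1", ["avg_spend_30d", "last_login"])
print(feature_vector)
```

Because every read is a single lookup by entity key, latency stays low regardless of how much history exists in the offline store.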

🟡 Level 2: The Feature Registry

3. Feature Definitions (Infrastructure as Code)

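Features are defined declaratively and kept in version control. In Feast, definitions are Python objects (project configuration lives in a YAML file); some other tools use YAML for the definitions themselves.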

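from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Batch source for the feature data.
# The path and timestamp column are placeholders; point them at your own S3 or DB source.
source = FileSource(
    path="data/user_stats.parquet",
    timestamp_field="event_timestamp",
)

# Define a User Entity (the key used to join and look up features)
user = Entity(name="user", join_keys=["user_id"])

# Define a Feature View (Logic)
user_stats = FeatureView(
    name="user_stats",
    entities=[user],
    schema=[
        Field(name="avg_spend", dtype=Float32),  # an average is fractional, so use a float type
        Field(name="last_login", dtype=Int64),   # stored as a Unix timestamp
    ],
    source=source,
)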

🔴 Level 3: Advanced Governance

4. Feature Lineage

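Lineage tracks which models use which features, and which source columns each feature is derived from. If a source database column changes, the Feature Store can identify which features, and therefore which models, are affected.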
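
A lineage lookup can be sketched as two mappings and a small query over them. The registry contents and the function impacted_models below are hypothetical; real feature stores keep this metadata in their registry rather than in hard-coded dictionaries.

```python
from typing import Dict, List

# Hypothetical registry metadata: which source columns feed each feature.
FEATURE_SOURCES: Dict[str, List[str]] = {
    "avg_spend_30d": ["orders.amount", "orders.created_at"],
    "last_login": ["sessions.started_at"],
}

# Hypothetical registry metadata: which features each model consumes.
MODEL_FEATURES: Dict[str, List[str]] = {
    "churn_model": ["avg_spend_30d", "last_login"],
    "fraud_model": ["avg_spend_30d"],
}


def impacted_models(changed_column: str) -> List[str]:
    """Return the models that depend, through any feature, on the changed column."""
    affected_features = {
        feature
        for feature, columns in FEATURE_SOURCES.items()
        if changed_column in columns
    }
    return sorted(
        model
        for model, features in MODEL_FEATURES.items()
        if affected_features & set(features)
    )


print(impacted_models("sessions.started_at"))
print(impacted_models("orders.amount"))
```

Running this kind of query before a schema change gives the owning teams a list of models to retest.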

5. Cost Management

Computing features for 1 billion users every hour is expensive. Feature stores allow for:

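  • Materialization: Precomputing feature values on a schedule and loading them into the online store, so they are not recomputed on every request.
  • TTL (Time to Live): Expiring stale feature values so the online store does not serve or pay to keep outdated data.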
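
The two mechanisms above can be sketched together. This is a simplified illustration with a dictionary as the online store; the functions materialize and read_fresh and the sample rows are assumptions for the example, and real systems differ in whether TTL is enforced at write time, at read time, or by the database itself.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Hypothetical precomputed rows from the offline store.
offline_rows: List[dict] = [
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 1), "avg_spend_30d": 20.0},
    {"user_id": 1, "event_timestamp": datetime(2024, 1, 10), "avg_spend_30d": 35.0},
    {"user_id": 2, "event_timestamp": datetime(2024, 1, 2), "avg_spend_30d": 50.0},
]


def materialize(rows: List[dict], online_store: Dict[int, dict]) -> None:
    """Copy the newest row per user into the online store."""
    for row in rows:
        current = online_store.get(row["user_id"])
        if current is None or row["event_timestamp"] > current["event_timestamp"]:
            online_store[row["user_id"]] = row


def read_fresh(
    online_store: Dict[int, dict], user_id: int, now: datetime, ttl: timedelta
) -> Optional[float]:
    """Return the feature value, or None if it is missing or older than the TTL."""
    row = online_store.get(user_id)
    if row is None or now - row["event_timestamp"] > ttl:
        return None
    return row["avg_spend_30d"]


online_store: Dict[int, dict] = {}
materialize(offline_rows, online_store)

now = datetime(2024, 1, 11)
ttl = timedelta(days=2)
print(read_fresh(online_store, 1, now, ttl))  # fresh value for user 1
print(read_fresh(online_store, 2, now, ttl))  # user 2's value is past the TTL
```

Materializing once per schedule moves the compute cost out of the request path, and the TTL check keeps stale values from being served silently.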