MLOps and Production ML

Take machine learning from a notebook to a reliable, monitored, automated production system. Duration: twelve weeks. Target outcome: train, version, deploy, serve, monitor, and retrain models with reproducible pipelines.

Overview

MLOps applies software engineering and DevOps discipline to machine learning. This track assumes you can train a basic model and use Python and git. You learn to make experiments reproducible, package and serve models, automate the path from data to deployment, and keep models healthy in production. Build a running pipeline at each stage.

Month 1: Reproducibility and pipelines

Week 1: ML engineering foundations

Tools and libraries
  • uv or conda: environments
  • ruff and mypy: quality
  • pydantic: config validation
Build
  • Convert a notebook model into a clean, configurable, tested Python project
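
As an illustration of what "configurable" can mean here, the sketch below validates training settings at load time so a bad value fails before any compute is spent. It uses only the standard library; pydantic, listed above, covers the same ground declaratively. The class name `TrainConfig`, its fields, and the bounds are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical training settings; field names and bounds are illustrative."""

    learning_rate: float = 0.01
    epochs: int = 10
    test_fraction: float = 0.2

    def __post_init__(self) -> None:
        # Fail fast on values that would otherwise silently produce a bad run.
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError(f"learning_rate must be in (0, 1), got {self.learning_rate}")
        if self.epochs < 1:
            raise ValueError(f"epochs must be >= 1, got {self.epochs}")
        if not 0.0 < self.test_fraction < 1.0:
            raise ValueError(f"test_fraction must be in (0, 1), got {self.test_fraction}")


def load_config(raw: dict) -> TrainConfig:
    """Build a validated config from a plain dict, e.g. parsed YAML or CLI arguments."""
    # Unknown keys raise TypeError, so typos in config files surface immediately.
    return TrainConfig(**raw)
```

Calling `load_config({"epochs": 3})` returns a frozen config with the defaults filled in, while `load_config({"epochs": 0})` raises `ValueError`.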

Week 2: Data and model versioning

Tools and libraries
  • DVC: data and pipeline versioning
  • Git LFS for large files when needed
Build
  • Version a dataset and a trained model so any run is reproducible from a commit
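
The idea behind reproducing a run from a commit can be sketched with content hashing: the large file lives outside git, and a small pointer file recording its hash is committed instead. This is a minimal standard-library sketch of that idea, not DVC's actual file format; the `.pointer.json` suffix and the function names are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_pointer(data_path: Path) -> Path:
    """Write a small JSON pointer next to the data file; commit the pointer, not the data."""
    pointer = data_path.with_name(data_path.name + ".pointer.json")
    record = {"path": data_path.name, "sha256": file_digest(data_path)}
    pointer.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return pointer
```

Because the pointer changes whenever the data changes, checking out an old commit tells you exactly which dataset bytes that run used.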

Week 3: Experiment tracking

Tools and libraries
  • MLflow: tracking and registry
  • Weights and Biases: tracking and dashboards
Build
  • Track every training run, log metrics and artifacts, and register the best model
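
To make the tracking step concrete, here is a minimal sketch of what a tracker records: one append-only log line per run, holding parameters and metrics, plus a query for the best run. It is a standard-library stand-in, not the MLflow or Weights and Biases API; `log_run`, `best_run`, and the JSON Lines layout are hypothetical.

```python
import json
import time
from pathlib import Path


def log_run(log_path: Path, params: dict, metrics: dict) -> None:
    """Append one training run to a JSON Lines file."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def best_run(log_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value for the given metric."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    runs = [json.loads(line) for line in lines if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

A dedicated tracker adds artifact storage, a UI, and a registry on top of this, but the core contract is the same: every run is recorded, and the best one can be found by query rather than by memory.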

Week 4: Pipelines and orchestration

Tools and libraries
  • Apache Airflow or Prefect: orchestration
  • Great Expectations or pandera: data validation
Build
  • An orchestrated pipeline that ingests data, validates it, trains, evaluates, and registers a model
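
The validation step in such a pipeline acts as a gate: training only runs if the batch passes. The sketch below shows that control flow with a single null-fraction check in plain Python. Great Expectations and pandera offer far richer checks; the function names, the 5% default, and the `train` callable are assumptions for illustration.

```python
from typing import Callable


def validate(rows: list, required: tuple, max_null_fraction: float = 0.05) -> list:
    """Return a list of human-readable problems; an empty list means the batch passes."""
    if not rows:
        return ["batch is empty"]
    problems = []
    for column in required:
        nulls = sum(1 for row in rows if row.get(column) is None)
        fraction = nulls / len(rows)
        if fraction > max_null_fraction:
            problems.append(f"{column}: {fraction:.0%} null exceeds {max_null_fraction:.0%}")
    return problems


def run_pipeline(rows: list, required: tuple, train: Callable):
    """Validate first, and only call the (hypothetical) train step on a clean batch."""
    problems = validate(rows, required)
    if problems:
        raise ValueError("validation failed: " + "; ".join(problems))
    return train(rows)
```

In an orchestrator, `validate` and `train` would typically be separate tasks, with the raised error stopping downstream tasks from running.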

Month 2: Serving and automation

Week 5: Model packaging and serving

Tools and libraries
  • FastAPI: serving API
  • BentoML or a custom container: packaging
  • Docker: containerize
Build
  • Serve your registered model behind a FastAPI endpoint in a container

Week 6: CI and CD for ML

Tools and libraries
  • GitHub Actions: pipelines
  • The model registry: promotion gate
Build
  • A pipeline that tests, trains, evaluates against a threshold, and only then promotes and deploys the model
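
The "evaluates against a threshold, and only then promotes" step reduces to a small decision function that the CI job calls before touching the registry. A minimal sketch follows, assuming a single higher-is-better metric; the name `should_promote`, the 0.80 floor, and the `min_gain` parameter are hypothetical.

```python
from typing import Optional


def should_promote(
    candidate_score: float,
    production_score: Optional[float],
    floor: float = 0.80,
    min_gain: float = 0.0,
) -> bool:
    """Decide whether a candidate model may replace the current production model."""
    # Absolute quality bar: never ship a model below the floor.
    if candidate_score < floor:
        return False
    # First deployment: there is nothing to compare against.
    if production_score is None:
        return True
    # Relative bar: the candidate must match or beat production by min_gain.
    return candidate_score >= production_score + min_gain
```

The CI job can exit non-zero when this returns `False`, which keeps the deploy step from running.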

Week 7: Scalable serving and inference

Tools and libraries
  • KServe or Seldon: model serving on Kubernetes
  • vLLM or Triton: high-throughput inference for large models
Build
  • Deploy the model service with autoscaling and a canary rollout
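
A canary rollout sends a small, stable share of traffic to the new model version while the rest stays on the current one. Serving platforms usually handle this at the routing layer; the sketch below only illustrates the splitting logic, using a hash so the same request ID always lands on the same version. The function name and the `"canary"`/`"stable"` labels are assumptions.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Deterministically assign a request to the canary or the stable model version."""
    # Hash the ID into one of 100 buckets; buckets below the cutoff go to the canary.
    bucket = int(hashlib.sha256(request_id.encode("utf-8")).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step (for example 5, 25, 50, 100) while watching error rates and latency gives a gradual rollout, and setting it to 0 is an immediate rollback.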

Week 8: Feature stores and data freshness

Tools and libraries
  • Feast: open-source feature store
Build
  • Move features into a feature store and serve them consistently online and offline

Month 3: Monitoring and LLMOps

Week 9: Monitoring and drift

Tools and libraries
  • Evidently: drift and quality monitoring
  • Prometheus and Grafana: service metrics
Build
  • Add drift detection and service monitoring, and trigger a retrain when drift crosses a threshold
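
One common way to turn "drift crosses a threshold" into a number is the Population Stability Index (PSI), which compares the binned distribution of a feature in production against a reference sample. The sketch below is a plain-Python illustration, not Evidently's implementation; the 10 equal-width bins, the `eps` floor for empty bins, and the 0.2 retrain threshold (a widely used rule of thumb) are assumptions to tune for your data.

```python
import math


def psi(expected: list, actual: list, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a current sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all reference values are equal

    def fractions(values: list) -> list:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp values outside the reference range into the first or last bin.
            counts[min(max(index, 0), bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(count / len(values), eps) for count in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def needs_retrain(reference: list, current: list, threshold: float = 0.2) -> bool:
    """Signal a retrain when drift on this feature exceeds the threshold."""
    return psi(reference, current) > threshold
```

In a scheduled monitoring job, `needs_retrain` would be evaluated per feature on a recent window of production inputs, and a `True` result would trigger the training pipeline.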

Week 10: LLMOps

Tools and libraries
  • Langfuse: LLM observability
  • promptfoo or deepeval: LLM evaluation
Build
  • Add tracing, cost tracking, and an evaluation harness to an LLM feature
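
Cost tracking for an LLM feature usually comes down to multiplying token counts by per-token prices and accumulating the result per call. A minimal sketch follows; the model name `example-model` and its prices are made up, and observability tools such as Langfuse record this alongside traces rather than in a standalone class like this one.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000,000 tokens; replace with your provider's actual rates.
PRICES = {"example-model": {"input": 1.00, "output": 3.00}}


@dataclass
class CostTracker:
    """Accumulates the estimated spend of LLM calls."""

    total_usd: float = 0.0
    calls: int = 0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call to the running total and return that call's cost in USD."""
        price = PRICES[model]
        cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
        self.total_usd += cost
        self.calls += 1
        return cost
```

Dividing `total_usd` by `calls` gives an average cost per request, which is a useful number to alert on when a prompt change quietly inflates token usage.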

Week 11: Governance and reliability

Tools and libraries
  • A model registry: lineage
  • Presidio: PII detection
Build
  • Add a model card, lineage, and a rollback procedure to your service

Week 12: Capstone

Build
  • An end-to-end MLOps system: versioned data and models, tracked experiments, an orchestrated training pipeline with validation gates, automated CI and CD, a scalable served model, a feature store, drift monitoring with automated retraining, and full documentation

Resource master reference

Books

Designing Machine Learning Systems by Chip Huyen

Machine Learning Engineering by Andriy Burkov

Reliable Machine Learning by Cathy Chen and others

Repositories

awesome-mlops curated list

Made With ML by Goku Mohandas

Tools master list

uv, DVC, MLflow, Weights and Biases, Airflow, Prefect, Great Expectations, FastAPI, BentoML, Docker, KServe, Seldon, vLLM, Triton, Feast, Evidently, Langfuse, Prometheus, Grafana

Interview focus

Walk through taking a model from notebook to production

How do you detect and respond to data drift?

Explain training-serving skew and how a feature store fixes it

Design a continuous training pipeline with safe promotion

How do you monitor and evaluate an LLM feature in production?

Trade-offs of batch scoring versus real-time serving