Data Science and Analytics

From data wrangling and statistics to machine learning, analytics, and clear communication of insight. Duration: sixteen weeks. Target outcome: take a raw dataset to a validated model or a decision-ready analysis with a dashboard.

Overview

This track builds the full data science and analytics skill set. It assumes basic Python. You move from data handling and statistics through visualization, classical machine learning, and analytics engineering, ending with storytelling and a capstone. Ship an analysis or model at every stage.

Month 1: Data and statistics

Week 1: Python for data

Tools and libraries
  • NumPy, pandas, polars, pyarrow
Build
  • Load a messy real dataset, clean it, and produce a tidy table

Week 2: Statistics foundations

Tools and libraries
  • scipy.stats, statsmodels
Build
  • A statistical summary report of a dataset with confidence intervals
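
As a starting point for the confidence-interval part of that report, here is a minimal sketch using only the Python standard library. It assumes a normal approximation for the mean, which is reasonable for larger samples; for small samples a t-based interval (for example via the t distribution in scipy.stats) is the more usual choice. The function name `mean_confidence_interval` is hypothetical, chosen for illustration.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) for the sample mean.

    Assumption: normal approximation, so this is only a sketch
    for reasonably large samples.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


# Made-up illustration data, not from any real dataset.
sample = [12.1, 11.8, 12.6, 12.0, 11.5, 12.9, 12.3, 11.9]
mean, lower, upper = mean_confidence_interval(sample)
print(f"mean={mean:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Reporting the interval next to each point estimate keeps the summary honest about how much the sample could be off.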

Week 3: Inferential statistics and experiments

Tools and libraries
  • statsmodels, scipy
Build
  • Design and analyze a simulated A/B test and report whether the change is statistically significant
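
One way to approach this build is a two-proportion z-test on simulated conversion data. The sketch below uses only the standard library; the conversion rates, sample size, and function names are assumptions made up for illustration. statsmodels offers `proportions_ztest` in `statsmodels.stats.proportion` for the same kind of test.

```python
import math
import random


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; no variation in outcomes")
    z = (p_b - p_a) / se
    # erfc(|z| / sqrt(2)) equals the two-sided normal tail probability.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def simulate_arm(n, rate, rng):
    """Count conversions among n simulated users with a given true rate."""
    return sum(rng.random() < rate for _ in range(n))


# Hypothetical experiment: control converts at 10%, variant at 12%.
rng = random.Random(42)
n = 5000
conv_a = simulate_arm(n, 0.10, rng)
conv_b = simulate_arm(n, 0.12, rng)

z, p = two_proportion_z_test(conv_a, n, conv_b, n)
print(f"A: {conv_a}/{n}, B: {conv_b}/{n}, z={z:.2f}, p={p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

Fixing the sample size and significance threshold before looking at the results is part of the design half of the exercise.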

Week 4: Exploratory data analysis and visualization

Tools and libraries
  • matplotlib, seaborn, plotly
  • ydata-profiling for quick EDA
Build
  • A full EDA notebook with clear visuals and written findings

Month 2: Machine learning

Week 5: Data preparation and feature engineering

Tools and libraries
  • scikit-learn pipelines, category_encoders
Build
  • A reusable preprocessing pipeline that prevents leakage
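
The core rule behind leakage prevention is to split first, then learn every preprocessing statistic from the training rows only. The sketch below shows that rule with a hand-rolled scaler in plain Python; `TrainOnlyScaler` and `split_rows` are hypothetical names for illustration. In scikit-learn, wrapping the transformers and the estimator in a `Pipeline` and passing that to cross-validation applies the same rule within each fold.

```python
import random
import statistics


class TrainOnlyScaler:
    """Hypothetical minimal standardizer for one numeric column."""

    def fit(self, values):
        # Statistics are learned here and nowhere else.
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values) or 1.0  # avoid divide by zero
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Made-up illustration data.
values = [float(v) for v in range(100)]

# 1. Split BEFORE computing any statistic.
train, test = split_rows(values)

# 2. Fit on the training rows only.
scaler = TrainOnlyScaler().fit(train)

# 3. Apply the already-fitted transform to both parts.
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)
```

Fitting the scaler on all rows before splitting would let test-set information shape the features, which tends to make evaluation scores look better than they are.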

Week 6: Supervised learning

Tools and libraries
  • scikit-learn, xgboost, lightgbm
Build
  • A tabular prediction model with a baseline and a boosted model compared fairly

Week 7: Model evaluation and tuning

Tools and libraries
  • scikit-learn, optuna, shap
Build
  • Tune your model, evaluate it properly, and produce an explainability report

Week 8: Unsupervised learning

Tools and libraries
  • scikit-learn, umap
Build
  • Segment a dataset with clustering and describe each segment

Month 3: Analytics engineering

Week 9: SQL for analytics

Tools and libraries
  • PostgreSQL or DuckDB
Build
  • Answer five business questions on a real schema using only SQL

Week 10: Analytics engineering and modeling

Tools and libraries
  • dbt, DuckDB or a warehouse
Build
  • Model raw data into clean marts with dbt, with tests

Week 11: Dashboards and BI

Tools and libraries
  • Metabase or Apache Superset, both free
Build
  • A dashboard answering the key questions for a chosen domain

Week 12: Communication and capstone prep

Build
  • A written analysis with a recommendation and supporting visuals

Month 4: Capstone and specialization

Weeks 13 to 16: Capstone

Tools and libraries
  • prophet or statsmodels for time series
  • surprise or implicit for recommendation
Build
  • A portfolio-grade project: a clear question, a validated model or analysis, a dashboard, and a written report with a recommendation

Resource master reference

Books

Practical Statistics for Data Scientists by Bruce and Bruce

Hands-On Machine Learning by Aurélien Géron

Storytelling with Data by Cole Nussbaumer Knaflic

Repositories

awesome data science curated list

the dbt and Metabase docs

Tools master list

NumPy, pandas, polars, pyarrow, scipy, statsmodels, matplotlib, seaborn, plotly, ydata-profiling, scikit-learn, category_encoders, xgboost, lightgbm, optuna, shap, umap, PostgreSQL, DuckDB, dbt, Metabase, Superset, prophet, surprise, implicit

Interview focus

Explain bias and variance and how you diagnose each

How do you design and analyze an A/B test?

When do you prioritize precision versus recall?

Write a SQL query with a window function for a running metric

How do you prevent data leakage?

Walk through a project from question to recommendation
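
For the window-function question above, a running total is one common form of running metric. The sketch below runs the query against an in-memory SQLite database through Python's built-in `sqlite3` module so it is self-contained; the `daily_sales` table and its rows are made up for illustration. The `SUM(...) OVER (ORDER BY ...)` syntax is standard SQL, so the same query should carry over to PostgreSQL or DuckDB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with made-up rows, for illustration only.
conn.execute("CREATE TABLE daily_sales (day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [
        ("2024-01-01", 100.0),
        ("2024-01-02", 150.0),
        ("2024-01-03", 50.0),
    ],
)

# Running total: sum every row from the first day up to the current one.
query = """
SELECT
    day,
    revenue,
    SUM(revenue) OVER (
        ORDER BY day
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_revenue
FROM daily_sales
ORDER BY day
"""

rows = conn.execute(query).fetchall()
for day, revenue, running_revenue in rows:
    print(day, revenue, running_revenue)

conn.close()
```

Spelling out the frame with `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` makes the intent explicit and avoids surprises when the ordering column has ties.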