Data Science and Analytics
From data wrangling and statistics to machine learning, analytics, and clear communication of insight. Duration: sixteen weeks. Target outcome: take a raw dataset to a validated model or a decision-ready analysis with a dashboard.
Overview
This track builds the full data science and analytics skill set. It assumes basic Python. You move from data handling and statistics through visualization, classical machine learning, and analytics engineering, ending with storytelling and a capstone. Ship an analysis or model at every stage.
Month 1: Data and statistics
Week 1: Python for data
- NumPy, pandas, polars, pyarrow
- Load a messy real dataset, clean it, and produce a tidy table
Week 2: Statistics foundations
- scipy.stats, statsmodels
- A statistical summary report of a dataset with confidence intervals
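As a rough illustration of the confidence-interval part of that report, the sketch below computes a mean and a two-sided interval using only the Python standard library. It assumes a normal approximation, which is a simplification. For small samples a t-based interval (for example via scipy.stats) is the more usual choice. The function name and the sample values are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical sample, e.g. daily order values.
sample = [12.0, 15.5, 11.2, 14.8, 13.1, 16.4, 12.9, 15.0]
mean, low, high = mean_confidence_interval(sample)
print(f"mean={mean:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Raising the confidence level widens the interval, which is worth stating explicitly in the written report.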
Week 3: Inferential statistics and experiments
- statsmodels, scipy
- Design and analyze a simulated A/B test and report whether the change is significant
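One possible way to analyze such a test is a two-sided two-proportion z-test, sketched below with the standard library only. The conversion counts are made up for illustration and the function name is hypothetical. statsmodels and scipy, listed above, provide equivalent tests.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: conversions and visitors per variant.
z, p_value = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
alpha = 0.05  # significance level, fixed before looking at the data
print(f"z={z:.2f}, p={p_value:.4f}, significant={p_value < alpha}")
```

Fixing the sample size and significance level before the experiment starts is part of the design step, not the analysis step.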
Week 4: Exploratory data analysis and visualization
- matplotlib, seaborn, plotly
- ydata-profiling for quick EDA
- A full EDA notebook with clear visuals and written findings
Month 2: Machine learning
Week 5: Data preparation and feature engineering
- scikit-learn pipelines, category encoders
- A reusable preprocessing pipeline that prevents leakage
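The core idea behind a leakage-free pipeline is that every preprocessing statistic is learned from the training split only and then reused unchanged on held-out data. The sketch below shows that idea with a tiny hypothetical scaler written in plain Python. It is not the scikit-learn API, although a scikit-learn Pipeline fitted on the training data follows the same fit-then-transform pattern.

```python
import statistics


class TrainOnlyScaler:
    """Standardize values with statistics learned from training data only."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 for a constant feature to avoid division by zero.
        self.std_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


# Hypothetical feature values, split BEFORE any fitting happens.
train = [10.0, 12.0, 14.0, 16.0]
test = [18.0, 20.0]

scaler = TrainOnlyScaler().fit(train)  # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)   # test reuses them and never refits
print(train_scaled, test_scaled)
```

Fitting the scaler on train and test together would let information from the test set shape the features, which inflates evaluation scores.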
Week 6: Supervised learning
- scikit-learn, xgboost, lightgbm
- A tabular prediction model with a baseline and a boosted model compared fairly
Week 7: Model evaluation and tuning
- scikit-learn, optuna, shap
- Tune your model, evaluate it properly, and produce an explainability report
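Part of evaluating a classifier properly is looking beyond accuracy. As a minimal sketch, the function below computes precision and recall for binary labels from first principles. The labels are invented for illustration. In practice scikit-learn's metrics module would normally be used instead.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical ground truth and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision matters most when false positives are costly, and recall matters most when missed positives are costly.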
Week 8: Unsupervised learning
- scikit-learn, umap
- Segment a dataset with clustering and describe each segment
Month 3: Analytics engineering
Week 9: SQL for analytics
- PostgreSQL or DuckDB
- Answer five business questions on a real schema using only SQL
Week 10: Analytics engineering and modeling
- dbt, DuckDB or a warehouse
- Model raw data into clean marts with dbt, with tests
Week 11: Dashboards and BI
- Metabase or Apache Superset, both free
- A dashboard answering the key questions for a chosen domain
Week 12: Communication and capstone prep
- A written analysis with a recommendation and supporting visuals
Month 4: Capstone and specialization
Weeks 13 to 16: Capstone
- prophet or statsmodels for time series
- surprise or implicit for recommendation
- A portfolio-grade project: a clear question, a validated model or analysis, a dashboard, and a written report with a recommendation
Resource master reference
Books
Practical Statistics for Data Scientists by Bruce and Bruce
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron
Storytelling with Data by Cole Nussbaumer Knaflic
Repositories
awesome data science curated list
the dbt and Metabase docs
Tools master list
NumPy, pandas, polars, scipy, statsmodels, matplotlib, seaborn, plotly, scikit-learn, xgboost, lightgbm, optuna, shap, umap, PostgreSQL, DuckDB, dbt, Metabase, Superset, prophet
Interview focus
Explain bias and variance and how you diagnose each
How do you design and analyze an A/B test?
When do you use precision versus recall?
Write a SQL query with a window function for a running metric
How do you prevent data leakage?
Walk through a project from question to recommendation
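For the window-function question above, one possible answer is a running total computed with SUM(...) OVER. The sketch below runs it against an in-memory SQLite database from Python so it needs no setup. SQLite is a stand-in here, since the track itself uses PostgreSQL or DuckDB, and it assumes a SQLite build of version 3.25 or later. The daily_sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (day TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-02", 150.0), ("2024-01-03", 50.0)],
)

# Running revenue: sum every row from the first day up to the current one.
query = """
SELECT day,
       revenue,
       SUM(revenue) OVER (
           ORDER BY day
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_revenue
FROM daily_sales
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Adding PARTITION BY inside the OVER clause restarts the running total per group, for example per customer or per region.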