Coders {cookies} · Open-source skills, free for everyone

Overview

This track builds the full data science and analytics skill set. It assumes basic Python. You move from data handling and statistics through visualization, classical machine learning, and analytics engineering, ending with storytelling and a capstone. Ship an analysis or model at every stage.

Month 1: Data and statistics

Week 1: Python for data

NumPy arrays and vectorized operations
pandas: load, filter, group, merge, pivot, reshape
polars for fast and lazy data frames
Reading CSV, Excel, JSON, and Parquet

Tools and libraries

NumPy, pandas, polars, pyarrow

Build

Load a messy real dataset, clean it, and produce a tidy table
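
A minimal sketch of the clean-up step in pandas. The inline CSV and its column names are hypothetical stand-ins for a real messy file, and dropping rows with a missing amount is one possible policy, not the only one.

```python
import io

import pandas as pd

# Hypothetical messy export: padded headers, inconsistent casing,
# a duplicated row, and a non-numeric placeholder in a numeric column.
raw_csv = io.StringIO(
    "Order ID, Region ,Amount\n"
    "1, north ,10.5\n"
    "2,South,unknown\n"
    "2,South,unknown\n"
    "3,NORTH,7\n"
)

df = pd.read_csv(raw_csv)

# Normalize column names to snake_case.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Trim whitespace and standardize casing in the text column.
df["region"] = df["region"].str.strip().str.title()

# Convert amounts to numbers; values that cannot be parsed become NaN.
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Drop exact duplicate rows, then rows with no usable amount.
tidy = df.drop_duplicates().dropna(subset=["amount"]).reset_index(drop=True)

print(tidy)
```

On a real dataset, decide per column whether to drop, impute, or flag missing values, and record that choice next to the code.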

Week 2: Statistics foundations

Descriptive statistics: mean, median, variance, percentiles
Distributions: normal, binomial, Poisson
Probability, conditional probability, Bayes' theorem
Sampling, central limit theorem, confidence intervals
Correlation versus causation

Tools and libraries

scipy.stats, statsmodels

Build

A statistical summary report of a dataset with confidence intervals
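
One way to sketch the core of such a report: descriptive statistics plus a 95% confidence interval for the mean, based on the t distribution. The data here is simulated, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Simulated sample standing in for one numeric column of a real dataset.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=50.0, scale=10.0, size=200)

n = sample.size
mean = sample.mean()
median = np.median(sample)
variance = sample.var(ddof=1)  # sample variance (n - 1 in the denominator)
p25, p75 = np.percentile(sample, [25, 75])

# 95% confidence interval for the mean: mean +/- t_crit * standard error.
sem = sample.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci_low, ci_high = mean - t_crit * sem, mean + t_crit * sem

print(f"n={n} mean={mean:.2f} median={median:.2f} variance={variance:.2f}")
print(f"25th percentile={p25:.2f} 75th percentile={p75:.2f}")
print(f"95% CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```

The interval describes uncertainty about the mean, not the spread of individual values. The percentiles cover the spread.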

Week 3: Inferential statistics and experiments

Hypothesis testing, p-values, and significance
t-tests, chi-square tests, ANOVA
A/B testing design and analysis
Power and sample size
Common pitfalls and multiple comparisons

Tools and libraries

statsmodels, scipy

Build

Design and analyze a simulated A/B test and report whether the change is significant
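
A possible starting point, using only the standard library: simulate conversions for two variants, then run a pooled two-proportion z-test. The conversion rates, sample sizes, and function names are assumptions made for illustration. statsmodels offers an equivalent test if you prefer a library call.

```python
import math
import random
from statistics import NormalDist

random.seed(42)


def simulate_conversions(n_users, true_rate):
    """Count how many of n_users convert, given a true conversion rate."""
    return sum(random.random() < true_rate for _ in range(n_users))


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (pooled SE).

    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: control converts at 10%, variant at 12%.
n_a = n_b = 5000
conv_a = simulate_conversions(n_a, 0.10)
conv_b = simulate_conversions(n_b, 0.12)

z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
alpha = 0.05  # significance level, fixed before looking at the data
print(f"A: {conv_a}/{n_a}  B: {conv_b}/{n_b}  z={z:.2f}  p={p_value:.4f}")
print("significant" if p_value < alpha else "not significant", f"at alpha={alpha}")
```

Fix the sample size and significance level before running the test. Checking repeatedly and stopping at the first significant result inflates the false positive rate.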

Week 4: Exploratory data analysis and visualization

Univariate, bivariate, and multivariate exploration
Missing data patterns and outliers
Visualization grammar and chart selection
Effective color and labeling for clarity

Tools and libraries

matplotlib, seaborn, plotly
ydata-profiling for quick EDA

Build

A full EDA notebook with clear visuals and written findings

Month 2: Machine learning

Week 5: Data preparation and feature engineering

Imputation strategies for missing values
Encoding categoricals, scaling, and binning
Feature engineering and selection
Train, validation, and test splits, and leakage prevention
Pipelines for reproducible preprocessing

Tools and libraries

scikit-learn pipelines, category encoders

Build

A reusable preprocessing pipeline that prevents leakage
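
A sketch of what such a pipeline might look like in scikit-learn. The columns and the synthetic data are invented for illustration. The key point is that imputation, scaling, and encoding are fitted inside the pipeline on the training split only, so no statistics from the test split leak into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with hypothetical columns and some missing incomes.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(18, 70, size=n).astype(float),
    "income": rng.normal(50_000, 15_000, size=n),
    "plan": rng.choice(["free", "pro", "team"], size=n),
})
X.loc[::10, "income"] = np.nan
y = (X["age"] > 40).astype(int)  # toy target derived from age

numeric_cols = ["age", "income"]
categorical_cols = ["plan"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Split first; every preprocessing statistic is then learned from X_train only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Because the whole pipeline is one estimator, it can be passed to cross-validation or a tuner, and preprocessing is refitted inside each fold.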

Week 6: Supervised learning

Linear and logistic regression
Decision trees and random forests
Gradient boosting with XGBoost and LightGBM
Bias and variance, overfitting and underfitting
Regularization

Tools and libraries

scikit-learn, xgboost, lightgbm

Build

A tabular prediction model with a baseline and a boosted model compared fairly

Week 7: Model evaluation and tuning

Classification metrics: accuracy, precision, recall, F1, ROC and PR curves
Regression metrics: MAE, RMSE, R-squared
Cross-validation and nested cross-validation
Hyperparameter tuning
Explainability with feature importance and SHAP

Tools and libraries

scikit-learn, optuna, shap

Build

Tune your model, evaluate it properly, and produce an explainability report
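
When evaluating, it helps to see how the classification metrics from this week fall out of the four confusion counts. A small worked example in plain Python follows, with made-up labels. The functions precision_score, recall_score, and f1_score in sklearn.metrics compute the same quantities.

```python
# Made-up ground truth and predictions for ten examples (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(f"tp={tp} fp={fp} fn={fn} tn={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: tp=3 fp=1 fn=1 tn=5
# prints: accuracy=0.80 precision=0.75 recall=0.75 f1=0.75
```

Favor precision when false positives are costly and recall when missed positives are costly.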

Week 8: Unsupervised learning

Clustering: k-means, hierarchical, DBSCAN
Dimensionality reduction: PCA, t-SNE, UMAP
Anomaly detection
Choosing the number of clusters

Tools and libraries

scikit-learn, umap

Build

Segment a dataset with clustering and describe each segment
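
A minimal sketch of segmentation with k-means. It uses the silhouette score as one common heuristic for choosing the number of clusters. The three well-separated blobs are synthetic, so the structure here is much cleaner than in real data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic 2-D data with three well-separated groups (illustrative only).
X, _ = make_blobs(
    n_samples=300,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.8,
    random_state=0,
)

# Score each candidate k; a higher silhouette means tighter, better-separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(f"best k by silhouette: {best_k}")

# Refit with the chosen k and describe each segment by size and centroid.
final = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
for label, center in enumerate(final.cluster_centers_):
    size = int((final.labels_ == label).sum())
    print(f"segment {label}: size={size}, centroid=({center[0]:.2f}, {center[1]:.2f})")
```

On real data, scale the features first, and treat the silhouette score as a guide rather than a verdict. The segments also need to make sense to the people who will use them.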

Month 3: Analytics engineering

Week 9: SQL for analytics

Joins, aggregation, window functions, and CTEs
Cohort analysis and funnels
Running totals and period-over-period comparisons
Query performance basics

Tools and libraries

PostgreSQL or DuckDB

Build

Answer five business questions on a real schema using only SQL
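
As an illustration of the window-function and CTE topics above, the sketch below computes a running total and a day-over-day change. It runs the SQL through Python's built-in sqlite3 module as a stand-in for PostgreSQL or DuckDB. The orders table and its columns are hypothetical, and it assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical orders table with two orders on the first day.
conn.execute("CREATE TABLE orders (order_day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("2024-01-01", 10.0),
        ("2024-01-01", 5.0),
        ("2024-01-02", 25.0),
        ("2024-01-03", 5.0),
    ],
)

query = """
WITH daily AS (
    -- One row per day, so the window functions see daily revenue.
    SELECT order_day, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_day
)
SELECT
    order_day,
    revenue,
    SUM(revenue) OVER (ORDER BY order_day) AS running_revenue,
    revenue - LAG(revenue) OVER (ORDER BY order_day) AS change_vs_prev_day
FROM daily
ORDER BY order_day
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# prints: ('2024-01-01', 15.0, 15.0, None)
# prints: ('2024-01-02', 25.0, 40.0, 10.0)
# prints: ('2024-01-03', 5.0, 45.0, -20.0)
```

The first day has no previous row, so LAG returns NULL, which appears as None in Python. The same query text should run on PostgreSQL and DuckDB.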

Week 10: Analytics engineering and modeling

The modern data stack overview
Dimensional modeling: facts and dimensions, star schema
Slowly changing dimensions
Transformations and testing with dbt

Tools and libraries

dbt, DuckDB or a warehouse

Build

Model raw data into clean marts with dbt, with tests

Week 11: Dashboards and BI

Metric definitions and a single source of truth
Dashboard design and chart selection
Self-service analytics and drill-down
Storytelling with data

Tools and libraries

Metabase or Apache Superset, both free

Build

A dashboard answering the key questions for a chosen domain

Week 12: Communication and capstone prep

Structuring an analysis for a decision
Writing an executive summary
Visual clarity and honesty
Presenting trade-offs and uncertainty

Build

A written analysis with a recommendation and supporting visuals

Month 4: Capstone and specialization

Weeks 13 to 16: Capstone

End-to-end: question, data, cleaning, analysis or model, evaluation, dashboard, write-up
Optional specialization: time series forecasting, recommendation, or NLP analytics

Tools and libraries

prophet or statsmodels for time series
surprise or implicit for recommendation

Build

A portfolio-grade project: a clear question, a validated model or analysis, a dashboard, and a written report with a recommendation

Resource master reference

Books

Practical Statistics for Data Scientists by Bruce and Bruce

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron

Storytelling with Data by Cole Nussbaumer Knaflic

Repositories

awesome data science curated list

the dbt and Metabase docs

Tools master list

NumPy, pandas, polars, pyarrow, scipy, statsmodels, matplotlib, seaborn, plotly, ydata-profiling, scikit-learn, category encoders, xgboost, lightgbm, optuna, shap, umap, PostgreSQL, DuckDB, dbt, Metabase, Superset, prophet, surprise, implicit

Interview focus

Explain bias and variance and how you diagnose each

How do you design and analyze an A/B test?

When do you use precision versus recall?

Write a SQL query with a window function for a running metric

How do you prevent data leakage?

Walk through a project from question to recommendation