Python for AI-Driven Automation and Business Data Science

7 modules, 26 notebooks (+ 11 optional appendices), end-to-end

A modern, hands-on, self-paced course that takes you from your first line of Python to shipping a real AI-driven automation — Python fluency, business data science, machine learning, AI engineering, and production wiring, all in one curriculum.

🚀 Start here: 00_onboarding/00_master_onboarding.ipynb
🗺️ Then the full map: 00_onboarding/00b_course_overview.ipynb (seven-module diagram, per-notebook time budgets, five learning paths, interactive time estimator).
🏎️ Tight on time? Take the Fast Track: 9 essential notebooks, about 10 hours total. Same teaching, with the Stretch and Bonus sections trimmed off.
📊 Slide-deck version: slides/00_course_overview.pdf

What's in this repo

| Path | What it contains |
| --- | --- |
| `00_onboarding/` … `07_capstones/` | The full course. 26 main notebooks + 11 optional appendices, organised by topic. |
| `fast_track/` | The fast track. 9 trimmed notebooks (~10 h total) for a quick end-to-end pass at the essentials. |
| `quizzes/` | Module quizzes. 6 short multiple-choice quizzes (5 questions each, ~10 min), one per module, to check what stuck. |
| `data/` | The three sample CSVs the notebooks read (support_ops, api_log, customer_feedback). |
| `slides/` | A 23-slide course-overview deck (PDF + LaTeX source). |
| `scripts/` | Local helper scripts: run every notebook end-to-end, or check that NB-number references in the docs resolve to real files. Use them whenever you want a sanity-check pass. |
| `docs/` | Audit reports from the 2026 refinement pass + most recent execution snapshot. Reference material, not part of the course itself. |
| `llm_providers.py` | Unified interface to OpenAI / Anthropic / Google / Ollama (and an offline `MockLLM`). |
| `previous_versions/` | The legacy flat 19-notebook layout (pre-2026 refinement), preserved for archive purposes only. |

The 7 modules

📍 Module 0 — Onboarding (start here)

Master onboarding notebook + environment check. 20 minutes.

🐍 Module 1 — Foundations (NB 1–6)

Python you can read without friction. Variables, control flow, lists, dicts, functions, classes & OOP.

🔌 Module 2 — Real-world I/O (NB 7–8)

HTTP requests, SQL, Pydantic validation. Pull real data from anywhere; refuse bad data at the boundary. (The originally planned NB 9 on Pydantic was folded into NB 8.)
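To make "refuse bad data at the boundary" concrete, here is a minimal sketch. It uses a standard-library dataclass rather than Pydantic so that it runs without extra installs, and the `Ticket` fields and the `parse_tickets` helper are hypothetical names chosen for illustration, not taken from NB 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    """Hypothetical record shape; the real CSV columns may differ."""

    ticket_id: int
    channel: str
    minutes_to_resolve: float

    def __post_init__(self) -> None:
        # Validate once, at construction, so downstream code can trust the object.
        if self.ticket_id <= 0:
            raise ValueError("ticket_id must be positive")
        if not self.channel:
            raise ValueError("channel must not be empty")
        if self.minutes_to_resolve < 0:
            raise ValueError("minutes_to_resolve must be non-negative")


def parse_tickets(rows):
    """Split raw dict rows into validated tickets and (row, reason) rejects."""
    accepted, rejected = [], []
    for row in rows:
        try:
            ticket = Ticket(
                ticket_id=int(row["ticket_id"]),
                channel=str(row["channel"]).strip(),
                minutes_to_resolve=float(row["minutes_to_resolve"]),
            )
        except (KeyError, TypeError, ValueError) as exc:
            # Keep the reason so rejects can be reported instead of silently dropped.
            rejected.append((row, str(exc)))
        else:
            accepted.append(ticket)
    return accepted, rejected


raw_rows = [
    {"ticket_id": "1", "channel": "chat", "minutes_to_resolve": "12.5"},
    {"ticket_id": "2", "channel": " ", "minutes_to_resolve": "3"},
    {"ticket_id": "three", "channel": "email", "minutes_to_resolve": "7"},
]
accepted, rejected = parse_tickets(raw_rows)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

The point of the pattern is that everything after `parse_tickets` only ever sees rows that passed validation; a Pydantic model would express the same rules declaratively.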

📊 Module 3 — Data Science (NB 10–14)

pandas, NumPy, matplotlib, statistics, time series. The analytical core.

🤖 Module 4 — Machine Learning (NB 15–17)

scikit-learn workflow + honest model evaluation + feature engineering.

🧠 Module 5 — AI Engineering (NB 18–22)

LLM prompts, RAG, agents, document processing, AI evaluation & observability.

🚀 Module 6 — Production (NB 23–24)

Packaging notebooks into projects and scheduling. (Configuration & secrets are covered inline in NB 23; the originally planned standalone NB 25 was folded into it.)

🏆 Module 7 — Capstones (NB 26–27)

Two end-to-end projects — analytical and engineering.

💡 About the numbering gaps. NB 9 and NB 25 are intentionally absent: Pydantic validation was folded into NB 8, and config & secrets into NB 23. NB 6 (originally a documented gap for "extra Python practice") is now Classes & OOP, added in the 2026 pedagogy pass so every later notebook can lean on real OOP fluency.
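Because of these gaps, a reference such as "NB 9" in the docs points at nothing. The sketch below shows one way such a check could work; it is not the code in `scripts/check_nb_references.py`, and the regular expression and function names are assumptions made for illustration.

```python
import re

# Matches "NB 8", "NB 09", and ranges such as "NB 23–24" or "NB 23-24".
NB_REF = re.compile(r"\bNB\s*(\d+)(?:\s*[–-]\s*(\d+))?")


def referenced_numbers(text: str) -> set[int]:
    """Collect every notebook number mentioned in the text, expanding ranges."""
    numbers: set[int] = set()
    for match in NB_REF.finditer(text):
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        numbers.update(range(start, end + 1))
    return numbers


def unresolved_references(text: str, existing: set[int]) -> list[int]:
    """Return the referenced notebook numbers that have no matching notebook."""
    return sorted(referenced_numbers(text) - existing)


# Notebook numbers 1 to 27, minus the two intentional gaps described above.
EXISTING = set(range(1, 28)) - {9, 25}

sample = "Validation lives in NB 8 (not NB 9). Production is NB 23-25."
print(unresolved_references(sample, EXISTING))  # [9, 25]
```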

Five high-impact notebooks worth singling out

| # | Notebook | What it teaches |
| --- | --- | --- |
| 13 | `13_statistics_basics.ipynb` | Confidence intervals, t-tests, Cohen's d, sample-size planning, A/B-test reporting that survives a stakeholder review. |
| 16 | `16_model_evaluation.ipynb` | Confusion matrices in cost units, threshold tuning, ROC/PR curves, calibration, learning curves. |
| 17 | `17_feature_engineering.ipynb` | Encoding strategies, scaling, datetime features, target leakage, feature selection, custom transformers. |
| 22 | `22_ai_evaluation_observability.ipynb` | Golden datasets, LLM-as-judge, tracing, cost dashboards, A/B testing prompts, regression detection. |
| 27 | `27_capstone_ai_assistant.ipynb` | An end-to-end AI feature combining everything from Modules 5 + 6. |
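To give a flavour of "confusion matrices in cost units" and threshold tuning, here is a small pure-Python sketch. The labels, scores, and cost figures are made up for illustration, and the function names are hypothetical; NB 16 may well approach this differently (for example with scikit-learn).

```python
def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Price the errors: each false positive and false negative has a cost."""
    _, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    return fp * cost_fp + fn * cost_fn


def best_threshold(y_true, scores, candidates, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, cost_fp, cost_fn),
    )


# Made-up example data: 1 = positive class, scores are model probabilities.
y_true = [1, 0, 1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.6]
candidates = [0.25, 0.5, 0.75]

# When a miss costs five times a false alarm, a low threshold wins.
print(best_threshold(y_true, scores, candidates, cost_fp=1, cost_fn=5))  # 0.25
# When false alarms are the expensive error, the threshold moves up.
print(best_threshold(y_true, scores, candidates, cost_fp=5, cost_fn=1))  # 0.5
```

The same model and the same data give different "best" thresholds once the two error types are priced differently, which is why accuracy alone is a poor guide.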

Optional appendix track — 11 advanced notebooks

A second tier of optional, deep-dive notebooks for readers who want to go beyond the 26-notebook backbone. Each appendix lives next to its parent module and is fully runnable. Unlike the main notebooks, appendices are written as reference notebooks: they ship with pre-rendered outputs only when noted in their first cell, focus on demonstrating libraries rather than interactive exercises, and skip the Solution/Debug-me scaffolding.

| Module | Appendix | What it covers |
| --- | --- | --- |
| 03 Data Science | `A1_forecasting_classical.ipynb` | ARIMA / SARIMA / ETS deep dive |
| 03 Data Science | `A2_forecasting_prophet_libraries.ipynb` | Prophet, NeuralProphet, sktime, Darts |
| 03 Data Science | `A3_forecasting_deep_learning.ipynb` | LSTM + Transformer forecasters in PyTorch |
| 03 Data Science | `A4_forecasting_foundation_models.ipynb` | TimesFM, Chronos, TabPFN-TS |
| 04 ML | `A1_pytorch_foundations.ipynb` | Tensors, autograd, MLPs |
| 04 ML | `A2_pytorch_vision_and_sequences.ipynb` | CNNs, RNNs, Transformers |
| 04 ML | `A3_pytorch_fine_tuning.ipynb` | Transfer learning + LoRA |
| 04 ML | `A4_tabpfn_priorlab.ipynb` | TabPFN tabular foundation model + cloud API |
| 05 AI Eng | `A1_llm_providers_guide.ipynb` | OpenAI / Anthropic / Google / Ollama |
| 05 AI Eng | `A2_vector_stores_survey.ipynb` | FAISS, Chroma, Qdrant, Weaviate, pgvector |
| 05 AI Eng | `A3_rag_and_agent_frameworks.ipynb` | LangChain, LlamaIndex, Haystack, agents |

Learning paths

Match yourself to the path that fits:

| You are | You'll touch | Time |
| --- | --- | --- |
| Complete beginner | All 7 modules in order | ~35 h |
| Analyst (knows Excel/SQL) | Modules 0, 2, 3, 4, 7 | ~20 h |
| Developer (knows another language) | Modules 0, 2, 3, 4, 5, 6, 7 | ~28 h |
| ML practitioner | Modules 0, 5, 6, 7 | ~15 h |
| Manager (curious) | Modules 0 and 7 only | ~8 h |

The course-overview deck has these paths visualised — open it before you pick.
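If you want a rough calendar estimate for a path, the arithmetic is simple. The sketch below uses the approximate hours from the learning-path table; the function name and the weekly-hours input are assumptions for illustration, and this is not the interactive estimator from the overview notebook.

```python
import math

# Approximate totals from the learning-path table, in hours.
PATH_HOURS = {
    "Complete beginner": 35,
    "Analyst": 20,
    "Developer": 28,
    "ML practitioner": 15,
    "Manager": 8,
}


def weeks_needed(path: str, hours_per_week: float) -> int:
    """Round up: a partly used week still counts as a week on the calendar."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return math.ceil(PATH_HOURS[path] / hours_per_week)


print(weeks_needed("Analyst", 4))            # 5
print(weeks_needed("Complete beginner", 4))  # 9
```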

How each notebook is structured

Every notebook follows the same six-section template:

🎯 Learning objectives + ✅ Prerequisites
Numbered concept sections — short prose, then runnable code.
🧪 Practice exercises (numbered 1, 2, 3, …) — 3–5 per notebook, with full solutions and reasoning (not just the answer). One per notebook is a 🐞 Debug-me puzzle.
🧠 Stretch exercises (lettered A, B, C, D) — 4 per notebook, deliberately harder. The kind of question you'd want to be able to answer in an interview. Same Solution + Reasoning format as the practice exercises.
🎁 Bonus mini-project — one larger applied task.
✅ Self-assessment checklist + 🚀 Next step — pointer to the next notebook in your path.

That's ~8 exercises per notebook on average — and 180+ across the course, every single one with a worked solution and an explanation of why it works.

The six visual markers (💡 tip, 🎯 intuition, ⚠️ pitfall, 🧪 exercise, 🎁 bonus, 🐞 debug-me) are road signs you'll see throughout. They are explained in the onboarding notebook.

How to study a notebook — the five-step loop

   Read  →  Run  →  Try  →  Tweak  →  Predict
     │              │
     │              └── try every exercise before clicking the solution
     └────────── read the prose before looking at the code

Apply this to every notebook. Five minutes of genuine struggle beats five hours of passive reading.

Setup

Google Colab (easiest)

Upload any notebook. Done. All required libraries are pre-installed.

Local Jupyter

python -m venv .venv
source .venv/bin/activate         # macOS / Linux
# .venv\Scripts\activate          # Windows: run this instead of the line above
pip install -r requirements.txt
jupyter lab

Tested with Python 3.10+. Module 0 includes an environment-check cell.
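If you want to check a local environment yourself before opening Module 0, something along these lines would do. This is a sketch under assumptions, not the course's own environment-check cell: the package list is a guess based on the libraries named in this README, and `requirements.txt` remains the authoritative list.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 10)
# Import names (not pip names) of a few libraries the course mentions; assumed list.
PACKAGES = ["pandas", "numpy", "matplotlib", "sklearn"]


def check_environment(packages=PACKAGES, min_python=REQUIRED_PYTHON):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for name in packages:
        # find_spec looks the package up without actually importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


issues = check_environment()
if issues:
    print("Environment problems:")
    for issue in issues:
        print(" -", issue)
else:
    print("Environment looks OK.")
```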

What you'll build by the end

Six smaller artefacts along the way (KPI snapshot, ETL pipeline, SQL report, forecast, inbox triage, scheduled job) plus two big capstones:

🏆 Capstone A — AI Support-Bot Analytics (NB 26): 5 channels × 12 months → 2×2 dashboard → Simpson's-paradox demo → executive summary.
🏆 Capstone B — AI Customer-Feedback Assistant (NB 27): classification + validation + RAG + scheduled orchestration + cost dashboard + eval gate.

You can talk through either of these as "a project I built" in an interview.
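For readers who have not met Simpson's paradox before, the sketch below shows the effect with made-up counts (they are not the capstone's data, and the bot and channel names are hypothetical): one bot has the better resolution rate in every channel, yet the worse rate overall, because the two bots handle very different mixes of easy and hard channels.

```python
# Made-up illustrative counts: (resolved, total) per bot and channel.
counts = {
    "bot_a": {"chat": (81, 87), "email": (192, 263)},
    "bot_b": {"chat": (234, 270), "email": (55, 80)},
}


def rate(resolved: int, total: int) -> float:
    return resolved / total


def overall_rate(per_channel: dict) -> float:
    """Pool all channels before dividing; this is where the reversal appears."""
    resolved = sum(r for r, _ in per_channel.values())
    total = sum(t for _, t in per_channel.values())
    return rate(resolved, total)


for channel in ("chat", "email"):
    a = rate(*counts["bot_a"][channel])
    b = rate(*counts["bot_b"][channel])
    print(f"{channel}: bot_a {a:.0%} vs bot_b {b:.0%}")

print(
    f"overall: bot_a {overall_rate(counts['bot_a']):.0%} "
    f"vs bot_b {overall_rate(counts['bot_b']):.0%}"
)
```

In this toy data bot_a wins within each channel but loses on the pooled figure, so a dashboard that reports only the overall rate would rank the bots the wrong way round.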

Course philosophy

A few principles that guided every notebook:

Explain why, not just how. Code without intuition is fragile.
Show real examples. Tip calculators teach syntax; KPI parsers teach the job.
Practice over passive reading. Every concept gets exercises with reasoning.
Modern tools, modern habits. Type hints, virtual envs, validation, pytest, observability.
AI as a tool, not magic. LLMs are function calls; calibration matters.

What's not in the course

So you're not surprised later:

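❌ Deep learning from scratch (PyTorch / TF training loops) in the main track. You'll use pre-trained models, which is what most working AI applications need. The optional PyTorch appendices in Modules 3 and 4 go deeper for readers who want that.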
❌ Vendor-specific cloud deployment (AWS / GCP / Azure). NB 24 teaches the patterns of scheduling — without committing to one platform.
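❌ Vector-database deep dive. NB 19 implements the underlying retrieval logic and points you at Qdrant / Weaviate / Pinecone for the production scale-up. The optional appendix A2_vector_stores_survey.ipynb surveys FAISS, Chroma, Qdrant, Weaviate, and pgvector.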

These are conscious trade-offs.

LLM providers — local and hosted, four options

Notebooks 18–22 and 27 can be run entirely offline with the built-in MockLLM. When you're ready for a real model, swap one line. The course supports four providers through a unified interface in llm_providers.py:

| Provider | Class | When to use |
| --- | --- | --- |
| 🟢 OpenAI | `OpenAILLM(model="gpt-4o-mini")` | Reliable default. |
| 🟠 Anthropic | `AnthropicLLM(model="claude-haiku-4-5-20251001")` | Long context, careful tone. |
| 🔵 Google | `GoogleLLM(model="gemini-2.0-flash")` | Cheap at scale. |
| 🟣 Ollama | `OllamaLLM(model="llama3.2:3b")` | Local: no internet, no key, no cost. |
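The "swap one line" idea rests on every provider exposing the same method, so the calling code never needs to know which one it has. The sketch below illustrates that pattern only: the `complete` method name, the `LLM` protocol, and the `EchoLLM` stand-in are assumptions made for this example, and the real classes in llm_providers.py may use different names and signatures.

```python
from typing import Protocol


class LLM(Protocol):
    """Anything with a complete(prompt) -> str method counts as a provider."""

    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Hypothetical offline stand-in: returns a canned reply, records prompts."""

    def __init__(self, reply: str = "canned reply") -> None:
        self.reply = reply
        self.calls: list[str] = []

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        return self.reply


def summarise_feedback(llm: LLM, feedback: str) -> str:
    """Application code depends only on the shared interface."""
    prompt = f"Summarise this customer feedback in one sentence:\n{feedback}"
    return llm.complete(prompt)


# The one line to swap: construct a different provider object here.
llm = EchoLLM(reply="Customer wants faster refunds.")

print(summarise_feedback(llm, "My refund took three weeks to arrive."))
```

Because the stand-in records every prompt it receives, the same structure also makes the application code easy to test without network access or API cost.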

# For hosted providers, set the corresponding env var (never inline):
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
export GOOGLE_API_KEY=...
# Ollama: `ollama pull llama3.2:3b` once, then `ollama serve` (auto-starts on macOS).

📓 See 05_ai_engineering/A1_llm_providers_guide.ipynb for setup, model recommendations, cost estimates, and a decision table.

⚠️ Never commit API keys to git. The notebooks are designed so you don't have to touch a key inside the notebook itself.
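One way to keep keys out of notebooks is to read them from the environment and fail early with a clear message when they are missing. The helper names below are hypothetical and this is a sketch of the pattern, not the code in llm_providers.py.

```python
import os


def require_api_key(var_name: str) -> str:
    """Read a key from the environment; never hard-code it in a notebook."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before starting Jupyter."
        )
    return key


def mask(key: str) -> str:
    """Show only the last four characters, for safer logging."""
    return "*" * max(len(key) - 4, 0) + key[-4:]


try:
    key = require_api_key("OPENAI_API_KEY")
    print("key loaded:", mask(key))
except RuntimeError as exc:
    # With no key exported, the script explains what to do instead of crashing later.
    print(exc)
```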

Layout at a glance

.
├── README.md                       ← you are here
├── requirements.txt
├── LICENSE
├── llm_providers.py
│
├── 00_onboarding/
│   ├── README.md
│   ├── 00_master_onboarding.ipynb
│   └── 00b_course_overview.ipynb
│
├── 01_foundations/         ← NB 1–6: Python basics, control, lists, dicts, functions, classes & OOP
├── 02_real_world_io/       ← NB 7–8: HTTP, SQL  (NB 9 was folded into NB 8)
├── 03_data_science/        ← NB 10–14: pandas, NumPy, plots, stats, time series  (+ A1–A4 forecasting appendices)
├── 04_machine_learning/    ← NB 15–17: sklearn, evaluation, feature engineering  (+ A1–A4 PyTorch / TabPFN appendices)
├── 05_ai_engineering/      ← NB 18–22: prompts, RAG, agents, docs, AI evaluation  (+ A1–A3 provider / vector-store / framework appendices)
├── 06_production/          ← NB 23–24: packaging, scheduling  (NB 25 folded into NB 23)
├── 07_capstones/           ← NB 26: analytics  +  NB 27: AI assistant
│
├── fast_track/                     ← 9 trimmed notebooks (~10 h) — the shortcut path
│
├── quizzes/                        ← 6 multiple-choice quizzes, one per module
│
├── slides/
│   ├── 00_course_overview.pdf     ← 23-slide onboarding deck
│   └── images/                     ← 7 overview figures
│
├── data/                           ← 3 sample CSVs (support_ops, api_log, customer_feedback)
│
├── scripts/
│   ├── check_nb_references.py     ← link checker for NB-number references in docs
│   └── run_all_notebooks.py       ← execute every notebook end-to-end (for local sanity checks)
│
├── docs/                          ← audit reports + most recent execution snapshot
│
└── previous_versions/
    └── flat_19_notebook_layout/   ← the pre-2026 flat layout, kept for archive only

About the previous_versions/ folder

Before the 2026 refinement pass, the course shipped as a flat list of 19 notebooks at the top level (01_python_basics.ipynb … 19_scheduling_orchestration.ipynb) alongside their own data/, slides/, and requirements.txt. That layout is preserved verbatim in previous_versions/flat_19_notebook_layout/ so the earlier material stays available, but the canonical course is the 7-module structure at the top level. Start there.

Contributing & feedback

The course gets better when real readers tell us what didn't land. If you spot a bug, an unclear explanation, or a missing example, please open an issue or pull request.

Licence

MIT — see LICENSE. Use freely for personal learning, teaching, or any other purpose.

Happy coding.
