Professional Python project: relational data and SQL analytics.
Data analytics requires a variety of skills. This course builds capabilities through working projects.
In the age of generative AI, durable skills are grounded in real work: setting up a professional environment, reading and running code, understanding the logic, and pushing work to a shared repository. Each project follows the structure of professional Python projects. We learn by doing.
This project introduces relational data and SQL used when storing structured data in tables. Analysts are typically highly skilled at both SQL and Python.
Sample datasets are provided in the data/raw folder
across several topic domains:
- retail - a store records many sales (the worked example)
- library - a library branch manages many checkouts
- shelter - a shelter manages many animal adoptions
- civic_event - an event manages many attendees
Each domain has two related tables in a 1-to-many relationship. You will run the retail example, then implement the same pipeline for a domain you choose.
You'll work with just these areas:
- data/raw/* - raw CSV input files
- data/processed/ - processed data outputs, if created
- artifacts/ - generated database files, logs, or reports
- docs/ - the project narrative and documentation
- sql/ - SQL query files
- src/datafun/ - Python orchestration scripts
- pyproject.toml - update project metadata
- zensical.toml - update documentation site metadata
Follow the step-by-step workflow guide to complete:
- Phase 1. Start & Run
- Phase 2. Change Authorship
- Phase 3. Read & Understand
- Phase 4. Modify
- Phase 5. Apply
Challenges are expected. Sometimes instructions may not quite match your operating system. When issues occur, share screenshots, error messages, and details about what you tried. Working through issues is part of implementing professional projects.
After completing Phase 1. Start & Run, you'll have your own GitHub project, running on your machine.
Running the examples should create generated database files in artifacts/.
A new file project.log will appear in the root project folder
and you should see:
========================
Executed successfully!
========================A new file project.log will appear in the root project folder.
The commands below are used in the workflow guide above. They are provided here for convenience.
Follow the guide for the full instructions.
Show command reference
After you get a copy of this repo in your own GitHub account,
open a machine terminal in your Repos folder:
# Replace username with YOUR GitHub username.
git clone https://github.com/username/datafun-05-sql
cd datafun-05-sql
code .These are listed for convenience. For best results, follow the detailed instructions in pro-analytics-02 guide.
uv self update
uv python pin 3.14
uv lock --upgrade
uv sync --extra dev --extra docs --upgrade
uvx pre-commit install
git add -A
uvx pre-commit run --all-files
# repeat if changes were made
uvx pre-commit run --all-files
# run the example pipelines (duckdb and sqlite)
uv run python -m datafun.app_retail_duckdb_case
uv run python -m datafun.app_retail_sqlite_case
# do chores
uv run ruff format .
uv run ruff check . --fix
uv run python -m pyright
uv run python -m pytest
uv run python -m zensical build
# save progress
git add -A
git commit -m "update"
git push -u origin main- Use the UP ARROW and DOWN ARROW in the terminal to scroll through past commands.
- Use
CTRL+fto find (and replace) text within a file. - You do not need to add to or modify
tests/. They are provided for example only. - Many files are silent helpers. Explore as you like, but nothing is required.
- You do NOT not to understand everything; understanding builds naturally over time.
If you see something like this in your terminal: >>> or ...
You accidentally started Python interactive mode.
It happens.
Press Ctrl+c (both keys together) or Ctrl+Z then Enter on Windows.
| INFO | P05 | RUN SQL query: C:\Repos\datafun\datafun-05-sql\sql\duckdb\case_retail_query_sales_by_category.sql
| INFO | P05 | ====================================
| INFO | P05 | case_retail_query_sales_by_category.sql
| INFO | P05 | ====================================
| INFO | P05 | product_category, sale_count, total_revenue, avg_sale_amount
| INFO | P05 | Outdoors, 9, 3200.0, 355.56
| INFO | P05 | Clothing, 10, 1375.0, 137.5
| INFO | P05 | Food, 11, 652.0, 59.27
| INFO | P05 |
| INFO | P05 | RUN SQL query: C:\Repos\datafun\datafun-05-sql\sql\duckdb\case_retail_query_kpi_revenue.sql
| INFO | P05 | ====================================
| INFO | P05 | case_retail_query_kpi_revenue.sql
| INFO | P05 | ====================================
| INFO | P05 | store_id, store_name, city, region, sale_count, total_revenue, avg_sale_amount
| INFO | P05 | S003, Central Plaza, Mankato, South, 10, 1868.0, 186.8
| INFO | P05 | S001, North Market, Duluth, North, 10, 1834.0, 183.4
| INFO | P05 | S002, Lakeside Shop, Ely, North, 10, 1525.0, 152.5
| INFO | P05 | ========================
| INFO | P05 | Executed successfully!
| INFO | P05 | ========================
| INFO | P05 | END main()