Closed
27 changes: 11 additions & 16 deletions README.md
@@ -67,7 +67,7 @@ API key priority (lowest to highest): config file → `HOTDATA_API_KEY` env var
| `connections` | `list`, `create`, `refresh`, `new` | Manage connections |
| `databases` | `list`, `create`, `delete`, `tables` | Managed databases (create and load tables via parquet) |
| `tables` | `list` | List tables and columns |
| `datasets` | `list`, `create`, `update` | Manage uploaded datasets |
| `views` | `list`, `create`, `update`, `refresh` | Manage SQL-derived views |
| `context` | `list`, `show`, `pull`, `push` | Workspace Markdown context (e.g. data model `DATAMODEL`) via the context API |
| `query` | | Execute a SQL query |
| `queries` | `list` | Inspect query run history |
@@ -146,7 +146,7 @@ hotdata databases tables delete <database> <table> [--schema public]

- `create` registers a managed connection (`source_type: managed`) with no external credentials. Use `--table` to declare tables up front (required before `tables load` on the current API).
- `tables load` uploads a **parquet** file (or uses a staged `upload_id` from `POST /v1/files`) and publishes it as the table generation (`replace` mode).
- For CSV/JSON uploads without a managed database, use `hotdata datasets create` instead (`datasets.main.*`).
- For SQL-query materializations without a managed database, use `hotdata views create` instead (`views.main.*`).

Example:

@@ -167,24 +167,19 @@ hotdata tables list [--workspace-id <id>] [--connection-id <id>] [--schema <patt
- `--schema` and `--table` support SQL `%` wildcard patterns.
- Tables are displayed as `<connection>.<schema>.<table>` — use this format in SQL queries.

## Datasets
## Views

```sh
hotdata datasets list [--workspace-id <id>] [--limit <n>] [--offset <n>] [--format table|json|yaml]
hotdata datasets <dataset_id> [--workspace-id <id>] [--format table|json|yaml]
hotdata datasets create --file data.csv [--label "My Dataset"] [--table-name my_dataset]
hotdata datasets create --sql "SELECT ..." --label "My Dataset"
hotdata datasets create --url "https://example.com/data.parquet" --label "My Dataset"
hotdata datasets update <dataset_id> [--label "New Label"] [--table-name new_table]
hotdata datasets refresh <dataset_id> [--workspace-id <id>] [--async]
hotdata views list [--workspace-id <id>] [--limit <n>] [--offset <n>] [--output table|json|yaml]
hotdata views <view_id> [--workspace-id <id>] [--output table|json|yaml]
hotdata views create --name my_view [--description "My View"] (--sql "SELECT ..." | --query-id <id>)
hotdata views update <view_id> [--description "New description"] [--name new_view]
hotdata views refresh <view_id> [--workspace-id <id>] [--async]
```

- Datasets are queryable as `datasets.main.<table_name>`.
- `--file`, `--sql`, `--query-id`, and `--url` are mutually exclusive.
- `--url` imports data directly from a URL (supports csv, json, parquet).
- Format is auto-detected from file extension or content.
- Piped stdin is supported: `cat data.csv | hotdata datasets create --label "My Dataset"`
- `refresh` re-runs the dataset's source (URL fetch or saved query) and creates a new version. Not supported for upload-source datasets.
- Views are queryable as `views.main.<name>`.
- `--sql` and `--query-id` are mutually exclusive; exactly one is required for `create`.
- `refresh` re-runs the view's source query and creates a new version.
- `--async` submits the refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`.

## Workspace context
16 changes: 8 additions & 8 deletions skills/hotdata-analytics/SKILL.md
@@ -1,14 +1,14 @@
---
name: hotdata-analytics
description: Use this skill when the user wants OLAP-style SQL analytics in Hotdata — aggregations, GROUP BY, JOINs, reporting, exploratory queries, query run history, stored results, or materialized follow-up tables (Chain via datasets or managed databases). Activate for "analyze", "aggregate", "rollup", "pivot", "report", "metrics", "GROUP BY", "query history", "past queries", "query runs", "stored results", "materialize", "chain", "intermediate table", or sorted indexes for filters/range scans. Do not load for BM25/vector search or geospatial SQL — use hotdata-search or hotdata-geospatial. Requires the core hotdata skill for connections, tables, datasets, and auth.
description: Use this skill when the user wants OLAP-style SQL analytics in Hotdata — aggregations, GROUP BY, JOINs, reporting, exploratory queries, query run history, stored results, or materialized follow-up tables (Chain via views or managed databases). Activate for "analyze", "aggregate", "rollup", "pivot", "report", "metrics", "GROUP BY", "query history", "past queries", "query runs", "stored results", "materialize", "chain", "intermediate table", or sorted indexes for filters/range scans. Do not load for BM25/vector search or geospatial SQL — use hotdata-search or hotdata-geospatial. Requires the core hotdata skill for connections, tables, views, and auth.
version: 0.3.2
---

# Hotdata Analytics Skill

**OLAP-style analytics** in Hotdata: PostgreSQL-dialect SQL, query execution, run history, stored results, **Chain** materializations, and **sorted** indexes for filters and joins.

**Prerequisites:** Authenticate, workspace, and catalog discovery via the **`hotdata`** skill (`connections`, `tables`, `datasets`, `databases`).
**Prerequisites:** Authenticate, workspace, and catalog discovery via the **`hotdata`** skill (`connections`, `tables`, `views`, `databases`).

**Related skills:** **`hotdata-search`** (BM25, vector, retrieval indexes), **`hotdata-geospatial`** (spatial SQL).

@@ -23,7 +23,7 @@ hotdata query status <query_run_id> [--output table|json|csv]

- **PostgreSQL dialect.** Quote mixed-case identifiers: `"CustomerName"`.
- Use **`hotdata tables list`** for schema discovery — not `information_schema` via `query`.
- Fully qualified names: `<connection>.<schema>.<table>`, `datasets.<schema>.<table>`, `<database>.<schema>.<table>`.
- Fully qualified names: `<connection>.<schema>.<table>`, `views.<schema>.<table>`, `<database>.<schema>.<table>`.
- Long-running queries may return `query_run_id` → poll with **`query status`** (exit `2` = still running). Do not re-run identical heavy SQL while polling.
- For **workspace-wide** joins and naming, load **context:DATAMODEL** when listed (`hotdata context list` → `show DATAMODEL`) — see **`hotdata`** skill.
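
As a minimal sketch of that polling rule, the shell function below waits on an existing `query_run_id` instead of re-submitting the SQL. It relies only on the documented behavior that `hotdata query status` exits with `2` while the run is still going; the function name `wait_for_query_run` and the 5-second default interval are illustrative assumptions, not part of the CLI.

```bash
# Hypothetical helper (not part of the hotdata CLI): poll a query run until
# `hotdata query status` stops returning exit code 2 ("still running").
# Usage: wait_for_query_run <query_run_id> [interval_seconds]
wait_for_query_run() {
  local run_id="$1"
  local interval="${2:-5}"  # assumed default polling interval, in seconds
  local rc
  while true; do
    rc=0
    # Status output from every poll is passed through to stdout.
    hotdata query status "$run_id" || rc=$?
    if [ "$rc" -ne 2 ]; then
      return "$rc"  # no longer running: hand the exit code back to the caller
    fi
    sleep "$interval"
  done
}
```

Any exit code other than `2` is returned unchanged, so calling scripts can branch on it without ever re-running the heavy query.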

@@ -82,8 +82,8 @@ hotdata results <result_id> [--workspace-id <workspace_id>] [--output table|json
2. **Materialize** (pick one)

```bash
hotdata datasets create --name chain_slice [--description "chain slice"] --sql "SELECT ..."
hotdata datasets create --name chain_from_saved [--description "from saved"] --query-id <query_id>
hotdata views create --name chain_slice --description "chain slice" --sql "SELECT ..."
hotdata views create --name chain_from_saved --description "from saved" --query-id <query_id>
```

Or managed parquet:
@@ -94,10 +94,10 @@ hotdata results <result_id> [--workspace-id <workspace_id>] [--output table|json
hotdata databases tables load analytics slice --file ./slice.parquet
```

3. **Chain query** — use printed **`full_name`** or `datasets list` **FULL NAME** column:
3. **Chain query** — use printed **`full_name`** or `views list` **FULL NAME** column:

```bash
hotdata query "SELECT * FROM datasets.main.chain_slice WHERE ..."
hotdata query "SELECT * FROM views.main.chain_slice WHERE ..."
hotdata query "SELECT * FROM analytics.public.slice WHERE ..."
```

@@ -122,4 +122,4 @@ List and delete use the same `hotdata indexes` commands as in the search skill;

## Sandboxes and chains

Sandbox datasets use **`datasets.<sandbox_id>.<table>`**, not `datasets.main`. Run queries with active sandbox config or `hotdata sandbox <id> run hotdata query "..."`. See **`hotdata`** skill **Sandboxes**.
Sandbox views use **`views.<sandbox_id>.<table>`**, not `views.main`. Run queries with active sandbox config or `hotdata sandbox <id> run hotdata query "..."`. See **`hotdata`** skill **Sandboxes**.
24 changes: 12 additions & 12 deletions skills/hotdata-analytics/references/WORKFLOWS.md
@@ -2,7 +2,7 @@

OLAP-style SQL, **History** (query runs and stored results), and **Chain** (materialized follow-ups). Requires **`hotdata`** for auth, workspaces, and catalog commands.

**Related:** **`hotdata-search`** for BM25/vector indexes and `hotdata search`; **`hotdata`** [WORKFLOWS.md](../../hotdata/references/WORKFLOWS.md) for datasets vs managed databases.
**Related:** **`hotdata-search`** for BM25/vector indexes and `hotdata search`; **`hotdata`** [WORKFLOWS.md](../../hotdata/references/WORKFLOWS.md) for views vs managed databases.

---

@@ -66,11 +66,11 @@ hotdata query "SELECT ..."

Land a smaller table — pick one:

**Datasets** (CSV/JSON/URL/SQL snapshot → `datasets.<schema>.<table>`):
**Views** (SQL snapshot → `views.<schema>.<table>`):

```bash
hotdata datasets create --label "chain revenue slice" --sql "SELECT ..." [--table-name chain_revenue_slice]
hotdata datasets create --label "from saved" --query-id <query_id> [--table-name ...]
hotdata views create --name chain_revenue_slice --description "chain revenue slice" --sql "SELECT ..."
hotdata views create --name chain_from_saved --description "from saved" --query-id <query_id>
```

**Managed database** (parquet → `<database>.<schema>.<table>`):
@@ -80,17 +80,17 @@ hotdata databases create --name chain_db --table revenue_slice
hotdata databases tables load chain_db revenue_slice --file ./revenue_slice.parquet
```

Note the printed **`full_name`** (e.g. `datasets.main.chain_revenue_slice` or `chain_db.public.revenue_slice`). For datasets, **`FULL NAME`** from `datasets list` is authoritative.
Note the printed **`full_name`** (e.g. `views.main.chain_revenue_slice` or `chain_db.public.revenue_slice`). For views, **`FULL NAME`** from `views list` is authoritative.

### 3. Chain query

Query using that name — do not hardcode `datasets.main` if the schema segment is a sandbox id:
Query using that name — do not hardcode `views.main` if the schema segment is a sandbox id:

```bash
hotdata datasets list
hotdata query "SELECT * FROM datasets.main.chain_revenue_slice WHERE ..."
hotdata views list
hotdata query "SELECT * FROM views.main.chain_revenue_slice WHERE ..."
# Sandbox example (use actual full_name from create or list):
# hotdata query "SELECT * FROM datasets.s_ufmblmvq.chain_revenue_slice WHERE ..."
# hotdata query "SELECT * FROM views.s_ufmblmvq.chain_revenue_slice WHERE ..."
# Managed database:
# hotdata query "SELECT * FROM chain_db.public.revenue_slice WHERE ..."
```
@@ -99,18 +99,18 @@ hotdata query "SELECT * FROM datasets.main.chain_revenue_slice WHERE ..."

For **sandbox-scoped** chain tables:

- Qualified name is **`datasets.<sandbox_id>.<table>`**, not `datasets.main`.
- Qualified name is **`views.<sandbox_id>.<table>`**, not `views.main`.
- Run queries with **active sandbox** in config (`hotdata sandbox set`) **or** inside **`hotdata sandbox <sandbox_id> run hotdata query "…"`**.
- Without sandbox context, you may get **access denied** on sandbox-only tables.

### Naming and documentation

- Prefer predictable `--name` values: `chain_<topic>_<YYYYMMDD>`.
- Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`datasets.…` or `database.schema.table`).
- Record long-lived chains in **context:DATAMODEL → Derived tables (Chain)** with the **full** SQL name you use (`views.…` or `database.schema.table`).
- Promote join/grain findings to **context:DATAMODEL** when they should outlive the sandbox (**`hotdata`** skill).
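
A small sketch of the `chain_<topic>_<YYYYMMDD>` naming convention, assuming a shell with `date` available. The helper `chain_view_name` is hypothetical; the commented `views create` call uses only the flags shown earlier (`--name`, `--description`, `--sql`). After creating the view, still read the printed `full_name` rather than assuming `views.main`.

```bash
# Hypothetical helper: build a predictable Chain view name of the form
# chain_<topic>_<YYYYMMDD>. The date defaults to today (local time).
# Usage: chain_view_name <topic> [YYYYMMDD]
chain_view_name() {
  local topic="$1"
  local day="${2:-$(date +%Y%m%d)}"
  printf 'chain_%s_%s\n' "$topic" "$day"
}

# Example (not executed here): materialize a revenue slice under a dated name.
# name="$(chain_view_name revenue)"
# hotdata views create --name "$name" --description "chain revenue slice" --sql "SELECT ..."
```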

### Guardrails

- Materialize when the base scan is large and the follow-up runs many times.
- Keep Chain tables focused; avoid wide `SELECT *` materializations when a narrow projection suffices.
- For upload format choice (datasets vs databases), see **`hotdata`** WORKFLOWS — [Datasets vs managed databases](../../hotdata/references/WORKFLOWS.md#datasets-vs-managed-databases).
- For source format choice (views vs databases), see **`hotdata`** WORKFLOWS — [Views vs managed databases](../../hotdata/references/WORKFLOWS.md#views-vs-managed-databases).