# Hotdata CLI
Command line interface for Hotdata.
## Installation

### Homebrew

```sh
brew install hotdata-dev/tap/cli
```

### Binary (macOS, Linux)

Download a binary from Releases.

### Build from source (requires Rust)

```sh
cargo build --release
cp target/release/hotdata /usr/local/bin/hotdata
```

## Authentication

Run either of the following (they are equivalent):
```sh
hotdata auth login
# or
hotdata auth
```

This launches a browser window where you can authorize the CLI to access your Hotdata account.

Alternatively, authenticate with an API key using the `--api-key` flag:

```sh
hotdata <command> --api-key <api_key>
```

Or set the `HOTDATA_API_KEY` environment variable (also loaded from `.env` files):

```sh
export HOTDATA_API_KEY=<api_key>
hotdata <command>
```

API key priority (lowest to highest): config file → `HOTDATA_API_KEY` env var → `--api-key` flag.
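As a sanity check, the precedence above can be sketched as a small shell function. `resolve_api_key` is a hypothetical helper for illustration, not part of the CLI:

```sh
# Sketch of the documented precedence: config file < HOTDATA_API_KEY < --api-key.
# resolve_api_key is a hypothetical helper, not a CLI subcommand.
resolve_api_key() {
  flag_key="$1"    # value passed via --api-key (may be empty)
  config_key="$2"  # value read from the config file (may be empty)
  if [ -n "$flag_key" ]; then
    echo "$flag_key"                    # flag beats everything
  elif [ -n "${HOTDATA_API_KEY:-}" ]; then
    echo "$HOTDATA_API_KEY"             # env var beats the config file
  else
    echo "$config_key"                  # config file is the fallback
  fi
}
```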
## Commands

| Command | Subcommands | Description |
|---|---|---|
| `auth` | `login`, `status`, `logout` | `login` or bare `auth` opens browser login; `status` / `logout` manage the saved profile |
| `workspaces` | `list`, `set` | Manage workspaces |
| `connections` | `list`, `create`, `refresh`, `new` | Manage connections |
| `tables` | `list` | List tables and columns |
| `datasets` | `list`, `create`, `update`, `refresh` | Manage uploaded datasets |
| `context` | `list`, `show`, `pull`, `push` | Workspace Markdown context (e.g. data model `DATAMODEL`) via the context API |
| `query` | `status` | Execute a SQL query |
| `queries` | `list` | Inspect query run history |
| `search` | | BM25 full-text or vector search across a table column |
| `indexes` | `list`, `create`, `delete` | Manage indexes on a table or dataset |
| `embedding-providers` | `list`, `get`, `create`, `update`, `delete` | Manage embedding providers used by vector indexes |
| `results` | `list` | Retrieve stored query results |
| `jobs` | `list` | Manage background jobs |
| `sandbox` | `list`, `new`, `set`, `read`, `update`, `run` | Manage sandboxes |
| `skills` | `install`, `status` | Manage the hotdata agent skill |
## Global options

| Option | Description | Type | Default |
|---|---|---|---|
| `--api-key` | API key (overrides env var and config) | string | |
| `-v, --version` | Print version | boolean | |
| `-h, --help` | Print help | boolean | |
### workspaces

```sh
hotdata workspaces list [--format table|json|yaml]
hotdata workspaces set [<workspace_id>]
```

- `list` shows all workspaces with a `*` marker on the active one.
- `set` switches the active workspace. Omit the ID for interactive selection.
- The active workspace is used as the default for all commands that accept `--workspace-id`.
### connections

```sh
hotdata connections list [-w <id>] [-o table|json|yaml]
hotdata connections <connection_id> [-w <id>] [-o table|json|yaml]
hotdata connections refresh <connection_id> [-w <id>] [--data] [--schema <name> --table <name>] [--async] [--include-uncached]
hotdata connections new [-w <id>]
```

- `list` returns `id`, `name`, `source_type` for each connection.
- Pass a connection ID to view details (id, name, source type, table counts).
- `refresh` triggers a schema refresh by default. Pass `--data` to refresh cached row data instead.
- `--schema` and `--table` narrow a data refresh to a single table (must be supplied together).
- `--async` submits a data refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`. Only valid with `--data` — schema refresh is always synchronous.
- `--include-uncached` includes tables that haven't been cached yet in a connection-wide data refresh. Only valid with `--data` and no `--table`.
- `new` launches an interactive connection creation wizard.
```sh
# List available connection types
hotdata connections create list [--format table|json|yaml]

# Inspect schema for a connection type
hotdata connections create list <type_name> --format json

# Create a connection
hotdata connections create --name "my-conn" --type postgres --config '{"host":"...","port":5432,...}'
```

### tables

```sh
hotdata tables list [--workspace-id <id>] [--connection-id <id>] [--schema <pattern>] [--table <pattern>] [--limit <n>] [--cursor <token>] [--format table|json|yaml]
```

- Without `--connection-id`: lists all tables with `table`, `synced`, `last_sync`.
- With `--connection-id`: includes column details (`column`, `data_type`, `nullable`).
- `--schema` and `--table` support SQL `%` wildcard patterns.
- Tables are displayed as `<connection>.<schema>.<table>` — use this format in SQL queries.
### datasets

```sh
hotdata datasets list [--workspace-id <id>] [--limit <n>] [--offset <n>] [--format table|json|yaml]
hotdata datasets <dataset_id> [--workspace-id <id>] [--format table|json|yaml]
hotdata datasets create --file data.csv [--label "My Dataset"] [--table-name my_dataset]
hotdata datasets create --sql "SELECT ..." --label "My Dataset"
hotdata datasets create --url "https://example.com/data.parquet" --label "My Dataset"
hotdata datasets update <dataset_id> [--label "New Label"] [--table-name new_table]
hotdata datasets refresh <dataset_id> [--workspace-id <id>] [--async]
```

- Datasets are queryable as `datasets.main.<table_name>`.
- `--file`, `--sql`, `--query-id`, and `--url` are mutually exclusive.
- `--url` imports data directly from a URL (supports csv, json, parquet).
- Format is auto-detected from file extension or content.
- Piped stdin is supported: `cat data.csv | hotdata datasets create --label "My Dataset"`
- `refresh` re-runs the dataset's source (URL fetch or saved query) and creates a new version. Not supported for upload-source datasets.
- `--async` submits the refresh as a background job and returns a job ID; poll with `hotdata jobs <job_id>`.
### context

Named Markdown documents for a workspace (data model, glossary, etc.) are stored in the context API. The CLI treats the server as the source of truth; local files are only used where the tool requires a path on disk.

```sh
hotdata context list [-w <id>] [--prefix <stem>] [-o table|json|yaml]
hotdata context show <name> [-w <id>]
hotdata context pull <name> [-w <id>] [--force] [--dry-run]
hotdata context push <name> [-w <id>] [--dry-run]
```

- `show` prints Markdown to stdout (no local file needed). Use this to read the workspace data model in scripts or agents.
- `pull` writes `./<name>.md` in the current directory from the API. Refuses to overwrite an existing file unless `--force` is passed.
- `push` reads `./<name>.md` and upserts that name in the workspace. Use after editing the file in your project directory.
- Names follow SQL identifier rules (ASCII letters, digits, underscore; max 128 characters; SQL reserved words are not allowed). The usual stem for the semantic data model is `DATAMODEL` (file `DATAMODEL.md` for push/pull only).
### query

```sh
hotdata query "<sql>" [-w <id>] [--connection <connection_id>] [-o table|json|csv]
hotdata query status <query_run_id> [-o table|json|csv]
```

- Default output is `table`, which prints results with row count and execution time.
- Use `--connection` to scope the query to a specific connection.
- Long-running queries automatically fall back to async execution and return a `query_run_id`.
- Use `hotdata query status <query_run_id>` to poll for results.
- Exit codes for `query status`: `0` = succeeded, `1` = failed, `2` = still running (poll again).
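The exit-code contract makes `query status` easy to drive from a script. A minimal polling sketch — `poll_query` is a hypothetical wrapper (not a built-in subcommand), and the interval is arbitrary:

```sh
# Poll `hotdata query status` until the run leaves exit code 2
# ("still running"), then propagate 0 (succeeded) or 1 (failed).
# poll_query is a hypothetical wrapper, not a CLI subcommand.
poll_query() {
  run_id="$1"
  while :; do
    rc=0
    hotdata query status "$run_id" -o json || rc=$?
    if [ "$rc" -ne 2 ]; then
      return "$rc"   # terminal state reached
    fi
    sleep 5          # still running; poll again
  done
}
```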
### queries

```sh
hotdata queries list [--limit <n>] [--cursor <token>] [--status <csv>] [-o table|json|yaml]
hotdata queries <query_run_id> [-o table|json|yaml]
```

- `list` shows past query executions with status, creation time, duration, row count, and a truncated SQL preview (default limit 20).
- `--status` filters by run status (comma-separated, e.g. `--status running,failed`).
- View a run by ID to see full metadata (timings, `result_id`, snapshot, hashes) and the formatted, syntax-highlighted SQL.
- If a run has a `result_id`, fetch its rows with `hotdata results <result_id>`.
### search

`--type` is required (no default). Pass either `vector` (similarity search via the index's embedding provider) or `bm25` (full-text search). Both run entirely server-side.

```sh
# BM25 full-text search (requires a BM25 index on the column)
hotdata search "<query>" --type bm25 --table <connection.schema.table> --column <column> [--select <columns>] [--limit <n>] [-o table|json|csv]

# Vector search (requires a vector index with auto-embedding on the column)
hotdata search "<query>" --type vector --table <table> --column <source_text_column> [--limit <n>]
```

- `--type vector` — pass your query as plain text and name the source text column (e.g. `title`). The server embeds the query using the same provider that auto-embedded the column when the index was built, so distance metric, model, and dimensions all match automatically. No `OPENAI_API_KEY`, no client-side embedding, no need to know about the auto-generated `_embedding` column. Generated SQL: `vector_distance(col, 'query')`, server-side.
- `--type bm25` runs `bm25_search(table, col, 'query')` — requires a BM25 index on the column.
- No vector index, or want a different model than the index's? Skip `hotdata search` and use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<your_vec>]) FROM ...`). The SQL reference covers the available distance functions and table UDFs.
- BM25 results sort by score (descending). Vector results sort by distance (ascending).
- `--select` specifies which columns to return (comma-separated, defaults to all).
- The previous `--model` flag and stdin-piped-vector path are removed — both hardcoded `l2_distance` regardless of the index's actual metric, which silently produced wrong rankings on cosine indexes. For client-side embedding or precomputed-vector workflows, use raw SQL via `hotdata query` (e.g. `SELECT *, cosine_distance(col, [<vec>]) ...`).
### indexes

Indexes attach to either a connection-table (`--connection-id` + `--schema` + `--table`) or a dataset (`--dataset-id`). The two scopes are mutually exclusive.

```sh
# Connection-table scope
hotdata indexes list --connection-id <id> --schema <schema> --table <table> [-o table|json|yaml]
hotdata indexes create --connection-id <id> --schema <schema> --table <table> \
  --name <name> --columns <cols> --type sorted|bm25|vector \
  [--metric l2|cosine|dot] [--async] \
  [--embedding-provider-id <id>] [--dimensions <n>] [--output-column <name>] [--description <text>]
hotdata indexes delete --connection-id <id> --schema <schema> --table <table> --name <name>

# Dataset scope
hotdata indexes list --dataset-id <id> [-o table|json|yaml]
hotdata indexes create --dataset-id <id> --name <name> --columns <cols> --type sorted|bm25|vector ...
hotdata indexes delete --dataset-id <id> --name <name>
```

- `--type` is required — choose `sorted` (B-tree-like), `bm25` (full-text), or `vector` (similarity).
- `--type vector` requires exactly one column.
- `--async` submits index creation as a background job and returns a job ID; poll with `hotdata jobs <job_id>`.
- Auto-embedding (text → vector): when `--type vector` is used on a text column, embeddings are generated automatically. The embedding provider can be specified with `--embedding-provider-id`; if omitted, the first system provider is used. The generated column defaults to `{column}_embedding` and can be overridden with `--output-column`.
### embedding-providers

```sh
hotdata embedding-providers list [-o table|json|yaml]
hotdata embedding-providers get <id> [-o table|json|yaml]
hotdata embedding-providers create --name <name> --provider-type service|local \
  [--config '<json>'] [--provider-api-key <key> | --secret-name <name>]
hotdata embedding-providers update <id> [--name <name>] [--config '<json>'] \
  [--provider-api-key <key> | --secret-name <name>]
hotdata embedding-providers delete <id>
```

- `list` / `get` show registered providers (system providers like `sys_emb_openai` come pre-configured).
- `--provider-api-key` auto-creates a managed secret for the provider; `--secret-name` references an existing secret. They are mutually exclusive.
- `--provider-api-key` is named to pair with `--provider-type` and to avoid colliding with the global `--api-key` (Hotdata auth).
### results

```sh
hotdata results <result_id> [--workspace-id <id>] [--format table|json|csv]
hotdata results list [--workspace-id <id>] [--limit <n>] [--offset <n>] [--format table|json|yaml]
```

- Query results include a `result-id` in the table footer — use it to retrieve past results without re-running queries.
### jobs

```sh
hotdata jobs list [--workspace-id <id>] [--job-type <type>] [--status <status>] [--all] [--limit <n>] [--offset <n>] [--format table|json|yaml]
hotdata jobs <job_id> [--workspace-id <id>] [--format table|json|yaml]
```

- `list` shows only active jobs (`pending` and `running`) by default. Use `--all` to see all jobs.
- `--job-type` accepts: `data_refresh_table`, `data_refresh_connection`, `dataset_refresh`, `create_index`, `create_dataset_index`.
- `--status` accepts: `pending`, `running`, `succeeded`, `partially_succeeded`, `failed`.
### sandbox

Sandboxes group related CLI activity (queries, dataset operations, etc.) under a single context.

```sh
hotdata sandbox list [-w <id>] [-o table|json|yaml]
hotdata sandbox <sandbox_id> [-w <id>] [-o table|json|yaml]
hotdata sandbox new [--name "My Sandbox"] [-o table|json|yaml]
hotdata sandbox set [<sandbox_id>]
hotdata sandbox read
hotdata sandbox update [<sandbox_id>] [--name "New Name"] [--markdown "..."] [-o table|json|yaml]
hotdata sandbox run <cmd> [args...]
hotdata sandbox <sandbox_id> run <cmd> [args...]
```

- `list` shows all sandboxes with a `*` marker on the active one.
- `new` creates a sandbox and sets it as active.
- `set` switches the active sandbox. Omit the ID to clear the active sandbox.
- `read` prints the markdown content of the current sandbox.
- `update` modifies the name or markdown of a sandbox (defaults to the active sandbox).
- `run` runs a command with the hotdata CLI scoped to a sandbox. Creates a new sandbox unless a sandbox ID is provided before `run`. Useful for launching an agent that can only access sandbox data. Nesting sandboxes is not allowed.
## Configuration

Config is stored at `~/.hotdata/config.yml`, keyed by profile (default: `default`).
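The profile file's shape is not documented beyond its path and per-profile keying; a plausible sketch (the field names below are assumptions, not guaranteed):

```yaml
# ~/.hotdata/config.yml -- keyed by profile name.
# Field names are illustrative assumptions, not documented.
default:
  api_key: <api_key>
  workspace_id: <active_workspace_id>
```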
### Environment variables

| Variable | Description | Default |
|---|---|---|
| `HOTDATA_API_KEY` | API key (overrides config file) | |
| `HOTDATA_API_URL` | API base URL | `https://api.hotdata.dev/v1` |
| `HOTDATA_APP_URL` | App URL for browser login | `https://app.hotdata.dev` |
## Releases

Releases use a two-phase workflow wrapping cargo-release.

### Phase 1 — prepare

```sh
scripts/release.sh prepare <version>
```

Creates a `release/<version>` branch, bumps the version, updates `CHANGELOG.md`, pushes the branch, and opens a pull request.

### Phase 2 — finish

```sh
scripts/release.sh finish
```

Switches to `main`, pulls latest, tags the release, and triggers the dist workflow.