ClickHouse observability in under 8 minutes. No agents.
A self-hosted monitoring dashboard that tells you what to fix, not just what's broken. Connect any ClickHouse cluster with a read-only user and get a 0-100 health score, smart alerts with SQL fix commands, query performance analysis, and multi-cluster support, all from a single Docker container.
Getting Started · Features · Documentation · API Reference
Cluster overview with health scores, alert counts, and collector status across all monitored clusters.
10-panel health dashboard with cluster health score, replication, merge performance, disk usage, and more.
Smart alerts include root cause context and a copy-paste SQL fix command.
Slow queries with P50/P95/P99 percentiles, failed query breakdown, and parts distribution.
- 10-panel health dashboard: cluster health score (0-100 with letter grade), replication status, merge performance, disk usage, mutations, broken parts, Keeper nodes, compression analysis, and error trending
- Smart alerting: 10 built-in alert rules with Slack delivery, each including root-cause context and a SQL fix command (e.g., `SYSTEM RESTART REPLICA`, `KILL MUTATION`)
- Custom alert rules: create your own rules with configurable thresholds, severity, cooldown, and Slack channel overrides
- Query inspector: top queries by CPU, memory, and duration with EXPLAIN on click, failed-query breakdown by exception code, and parts size distribution
- Multi-cluster support: monitor multiple ClickHouse clusters from a single instance, with a cluster overview page and one-click switching
- Zero-agent collection: connects via the ClickHouse HTTP API (`:8123`); no software is installed on monitored nodes
- Onboarding wizard: guided setup to connect a cluster, discover topology, and configure Keeper nodes
- Command palette: Cmd+K / Ctrl+K for quick navigation across clusters and pages
- Password protection: optional password gate for safe exposure beyond localhost
- Encrypted credentials: cluster passwords and Slack webhooks stored with Fernet encryption
- Data retention: configurable automatic cleanup of metric snapshots, health scores, and alert history
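Custom rules combine a threshold, a severity, and a cooldown. The sketch below shows one plausible way such a rule could be evaluated so that a persistent breach notifies once per cooldown window instead of on every collection cycle. The `AlertRule` fields, the `>=` comparison, and the cooldown semantics are assumptions for illustration, not ClusterSight's actual `alert_engine.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    # Hypothetical rule shape: which metric to watch and when to fire.
    name: str
    metric: str
    threshold: float
    severity: str = "warning"
    cooldown_seconds: int = 900
    last_fired_at: Optional[float] = None  # epoch seconds of the last firing


def should_fire(rule: AlertRule, value: float, now: float) -> bool:
    """Fire when the value breaches the threshold and the cooldown has elapsed."""
    if value < rule.threshold:
        return False  # healthy: nothing to report
    if rule.last_fired_at is not None and now - rule.last_fired_at < rule.cooldown_seconds:
        return False  # still cooling down: suppress the repeat notification
    rule.last_fired_at = now
    return True
```

With a 600-second cooldown, a breach at t=1000 fires, a second breach at t=1300 is suppressed, and one at t=1600 fires again.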
- Operator Runbook: installation, configuration, upgrade, backup, and troubleshooting for self-hosted deployments
- User Guide: dashboard panels, health scoring, alerts, query inspector, and command palette explained
- API Reference: full REST API documentation with authentication, endpoint details, and curl examples
| Dashboard Panel | ClickHouse System Tables | What It Shows |
|---|---|---|
| Health Score | All panels combined | 0-100 score with letter grade (A-F) |
| Replication Status | `system.replicas`, `system.replication_queue` | Per-table delay, queue size, read-only state |
| Merge Performance | `system.merges`, `system.parts` | Active merges, throughput, backlog estimate |
| Disk Usage | `system.disks` | Per-disk free space and usage percentage |
| Mutations | `system.mutations` | Active mutations, stuck detection (> 1 hour) |
| Broken Parts | `system.detached_parts` | Detached and broken part counts |
| Keeper Nodes | TCP health check | Per-node Keeper connectivity status |
| Compression | `system.parts_columns` | Per-table compression ratios; flags tables with a ratio < 2.0 |
| Error Trending | `system.errors` | Time-series error rates with severity classification |
| Query Performance | `system.query_log` | Top queries by duration, P50/P95/P99 percentiles |
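The compression panel's `< 2.0` rule can be reproduced by hand when you want to double-check a flagged table. A minimal sketch, assuming the ratio is total uncompressed bytes divided by total compressed bytes over active parts; the exact query ClusterSight runs may differ, and `flag_low_compression` is a hypothetical helper.

```python
# Per-table byte totals from system.parts_columns (active parts only).
COMPRESSION_SQL = """
SELECT
    database,
    table,
    sum(column_data_uncompressed_bytes) AS uncompressed,
    sum(column_data_compressed_bytes)   AS compressed
FROM system.parts_columns
WHERE active
GROUP BY database, table
FORMAT JSONEachRow
"""


def flag_low_compression(rows, min_ratio=2.0):
    """Return (database.table, ratio) pairs whose ratio is below min_ratio."""
    flagged = []
    for row in rows:
        # JSON output formats may quote 64-bit integers, so normalise with int().
        compressed = int(row["compressed"])
        if compressed == 0:
            continue  # empty table: skip to avoid dividing by zero
        ratio = int(row["uncompressed"]) / compressed
        if ratio < min_ratio:
            flagged.append((f"{row['database']}.{row['table']}", round(ratio, 2)))
    return flagged
```

Send `COMPRESSION_SQL` to the HTTP interface on `:8123` as the read-only user, parse each output line with `json.loads`, and pass the resulting dicts to `flag_low_compression`.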
curl -O https://raw.githubusercontent.com/Clustersight-io/Clustersight/master/docker-compose.yml
docker compose up -d

Open http://localhost:3001 and follow the onboarding wizard to connect your ClickHouse cluster. Time to first dashboard: ~5 minutes.
Note: The default host port is 3001 (the container itself listens on 3000). To change it, edit the `ports` line in `docker-compose.yml` (e.g., `"8080:3000"`). The image is pulled from Docker Hub.
git clone https://github.com/Clustersight-io/Clustersight.git
cd Clustersight
# Edit docker-compose.yml: uncomment "build: ." and comment out "image: ..."
docker compose up -d

Prerequisites: Python 3.11+, Node.js 20+
# Backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
.venv/bin/uvicorn clustersight.main:app --port 8000 --reload
# Frontend (separate terminal)
cd frontend
npm install
npm run dev  # Vite dev server on :5173 with HMR

The React dev server on :5173 is your entry point for local development; it proxies API calls to the backend on :8000 automatically.
Copy `.env.example` to `.env` and edit as needed. All variables are optional; sensible defaults are used.
| Variable | Default | Description |
|---|---|---|
| `CLUSTERSIGHT_PASSWORD` | (unset) | Set to enable password protection. When set, all API requests require a Bearer token (the SHA-256 hash of this value). Leave unset for local development (no auth). |
| `FERNET_KEY` | (auto-generated) | Fernet encryption key for cluster passwords and Slack webhook URLs. Auto-generated to a `.key` file in the data directory on first run. Set explicitly if you need a specific key. |
| `DATABASE_PATH` | `/app/data/clustersight.db` | SQLite database file location |
| `LOG_LEVEL` | `INFO` | Logging level (`DEBUG`, `INFO`, `WARNING`, `ERROR`) |
| `COLLECTION_INTERVAL` | `30` | Seconds between collection cycles |
| `CACHE_TTL` | `30` | API response cache TTL in seconds |
| `CORS_ORIGINS` | `http://localhost:5173,http://localhost:3000` | Allowed CORS origins (comma-separated) |
| `SLACK_WEBHOOK_URL` | (unset) | Slack incoming webhook URL for alert delivery. Also configurable via the Settings page in the UI. |
| `APP_URL` | `http://localhost:3000` | Base URL used in Slack notification links. For Docker, set it to your external host URL (e.g., `http://localhost:3001` with the default port mapping). |
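To make the defaults concrete, here is a stdlib sketch of reading and parsing the non-secret variables above. ClusterSight itself uses Pydantic settings (`config.py`), so treat `Settings` and `load_settings` as a hypothetical illustration of the table, not the app's code; `FERNET_KEY` and `SLACK_WEBHOOK_URL` are omitted.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    password: str | None
    database_path: str
    log_level: str
    collection_interval: int
    cache_ttl: int
    cors_origins: tuple[str, ...]
    app_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Defaults mirror the table above; every variable is optional.
    cors = env.get("CORS_ORIGINS", "http://localhost:5173,http://localhost:3000")
    return Settings(
        password=env.get("CLUSTERSIGHT_PASSWORD") or None,  # blank means no auth
        database_path=env.get("DATABASE_PATH", "/app/data/clustersight.db"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        collection_interval=int(env.get("COLLECTION_INTERVAL", "30")),
        cache_ttl=int(env.get("CACHE_TTL", "30")),
        # Comma-separated list; surrounding whitespace is tolerated.
        cors_origins=tuple(o.strip() for o in cors.split(",") if o.strip()),
        app_url=env.get("APP_URL", "http://localhost:3000"),
    )
```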
Cluster credentials are entered via the onboarding wizard and stored encrypted in SQLite.
When CLUSTERSIGHT_PASSWORD is set:
- The app shows a full-screen password prompt on first load
- All API requests require `Authorization: Bearer <sha256-of-password>`
- The token is stored in `sessionStorage` (cleared when the tab closes)
- If the password changes (container restart with a new env var), the old token stops working and the prompt reappears automatically
When CLUSTERSIGHT_PASSWORD is not set:
- The app runs in dev mode with no authentication
- A warning is logged on startup
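To call the API from a script while the password gate is on, compute the token yourself. A minimal sketch, assuming the token is the lowercase hex SHA-256 digest of the UTF-8 encoded password (the usual `hexdigest()` form); `bearer_token` and `authed_get` are hypothetical helpers.

```python
import hashlib
import urllib.request


def bearer_token(password: str) -> str:
    # Assumption: token = lowercase hex SHA-256 of the UTF-8 encoded password.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def authed_get(url: str, password: str) -> urllib.request.Request:
    # Builds (does not send) a GET request carrying the Authorization header.
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {bearer_token(password)}")
    return req


# Usage against a running instance:
# req = authed_get("http://localhost:3001/api/v1/clusters", "your_secure_password")
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(resp.status)
```

Because the token is a hash of the exact string, `"secret"` and `"secret "` yield different tokens, which is why stray spaces or quotes around the value break login.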
# docker-compose.yml
environment:
- CLUSTERSIGHT_PASSWORD=your_secure_password

ClusterSight requires a read-only user with `SELECT` on system tables. Run this on your cluster:
CREATE USER IF NOT EXISTS clustersight_ro
IDENTIFIED BY 'your_strong_password'
DEFAULT DATABASE system;
GRANT SELECT ON system.* TO clustersight_ro;
GRANT SELECT ON information_schema.* TO clustersight_ro;
-- Optional: resource limits to cap collector overhead
ALTER USER clustersight_ro
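SETTINGS max_execution_time = 10, max_memory_usage = 100000000;

For replicated clusters, add `ON CLUSTER 'your_cluster_name'` to each statement: after the user name in `CREATE USER` and `ALTER USER`, and directly after the `GRANT` keyword in the grant statements.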
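To confirm the grants took effect, you can log in as the new user over the HTTP interface and run `SHOW GRANTS`. A sketch, assuming the standard `X-ClickHouse-User` / `X-ClickHouse-Key` headers and that each grant appears on its own line exactly as issued (the text can vary between ClickHouse versions); `show_grants_request` and `missing_grants` are hypothetical helpers.

```python
import urllib.parse
import urllib.request

REQUIRED = (
    "GRANT SELECT ON system.* TO {user}",
    "GRANT SELECT ON information_schema.* TO {user}",
)


def show_grants_request(host: str, user: str, password: str, port: int = 8123) -> urllib.request.Request:
    # ClickHouse HTTP interface: query in the URL, credentials in X-ClickHouse-* headers.
    query = urllib.parse.urlencode({"query": "SHOW GRANTS"})
    req = urllib.request.Request(f"http://{host}:{port}/?{query}")
    req.add_header("X-ClickHouse-User", user)
    req.add_header("X-ClickHouse-Key", password)
    return req


def missing_grants(show_grants_output: str, user: str) -> list[str]:
    """Return the expected GRANT statements that are absent from the output."""
    lines = {line.strip() for line in show_grants_output.splitlines()}
    expected = [g.format(user=user) for g in REQUIRED]
    return [g for g in expected if g not in lines]


# Usage:
# with urllib.request.urlopen(show_grants_request("ch-host", "clustersight_ro", "...")) as resp:
#     print(missing_grants(resp.read().decode(), "clustersight_ro"))
```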
All application data is stored in /app/data/ inside the container, mapped to ./data on the host via the Docker volume mount.
What is persisted:
- Cluster configurations (encrypted passwords)
- Alert history and rule customizations
- Health scores and collection snapshots
- Fernet encryption key (`.key` file)
- SQLite database (`clustersight.db`)
Backup:
cp data/clustersight.db data/clustersight.db.bak
cp data/.key data/.key.bak  # encryption key: required to read encrypted passwords
⚠️ The `./data` directory must be volume-mounted. Without it, all data is lost when the container is removed or recreated (for example, during an upgrade).
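The `cp` commands above are safe while the container is stopped; copying a live SQLite file can capture a half-written state. A sketch that instead uses the standard library's `sqlite3` online backup API, assuming the `./data` layout described above; `backup_data_dir` is a hypothetical helper.

```python
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path


def backup_data_dir(data_dir: str, dest_dir: str) -> None:
    """Snapshot clustersight.db via SQLite's backup API and copy the .key file."""
    src, dest = Path(data_dir), Path(dest_dir)
    db_path = src / "clustersight.db"
    if not db_path.exists():
        raise FileNotFoundError(db_path)  # don't let connect() create an empty DB
    dest.mkdir(parents=True, exist_ok=True)

    # Connection.backup() copies a consistent snapshot even while the app writes.
    with closing(sqlite3.connect(db_path)) as live, \
         closing(sqlite3.connect(dest / "clustersight.db")) as snapshot:
        live.backup(snapshot)

    # The Fernet key is a plain file; it may be absent if FERNET_KEY is set explicitly.
    key_path = src / ".key"
    if key_path.exists():
        shutil.copy2(key_path, dest / ".key")
```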
docker compose pull
docker compose up -d

- Migrations run automatically: Alembic database migrations execute on startup. No manual steps needed.
- Preserve your data volume: ensure `./data:/app/data` remains mounted. This directory contains your SQLite database and Fernet encryption key.
- `FERNET_KEY`: if you set `FERNET_KEY` explicitly (instead of using the auto-generated `.key` file), ensure the same key is set after the upgrade. Losing the key means encrypted passwords and webhook URLs become unreadable.
1. Can't connect to ClickHouse
Symptom: "Connection failed" during onboarding.
Fix: Verify the ClickHouse HTTP API is reachable:
curl http://your-clickhouse-host:8123/ping
# Should return "Ok."

Check firewall rules, ensure port 8123 is open, and verify the host is reachable from the machine running ClusterSight.
2. system.* permission denied
Symptom: Dashboard panels show errors or empty data.
Fix: The read-only user is missing GRANT permissions. Run:
GRANT SELECT ON system.* TO clustersight_ro;
GRANT SELECT ON information_schema.* TO clustersight_ro;

3. Fernet key lost after restart
Symptom: "DecryptionError" in logs, cluster connections fail.
Fix: The .key file was not persisted. Ensure the data volume is mounted:
volumes:
- ./data:/app/data  # Persists database + encryption key

If the key is already lost, delete the affected clusters and re-add them with fresh credentials.
4. Port conflict
Symptom: Container fails to start because the port is already in use.
Fix: Remap the port in docker-compose.yml:
ports:
- "8080:3000"  # Change 8080 to any available port

5. Auth prompt won't accept password
Symptom: Password prompt keeps re-appearing after entering the correct password.
Fix: The CLUSTERSIGHT_PASSWORD env var must contain the plaintext password (not a hash). ClusterSight hashes it internally with SHA-256. Check:
docker compose exec clustersight printenv CLUSTERSIGHT_PASSWORD

If you use a `.env` file, ensure there are no trailing spaces or quotes around the value.
The REST API is available at /api/v1/. Key endpoints:
| Method | Path | Description |
|---|---|---|
| `GET` | `/api/v1/health` | Health check + version |
| `GET` | `/api/v1/overview` | Cluster overview (all clusters with health + alerts) |
| `GET` | `/api/v1/clusters` | List configured clusters |
| `POST` | `/api/v1/clusters` | Add a cluster |
| `POST` | `/api/v1/clusters/test` | Test connection and discover topology |
| `PUT` | `/api/v1/clusters/{id}` | Update cluster config |
| `DELETE` | `/api/v1/clusters/{id}` | Remove a cluster |
| `GET` | `/api/v1/panels/{panel}` | Dashboard panel data |
| `GET` | `/api/v1/alerts/rules` | List alert rules |
| `POST` | `/api/v1/alerts/rules` | Create custom alert rule |
| `GET` | `/api/v1/alerts/history` | Alert history with filters |
| `GET` | `/api/v1/queries/slow` | Slow queries analysis |
| `GET` | `/api/v1/queries/failed` | Failed queries breakdown |
| `GET` | `/api/v1/queries/parts-distribution` | Parts size distribution |
| `GET` | `/api/v1/settings` | Application settings |
| `PUT` | `/api/v1/settings` | Update application settings |
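For scripting against the read endpoints above, a thin client keeps URL building and auth in one place. A sketch only: `ClusterSightClient` is hypothetical, it assumes JSON response bodies, and it takes an already computed Bearer token (as described in the password section above). Response shapes and query parameters are not documented here, so the client returns parsed JSON untouched.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request


class ClusterSightClient:
    """Thin helper around the /api/v1 endpoints (hypothetical)."""

    def __init__(self, base_url: str = "http://localhost:3001", token: str | None = None):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Accept": "application/json"}
        if token:  # only needed when CLUSTERSIGHT_PASSWORD is set
            self.headers["Authorization"] = f"Bearer {token}"

    def build(self, path: str, params: dict | None = None) -> urllib.request.Request:
        # Joins base URL, the /api/v1 prefix, and the endpoint path.
        url = f"{self.base_url}/api/v1/{path.lstrip('/')}"
        if params:
            url += "?" + urllib.parse.urlencode(params)
        return urllib.request.Request(url, headers=self.headers)

    def get(self, path: str, params: dict | None = None):
        # Assumes the endpoint returns a JSON body.
        with urllib.request.urlopen(self.build(path, params), timeout=10) as resp:
            return json.load(resp)


# Usage against a running instance:
# client = ClusterSightClient()
# print(client.get("health"))
# print(client.get("clusters"))
```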
clustersight/
├── api/
│   ├── router.py          # Main API router
│   ├── clusters.py        # Cluster CRUD + connection testing
│   ├── overview.py        # Cluster overview aggregation
│   ├── panels.py          # Dashboard panel data endpoints
│   ├── alerts.py          # Alert rules, history, lifecycle
│   ├── queries.py         # Query inspector endpoints
│   ├── settings.py        # App settings CRUD
│   ├── health.py          # Health check endpoint
│   └── middleware/
│       └── auth.py        # Password gate middleware
├── collector/
│   ├── scheduler.py       # Collection loop + tier management
│   └── scrapers.py        # ClickHouse system table queries
├── models/
│   ├── tables.py          # SQLAlchemy ORM models
│   ├── schemas.py         # Pydantic request/response schemas
│   └── database.py        # Async session factory
├── services/
│   ├── alert_engine.py    # Alert evaluation + firing
│   ├── clickhouse.py      # ClickHouse HTTP client
│   ├── encryption.py      # Fernet encrypt/decrypt
│   ├── health.py          # Health score computation
│   ├── notifications.py   # Slack webhook delivery
│   └── cache.py           # TTL cache for panel responses
├── retention/
│   └── cleanup.py         # Daily data retention scheduler
├── config.py              # Pydantic settings (env vars)
└── main.py                # FastAPI app, lifespan startup/shutdown
frontend/src/
├── pages/          # ClusterOverview, Dashboard, AlertRules, QueryInspector, Settings, Onboarding
├── components/
│   ├── metrics/    # HealthGauge, ActiveAlertsCard, ClusterNodesCard
│   ├── panels/     # ReplicationPanel, MergePanel, DiskPanel, etc.
│   ├── navigation/ # BreadcrumbNav, CommandPalette
│   ├── onboarding/ # ConnectionForm, TestConnection, ClusterNaming
│   ├── layout/     # Shell, TopBar, Sidebar, BottomNav
│   └── ui/         # Shared UI primitives (shadcn/ui)
├── hooks/          # React Query hooks (useClusters, useHealthScore, etc.)
├── lib/            # API client, routes, utilities
└── stores/         # Zustand UI state
tests/ # pytest backend tests
tests/e2e/ # End-to-end smoke tests
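`services/cache.py` holds panel responses for `CACHE_TTL` seconds so that repeated requests inside that window are served from memory. A minimal sketch of the idea, assuming entries are keyed by something like `(cluster_id, panel)`; `TTLCache` is illustrative and the real module may differ.

```python
import time
from collections.abc import Callable
from typing import Any


class TTLCache:
    """In-memory cache whose entries expire ttl seconds after being stored."""

    def __init__(self, ttl: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[Any, tuple[float, Any]] = {}

    def get_or_compute(self, key: Any, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]  # still fresh: skip recomputation
        value = compute()
        self._store[key] = (now + self.ttl, value)  # store with its expiry time
        return value

    def invalidate(self, key: Any) -> None:
        # Drop an entry early, e.g. after a cluster's config changes.
        self._store.pop(key, None)
```

Using a monotonic clock rather than wall time keeps expiry correct even if the system clock is adjusted.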
| Layer | Technology |
|---|---|
| Backend | Python 3.11, FastAPI, SQLAlchemy (async), SQLite, uvicorn |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS v4, TanStack Query, Zustand, Radix UI |
| Collection | httpx (async HTTP to ClickHouse :8123) |
| Database | SQLite (aiosqlite) with Alembic migrations |
| Security | Fernet symmetric encryption, SHA-256 password gate |
| Logging | structlog (JSON) |
| Charts | d3-shape, Recharts |
| Testing | pytest + pytest-anyio (backend), Vitest + Testing Library (frontend) |
# Backend
pip install pytest pytest-anyio aiosqlite
pytest
# Frontend
cd frontend
npm test
# E2E smoke test (requires running instance)
python tests/e2e/test_full_ui_data.py

License: MIT
Built for ClickHouse teams who are tired of flying blind.
Website · Docker Hub · Report Bug