ClusterSight

ClickHouse observability in under 8 minutes. No agents.

A self-hosted monitoring dashboard that tells you what to fix, not just what's broken. Connect any ClickHouse cluster with a read-only user and get a 0–100 health score, smart alerts with SQL fix commands, query performance analysis, and multi-cluster support, all from a single Docker container.


Getting Started · Features · Documentation · API Reference


📸 Screenshots

Cluster Overview: health scores, alert counts, and collector status across all monitored clusters.

Dashboard: 10-panel health dashboard with cluster health score, replication, merge performance, disk usage, and more.

Alert with Fix Command: smart alerts include root cause context and a copy-paste SQL fix command.

Query Inspector: slow queries with P50/P95/P99 percentiles, failed query breakdown, and parts distribution.


✨ Features

  • 🎯 10-panel health dashboard: cluster health score (0–100 with letter grade), replication status, merge performance, disk usage, mutations, broken parts, Keeper nodes, compression analysis, and error trending
  • 🔔 Smart alerting: 10 built-in alert rules with Slack delivery, each including root cause context and a SQL fix command (e.g., SYSTEM RESTART REPLICA, KILL MUTATION)
  • ⚙️ Custom alert rules: create your own rules with configurable thresholds, severity, cooldown, and Slack channel overrides
  • 🔍 Query inspector: top queries by CPU/memory/duration with EXPLAIN on click, failed query breakdown by exception code, and parts size distribution
  • 🌐 Multi-cluster support: monitor multiple ClickHouse clusters from a single instance with a cluster overview page and one-click switching
  • 🚫 Zero-agent collection: connects via the ClickHouse HTTP API (:8123); no software is installed on monitored nodes
  • 🧙 Onboarding wizard: guided setup to connect a cluster, discover topology, and configure Keeper nodes
  • ⌨️ Command palette: Cmd+K / Ctrl+K for quick navigation across clusters and pages
  • 🔒 Password protection: optional password gate for safe exposure beyond localhost
  • 🔐 Encrypted credentials: cluster passwords and Slack webhooks stored with Fernet encryption
  • 🗑️ Data retention: configurable automatic cleanup of metric snapshots, health scores, and alert history

📖 Documentation

  • 📘 Operator Runbook: installation, configuration, upgrade, backup, and troubleshooting for self-hosted deployments
  • 📗 User Guide: dashboard panels, health scoring, alerts, query inspector, and command palette explained
  • 📙 API Reference: full REST API documentation with authentication, endpoint details, and curl examples

📊 What's Monitored

| Dashboard Panel | Data Source (ClickHouse system tables) | What It Shows |
|---|---|---|
| 🎯 Health Score | All panels combined | 0–100 score with letter grade (A–F) |
| 🔄 Replication Status | system.replicas, system.replication_queue | Per-table delay, queue size, read-only state |
| 📊 Merge Performance | system.merges, system.parts | Active merges, throughput, backlog estimate |
| 💾 Disk Usage | system.disks | Per-disk free space and usage percentage |
| ⚙️ Mutations | system.mutations | Active mutations, stuck detection (running > 1 hour) |
| 🔴 Broken Parts | system.detached_parts | Detached and broken part counts |
| 🔗 Keeper Nodes | TCP health check | Per-node Keeper connectivity status |
| 📦 Compression | system.parts_columns | Per-table compression ratios; flags ratios below 2.0 |
| 📈 Error Trending | system.errors | Time-series error rates with severity classification |
| 🔍 Query Performance | system.query_log | Top queries by duration, P50/P95/P99 percentiles |
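The Compression and Mutations rows describe simple threshold checks. Below is a minimal sketch of how those two checks could be evaluated on rows already fetched from ClickHouse, assuming the compression ratio is uncompressed bytes divided by compressed bytes and that a mutation counts as stuck when it is unfinished and older than one hour. The function names and row shapes are hypothetical; this is not ClusterSight's own scraper code.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds mirroring the table above.
MIN_COMPRESSION_RATIO = 2.0
STUCK_MUTATION_AGE = timedelta(hours=1)


def compression_ratio(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Raw size divided by on-disk size; higher means better compression."""
    if compressed_bytes <= 0:
        return float("inf")  # nothing stored yet, so nothing to flag
    return uncompressed_bytes / compressed_bytes


def poorly_compressed(tables: dict) -> list:
    """Return table names whose ratio falls below MIN_COMPRESSION_RATIO.

    `tables` maps "db.table" -> (uncompressed_bytes, compressed_bytes),
    e.g. summed per table from system.parts_columns (assumed aggregation).
    """
    return sorted(
        name
        for name, (raw, packed) in tables.items()
        if compression_ratio(raw, packed) < MIN_COMPRESSION_RATIO
    )


def stuck_mutations(mutations: list, now: datetime) -> list:
    """Return IDs of unfinished mutations older than STUCK_MUTATION_AGE.

    Each dict is assumed to carry mutation_id, create_time and is_done,
    named after the corresponding columns of system.mutations.
    """
    return [
        m["mutation_id"]
        for m in mutations
        if not m["is_done"] and now - m["create_time"] > STUCK_MUTATION_AGE
    ]
```

Keeping the checks as pure functions over already-fetched rows makes the thresholds easy to test without a live cluster.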

🚀 Quick Start

Docker Compose (recommended)

curl -O https://raw.githubusercontent.com/Clustersight-io/Clustersight/master/docker-compose.yml
docker compose up -d

Open http://localhost:3001 and follow the onboarding wizard to connect your ClickHouse cluster. Time to first dashboard: ~5 minutes.

Note: The default port is 3001. To change it, edit the ports line in docker-compose.yml (e.g., "8080:3000"). The image is pulled from Docker Hub.

From Source

git clone https://github.com/Clustersight-io/Clustersight.git
cd Clustersight
# Edit docker-compose.yml: uncomment "build: ." and comment out "image: ..."
docker compose up -d

🛠️ Local Development

Prerequisites: Python 3.11+, Node.js 20+

# Backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
.venv/bin/uvicorn clustersight.main:app --port 8000 --reload

# Frontend (separate terminal)
cd frontend
npm install
npm run dev      # Vite dev server on :5173 with HMR

The React dev server on :5173 is your entry point for local development; it proxies API calls to the backend on :8000 automatically.


⚙️ Configuration

Copy .env.example to .env and edit as needed. All variables are optional; sensible defaults are used.

| Variable | Default | Description |
|---|---|---|
| CLUSTERSIGHT_PASSWORD | (unset) | Set to enable password protection. When set, all API requests require a Bearer token (the SHA-256 hash of this value). Leave blank for local development (no auth). |
| FERNET_KEY | (auto-generated) | Fernet encryption key for cluster passwords and Slack webhook URLs. Auto-generated to a .key file in the data directory on first run. Set explicitly if you need a specific key. |
| DATABASE_PATH | /app/data/clustersight.db | SQLite database file location |
| LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
| COLLECTION_INTERVAL | 30 | Seconds between collection cycles |
| CACHE_TTL | 30 | API response cache TTL in seconds |
| CORS_ORIGINS | http://localhost:5173,http://localhost:3000 | Allowed CORS origins (comma-separated) |
| SLACK_WEBHOOK_URL | (unset) | Slack incoming webhook URL for alert delivery. Also configurable via the Settings page in the UI. |
| APP_URL | http://localhost:3000 | Base URL used in Slack notification links. For Docker, set this to your external host URL (e.g., http://localhost:3001 with the default port mapping). |
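ClusterSight itself reads these variables through Pydantic settings (config.py). The stdlib-only sketch below is not that code; it only illustrates how the documented defaults combine with environment overrides and how the comma-separated CORS_ORIGINS value splits into a list. The Settings class and load_settings function are hypothetical, and FERNET_KEY and SLACK_WEBHOOK_URL are omitted for brevity.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Hypothetical stand-in for config.py; defaults copied from the table above.
DEFAULT_CORS = ("http://localhost:5173", "http://localhost:3000")


@dataclass(frozen=True)
class Settings:
    password: Optional[str]      # None means auth is disabled (dev mode)
    database_path: str
    log_level: str
    collection_interval: int     # seconds between collection cycles
    cache_ttl: int               # seconds
    cors_origins: Tuple[str, ...]
    app_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to their defaults."""
    env = os.environ if env is None else env
    cors_raw = env.get("CORS_ORIGINS", "")
    cors = tuple(o.strip() for o in cors_raw.split(",") if o.strip())
    return Settings(
        # A blank value is treated the same as unset ("leave blank for no auth").
        password=env.get("CLUSTERSIGHT_PASSWORD") or None,
        database_path=env.get("DATABASE_PATH", "/app/data/clustersight.db"),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
        collection_interval=int(env.get("COLLECTION_INTERVAL", "30")),
        cache_ttl=int(env.get("CACHE_TTL", "30")),
        cors_origins=cors or DEFAULT_CORS,
        app_url=env.get("APP_URL", "http://localhost:3000"),
    )
```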

Cluster credentials are entered via the onboarding wizard and stored encrypted in SQLite.

🔒 Password Protection

When CLUSTERSIGHT_PASSWORD is set:

  • The app shows a full-screen password prompt on first load
  • All API requests require Authorization: Bearer <sha256-of-password>
  • The token is stored in sessionStorage (cleared when the tab closes)
  • If the password changes (container restart with new env var), the old token stops working and the prompt re-appears automatically

When CLUSTERSIGHT_PASSWORD is not set:

  • The app runs in dev mode with no authentication
  • A warning is logged on startup

To enable it, set the variable in docker-compose.yml:

# docker-compose.yml
environment:
  - CLUSTERSIGHT_PASSWORD=your_secure_password
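Scripts that call the API directly need the same token the browser sends. A minimal sketch, assuming the token is the lowercase hex encoding of the SHA-256 digest of the UTF-8 password (the README states "SHA-256 of this value" but not the encoding). The helper names are hypothetical.

```python
import hashlib
import urllib.request


def bearer_token(password: str) -> str:
    """SHA-256 of the plaintext password as lowercase hex (assumed encoding)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def authed_request(base_url: str, path: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying the Authorization: Bearer header."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    req.add_header("Authorization", f"Bearer {bearer_token(password)}")
    return req
```

Usage: `urllib.request.urlopen(authed_request("http://localhost:3001", "/api/v1/clusters", "your_secure_password"))`. If requests are rejected, compare this hash with what the UI sends before assuming a server problem.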

🐘 ClickHouse User Setup

ClusterSight requires a read-only user with SELECT on system tables. Run this on your cluster:

CREATE USER IF NOT EXISTS clustersight_ro
    IDENTIFIED BY 'your_strong_password'
    DEFAULT DATABASE system;

GRANT SELECT ON system.*             TO clustersight_ro;
GRANT SELECT ON information_schema.* TO clustersight_ro;

-- Optional: per-query resource limits to cap collector overhead (10 s runtime, ~100 MB memory)
ALTER USER clustersight_ro
    SETTINGS max_execution_time = 10, max_memory_usage = 100000000;

For multi-node clusters, add ON CLUSTER 'your_cluster_name' to each statement so the user, grants, and settings are created on every node.
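ON CLUSTER does not sit in the same position in every statement: for CREATE USER and ALTER USER it follows the user name, while for GRANT it comes directly after the GRANT keyword. The sketch below, a hypothetical helper rather than part of ClusterSight, renders the statements above with or without the clause. It refuses single quotes instead of escaping them, to keep the example short.

```python
from typing import List, Optional


def user_setup_sql(password: str, cluster: Optional[str] = None) -> List[str]:
    """Render the setup statements above, optionally with ON CLUSTER."""
    if "'" in password or (cluster and "'" in cluster):
        raise ValueError("single quotes are not escaped in this sketch")
    on_cluster = f" ON CLUSTER '{cluster}'" if cluster else ""
    return [
        # CREATE/ALTER USER: ON CLUSTER follows the user name.
        f"CREATE USER IF NOT EXISTS clustersight_ro{on_cluster} "
        f"IDENTIFIED BY '{password}' DEFAULT DATABASE system",
        # GRANT: ON CLUSTER comes right after the GRANT keyword.
        f"GRANT{on_cluster} SELECT ON system.* TO clustersight_ro",
        f"GRANT{on_cluster} SELECT ON information_schema.* TO clustersight_ro",
        f"ALTER USER clustersight_ro{on_cluster} "
        "SETTINGS max_execution_time = 10, max_memory_usage = 100000000",
    ]
```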


💾 Data Persistence

All application data is stored in /app/data/ inside the container, mapped to ./data on the host via the Docker volume mount.

What is persisted:

  • Cluster configurations (encrypted passwords)
  • Alert history and rule customizations
  • Health scores and collection snapshots
  • Fernet encryption key (.key file)
  • SQLite database (clustersight.db)

Backup:

cp data/clustersight.db data/clustersight.db.bak
cp data/.key data/.key.bak    # encryption key, required to read encrypted passwords
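A plain cp of a SQLite file can capture a half-written state if ClusterSight writes during the copy. Python's sqlite3 online backup API avoids that by copying a consistent snapshot. A minimal sketch, assuming the default file layout described above; backup_clustersight is a hypothetical helper, not a ClusterSight command.

```python
import shutil
import sqlite3
from pathlib import Path


def backup_clustersight(data_dir: str, dest_dir: str) -> None:
    """Copy clustersight.db and .key from data_dir into dest_dir."""
    src_dir = Path(data_dir).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Open the live database read-only; mode=ro fails instead of silently
    # creating an empty file if the path is wrong.
    src_uri = src_dir.joinpath("clustersight.db").as_uri() + "?mode=ro"
    src = sqlite3.connect(src_uri, uri=True)
    dst = sqlite3.connect(dest / "clustersight.db")
    try:
        # Online backup API: copies a consistent snapshot of every page.
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # The key file only exists when FERNET_KEY is auto-generated.
    key = src_dir / ".key"
    if key.exists():
        shutil.copy2(key, dest / ".key")
```

Usage from the host: `backup_clustersight("data", "backups/latest")`.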

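⚠️ The ./data directory must be volume-mounted. Without the mount, all data lives inside the container and is lost when the container is removed or recreated (for example, by docker compose down or an upgrade).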


⬆️ Upgrading

docker compose pull
docker compose up -d

  • Migrations run automatically: Alembic database migrations execute on startup. No manual steps are needed.
  • Preserve your data volume: ensure ./data:/app/data remains mounted. This directory contains your SQLite database and Fernet encryption key.
  • FERNET_KEY: if you set FERNET_KEY explicitly (instead of using the auto-generated .key file), set the same key after the upgrade. Losing the key makes encrypted passwords and webhook URLs unreadable.
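If you pin FERNET_KEY explicitly, it must be a valid Fernet key: 32 random bytes encoded as URL-safe base64 (44 characters), which is the format cryptography's Fernet.generate_key() produces. The stdlib-only sketch below generates and sanity-checks a key in that format; the helper names are hypothetical.

```python
import base64
import os


def generate_fernet_key() -> str:
    """32 random bytes, URL-safe base64 encoded (the format Fernet expects)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """True if key decodes to exactly 32 bytes of URL-safe base64."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except ValueError:  # covers binascii.Error and UnicodeEncodeError
        return False
    return len(raw) == 32
```

Generate the key once, store it somewhere safe, and reuse the same value across upgrades.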

🔧 Troubleshooting

1. Can't connect to ClickHouse

Symptom: "Connection failed" during onboarding.

Fix: Verify the ClickHouse HTTP API is reachable:

curl http://your-clickhouse-host:8123/ping
# Should return "Ok."

Check firewall rules, ensure port 8123 is open, and verify the host is accessible from the machine running ClusterSight.
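Where curl is unavailable, the same /ping check can be scripted in Python from the machine running ClusterSight. A minimal sketch; the helper names are hypothetical, and 8443 is assumed as the HTTPS port only because it is ClickHouse's default.

```python
import http.client
import urllib.request


def ping_url(host: str, port: int = 8123, secure: bool = False) -> str:
    """URL of ClickHouse's /ping endpoint."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{host}:{port}/ping"


def clickhouse_reachable(host: str, port: int = 8123, secure: bool = False,
                         timeout: float = 5.0) -> bool:
    """True if /ping answers with the body "Ok."."""
    try:
        with urllib.request.urlopen(ping_url(host, port, secure), timeout=timeout) as resp:
            return resp.read().strip() == b"Ok."
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts all land here.
        return False
```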

2. system.* permission denied

Symptom: Dashboard panels show errors or empty data.

Fix: The read-only user is missing GRANT permissions. Run:

GRANT SELECT ON system.*             TO clustersight_ro;
GRANT SELECT ON information_schema.* TO clustersight_ro;
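To find which panels are affected, you can probe each system table the dashboard reads with a cheap SELECT over the HTTP API, passing credentials in the X-ClickHouse-User and X-ClickHouse-Key headers. A minimal sketch with hypothetical helper names; the table list is taken from the "What's Monitored" table. A failed probe usually means a missing grant, but a table can also be absent (e.g. system.query_log when query logging is not configured).

```python
import urllib.parse
import urllib.request
from typing import List

# System tables read by the dashboard panels (see "What's Monitored").
SYSTEM_TABLES = [
    "replicas", "replication_queue", "merges", "parts", "disks",
    "mutations", "detached_parts", "parts_columns", "errors", "query_log",
]


def probe_request(base_url: str, user: str, password: str, table: str) -> urllib.request.Request:
    """GET request running a cheap SELECT against one system table."""
    query = urllib.parse.urlencode({"query": f"SELECT 1 FROM system.{table} LIMIT 1"})
    req = urllib.request.Request(f"{base_url.rstrip('/')}/?{query}")
    # ClickHouse's HTTP interface accepts credentials in these headers.
    req.add_header("X-ClickHouse-User", user)
    req.add_header("X-ClickHouse-Key", password)
    return req


def unreadable_tables(base_url: str, user: str, password: str,
                      timeout: float = 5.0) -> List[str]:
    """Return the tables whose probe fails (HTTP error or network error)."""
    failed = []
    for table in SYSTEM_TABLES:
        try:
            with urllib.request.urlopen(probe_request(base_url, user, password, table),
                                        timeout=timeout):
                pass
        except OSError:  # HTTPError (e.g. ACCESS_DENIED) is an OSError subclass
            failed.append(table)
    return failed
```
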

3. Fernet key lost after restart

Symptom: "DecryptionError" in logs, cluster connections fail.

Fix: The .key file was not persisted. Ensure the data volume is mounted:

volumes:
  - ./data:/app/data   # Persists database + encryption key

If the key is already lost, delete the affected clusters and re-add them with fresh credentials.

4. Port conflict

Symptom: Container fails to start or port already in use.

Fix: Remap the port in docker-compose.yml:

ports:
  - "8080:3000"   # Change 8080 to any available port
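Before picking a new host port, you can check whether it is free by trying to bind it, which is roughly what Docker does when it publishes the port. A minimal sketch; port_is_free is a hypothetical helper, and it checks 0.0.0.0 because that is the address Docker publishes on by default.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:  # EADDRINUSE and similar
            return False
    return True
```

For example, `port_is_free(3001)` returns False while the ClusterSight container is already publishing its default port.
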

5. Auth prompt won't accept password

Symptom: Password prompt keeps re-appearing after entering the correct password.

Fix: The CLUSTERSIGHT_PASSWORD env var must contain the plaintext password (not a hash). ClusterSight hashes it internally with SHA-256. Check:

docker compose exec clustersight printenv CLUSTERSIGHT_PASSWORD

If you use a .env file, ensure there are no trailing spaces or quotes around the value.


🔌 API

The REST API is available at /api/v1/. Key endpoints:

| Method | Path | Description |
|---|---|---|
| GET | /api/v1/health | Health check + version |
| GET | /api/v1/overview | Cluster overview (all clusters with health + alerts) |
| GET | /api/v1/clusters | List configured clusters |
| POST | /api/v1/clusters | Add a cluster |
| POST | /api/v1/clusters/test | Test connection and discover topology |
| PUT | /api/v1/clusters/{id} | Update cluster config |
| DELETE | /api/v1/clusters/{id} | Remove a cluster |
| GET | /api/v1/panels/{panel} | Dashboard panel data |
| GET | /api/v1/alerts/rules | List alert rules |
| POST | /api/v1/alerts/rules | Create custom alert rule |
| GET | /api/v1/alerts/history | Alert history with filters |
| GET | /api/v1/queries/slow | Slow queries analysis |
| GET | /api/v1/queries/failed | Failed queries breakdown |
| GET | /api/v1/queries/parts-distribution | Parts size distribution |
| GET | /api/v1/settings | Application settings |
| PUT | /api/v1/settings | Update application settings |
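A small client makes these endpoints scriptable. The sketch below is a hypothetical wrapper, not an official SDK: it prefixes paths with /api/v1, attaches the Bearer token when password protection is enabled, and JSON-encodes request bodies. Request and response schemas are not documented here, so it passes bodies through unchanged; see the API Reference for the actual fields.

```python
import json
import urllib.request
from typing import Any, Optional


class ClusterSightClient:
    """Thin urllib wrapper around the /api/v1 endpoints (hypothetical)."""

    def __init__(self, base_url: str = "http://localhost:3001",
                 token: Optional[str] = None, timeout: float = 10.0):
        self.base_url = base_url.rstrip("/")
        self.token = token  # SHA-256 token when CLUSTERSIGHT_PASSWORD is set
        self.timeout = timeout

    def request(self, method: str, path: str,
                body: Optional[dict] = None) -> urllib.request.Request:
        """Build (but do not send) a request for /api/v1{path}."""
        data = None if body is None else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(f"{self.base_url}/api/v1{path}", data=data, method=method)
        req.add_header("Accept", "application/json")
        if data is not None:
            req.add_header("Content-Type", "application/json")
        if self.token:
            req.add_header("Authorization", f"Bearer {self.token}")
        return req

    def call(self, method: str, path: str, body: Optional[dict] = None) -> Any:
        """Send the request and decode the JSON response (None if empty)."""
        with urllib.request.urlopen(self.request(method, path, body),
                                    timeout=self.timeout) as resp:
            return json.loads(resp.read() or b"null")
```

Usage: `ClusterSightClient().call("GET", "/overview")` or `client.call("DELETE", f"/clusters/{cluster_id}")`.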

🏗️ Project Structure

clustersight/
├── api/
│   ├── router.py          # Main API router
│   ├── clusters.py        # Cluster CRUD + connection testing
│   ├── overview.py        # Cluster overview aggregation
│   ├── panels.py          # Dashboard panel data endpoints
│   ├── alerts.py          # Alert rules, history, lifecycle
│   ├── queries.py         # Query inspector endpoints
│   ├── settings.py        # App settings CRUD
│   ├── health.py          # Health check endpoint
│   └── middleware/
│       └── auth.py        # Password gate middleware
├── collector/
│   ├── scheduler.py       # Collection loop + tier management
│   └── scrapers.py        # ClickHouse system table queries
├── models/
│   ├── tables.py          # SQLAlchemy ORM models
│   ├── schemas.py         # Pydantic request/response schemas
│   └── database.py        # Async session factory
├── services/
│   ├── alert_engine.py    # Alert evaluation + firing
│   ├── clickhouse.py      # ClickHouse HTTP client
│   ├── encryption.py      # Fernet encrypt/decrypt
│   ├── health.py          # Health score computation
│   ├── notifications.py   # Slack webhook delivery
│   └── cache.py           # TTL cache for panel responses
├── retention/
│   └── cleanup.py         # Daily data retention scheduler
├── config.py              # Pydantic settings (env vars)
└── main.py                # FastAPI app, lifespan startup/shutdown

frontend/src/
├── pages/                 # ClusterOverview, Dashboard, AlertRules, QueryInspector, Settings, Onboarding
├── components/
│   ├── metrics/           # HealthGauge, ActiveAlertsCard, ClusterNodesCard
│   ├── panels/            # ReplicationPanel, MergePanel, DiskPanel, etc.
│   ├── navigation/        # BreadcrumbNav, CommandPalette
│   ├── onboarding/        # ConnectionForm, TestConnection, ClusterNaming
│   ├── layout/            # Shell, TopBar, Sidebar, BottomNav
│   └── ui/                # Shared UI primitives (shadcn/ui)
├── hooks/                 # React Query hooks (useClusters, useHealthScore, etc.)
├── lib/                   # API client, routes, utilities
└── stores/                # Zustand UI state

tests/                     # pytest backend tests
tests/e2e/                 # End-to-end smoke tests

🧰 Tech Stack

| Layer | Technology |
|---|---|
| Backend | Python 3.11, FastAPI, SQLAlchemy (async), SQLite, uvicorn |
| Frontend | React 18, TypeScript, Vite, Tailwind CSS v4, TanStack Query, Zustand, Radix UI |
| Collection | httpx (async HTTP to ClickHouse :8123) |
| Database | SQLite (aiosqlite) with Alembic migrations |
| Security | Fernet symmetric encryption, SHA-256 password gate |
| Logging | structlog (JSON) |
| Charts | d3-shape, Recharts |
| Testing | pytest + pytest-anyio (backend), Vitest + Testing Library (frontend) |

🧪 Running Tests

# Backend
pip install pytest pytest-anyio aiosqlite
pytest

# Frontend
cd frontend
npm test

# E2E smoke test (requires running instance)
python tests/e2e/test_full_ui_data.py

📄 License

MIT


Built for ClickHouse teams who are tired of flying blind.

Website · Docker Hub · Report Bug
