A production-style cloud monitoring and alerting environment, containerized with Docker Compose. It automatically collects metrics from applications and infrastructure, visualizes them in real-time Grafana dashboards, and sends alerts to Telegram, following the same pattern used in enterprise DevOps/CloudOps setups.
Portfolio goal: Demonstrate hands-on experience with Prometheus, Grafana, Alertmanager, and automated incident response pipelines.
The Grafana dashboard visualizes CPU and memory usage, container statistics, and HTTP traffic metrics in real time.

| Service | Port | Description |
|---|---|---|
| Prometheus | 9090 | Time-series metrics scraper (CPU, RAM, app errors) |
| Grafana | 3000 | Real-time dashboards with line charts and status panels |
| cAdvisor | 8082 | Docker container resource metrics (CPU%, Memory, Network I/O) |
| Alertmanager | 9093 | Routes alerts → Telegram when thresholds are breached |
| Sample App | 8080 | Python app exposing /metrics, /simulate-error, /slow |
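
To make the Sample App row concrete, here is a minimal sketch of how a Python service could expose the three endpoints using only the standard library. It is not the project's app.py: the metric name app_requests_total, the REQUESTS dictionary, the 3-second delay, and the serve helper are all assumptions made for illustration.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counters; the real app.py may track other metrics.
REQUESTS = {"ok": 0, "error": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests by outcome.",
        "# TYPE app_requests_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'app_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body, code = render_metrics(REQUESTS), 200
        elif self.path == "/simulate-error":
            REQUESTS["error"] += 1
            body, code = "simulated error\n", 500
        elif self.path == "/slow":
            time.sleep(3)  # assumed delay, chosen only to exceed a 2 s threshold
            REQUESTS["ok"] += 1
            body, code = "slow response\n", 200
        else:
            REQUESTS["ok"] += 1
            body, code = "ok\n", 200
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the sketch server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling serve() starts the sketch on port 8080, after which Prometheus could scrape /metrics the same way it scrapes the real app.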

    graph TD
        App["🐍 Python Sample App\n/metrics endpoint"] -->|scrape| Prom["📈 Prometheus\n:9090"]
        cAdv["🐳 cAdvisor\nContainer metrics"] -->|scrape| Prom
        Prom -->|evaluate rules| Alert["🚨 Alertmanager\n:9093"]
        Alert -->|HTTP POST| Telegram["📱 Telegram Bot"]
        Prom -->|data source| Grafana["📊 Grafana\n:3000"]

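
The last hop in the diagram, Alertmanager to Telegram, is a plain HTTP POST. The sketch below shows roughly what such a request could look like, assuming the Telegram Bot API sendMessage method. Alertmanager performs this step itself, so build_telegram_request is a hypothetical helper that is only useful for checking credentials by hand.

```python
import json
import urllib.request


def build_telegram_request(bot_token, chat_id, text):
    """Build (but do not send) a sendMessage request for the Telegram Bot API."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    payload = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values only; never hard-code a real token.
request = build_telegram_request("your_bot_token_here", "your_chat_id_here", "Test alert")
```

Passing the request to urllib.request.urlopen would actually send the message, so do that only with real credentials loaded from the environment.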
- Docker Desktop
- A Telegram Bot Token and Chat ID (create via @BotFather)
Copy the environment template:

    cp .env.example .env

Edit .env and fill in your Telegram credentials:

    TELEGRAM_BOT_TOKEN=your_bot_token_here
    TELEGRAM_CHAT_ID=your_chat_id_here

Then update alertmanager/alertmanager.yml with those values.
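
Before starting the stack, it can help to confirm that the placeholders were actually replaced. The following is a minimal sketch of such a check. parse_env and missing_credentials are hypothetical helpers, not part of this repository, and the parser handles only simple KEY=value lines.

```python
def parse_env(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_credentials(values, required=("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID")):
    """Return the required keys that are unset or still hold template placeholders."""
    placeholders = {"", "your_bot_token_here", "your_chat_id_here"}
    return [key for key in required if values.get(key, "") in placeholders]
```

For example, missing_credentials(parse_env(open(".env").read())) returns an empty list once both values are filled in.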

    docker compose up -d

| Service | URL | Login |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| Prometheus | http://localhost:9090 | — |
| cAdvisor | http://localhost:8082 | — |
| Sample App | http://localhost:8080 | — |
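
    # Hit the error endpoint multiple times to trigger Alertmanager
    curl http://localhost:8080/simulate-error
    curl http://localhost:8080/slow

Check your Telegram. An alert should arrive roughly 30 seconds after a rule fires, and each rule fires only after its condition has held for the configured duration (1 minute for HighErrorRate, 2 minutes for SlowResponseTime).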
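
Because a single request rarely moves an error rate, the endpoint usually has to be hit repeatedly. A small Python sketch of that loop follows. fetch_status and generate_errors are hypothetical helpers, and the request count is an arbitrary choice.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code, including 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises on error statuses; the code is what we want here.
        return exc.code


def generate_errors(url, count, fetch=fetch_status):
    """Request the URL `count` times and collect the status codes."""
    return [fetch(url) for _ in range(count)]
```

With the stack running, generate_errors("http://localhost:8080/simulate-error", 50) sends fifty requests in a row. The fetch parameter exists so the loop can be exercised without a live server.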

    15_cloud_monitoring_stack/
    ├── .github/workflows/ci.yml     # Config validation CI
    ├── docker-compose.yml           # Orchestrates all 5 services
    ├── prometheus/
    │   ├── prometheus.yml           # Scrape targets configuration
    │   └── alert_rules.yml          # Alerting rules (CPU, errors)
    ├── grafana/
    │   └── provisioning/
    │       ├── datasources/         # Auto-connects Prometheus
    │       └── dashboards/          # Pre-built dashboard JSON
    ├── alertmanager/
    │   └── alertmanager.yml         # Telegram routing config
    ├── app/
    │   ├── app.py                   # Python metrics app
    │   ├── requirements.txt
    │   └── Dockerfile
    ├── .env.example                 # Environment variable template
    └── README_VI.md                 # Vietnamese documentation

Pre-configured alerts trigger when:
- HighErrorRate: App error rate > 20% for 1 minute
- SlowResponseTime: p95 latency > 2 seconds for 2 minutes
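
The two conditions above can be illustrated with a short Python sketch. It is not the PromQL in alert_rules.yml. It only mirrors the arithmetic, with the p95 estimated by linear interpolation over cumulative histogram buckets in the style of Prometheus's histogram_quantile. The bucket layout and sample counts are made up, and the "for" durations are ignored.

```python
import math


def error_ratio(errors, total):
    """Fraction of requests that failed; 0.0 when there was no traffic."""
    return errors / total if total else 0.0


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with math.inf.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the overflow bucket: report the last finite bound.
                return buckets[index - 1][0] if index > 0 else math.nan
            in_bucket = count - prev_count
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan


# Made-up sample: 100 requests, 30 of them errors.
sample_buckets = [(0.5, 80), (1.0, 90), (2.0, 96), (5.0, 100), (math.inf, 100)]
high_error_rate = error_ratio(30, 100) > 0.20
slow_response = histogram_quantile(0.95, sample_buckets) > 2.0
```

In this sample the error condition holds while the latency condition does not, because the interpolated p95 lands inside the 1 to 2 second bucket.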

Never commit real tokens to Git. Keep credentials in .env files and add those files to .gitignore.