
ThanhTamSys/cloud_monitoring_stack


📊🔔 Cloud Monitoring Stack: Prometheus + Grafana + Telegram Alerts


A production-style Cloud Monitoring & Alerting environment, containerized with Docker Compose. It collects metrics from applications and infrastructure, visualizes them in real-time Grafana dashboards, and sends alerts to Telegram, following the pattern commonly used in DevOps/CloudOps setups.

Portfolio goal: Demonstrate hands-on experience with Prometheus, Grafana, Alertmanager, and automated incident response pipelines.


🎨 User Interface Preview

Grafana Dark Mode Dashboard (Monitoring Interface)

The Grafana dashboard visualizes CPU and memory usage, container statistics, and HTTP traffic metrics in real time.

[Image: Grafana Dashboard Preview]


🌟 Features

| Service | Port | Description |
|---|---|---|
| Prometheus | 9090 | Time-series metrics scraper (CPU, RAM, app errors) |
| Grafana | 3000 | Real-time dashboards with line charts and status panels |
| cAdvisor | 8082 | Docker container resource metrics (CPU%, Memory, Network I/O) |
| Alertmanager | 9093 | Routes alerts → Telegram when thresholds are breached |
| Sample App | 8080 | Python app exposing /metrics, /simulate-error, /slow |
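
To make the Sample App row concrete, here is a minimal sketch of a Python service that exposes counters at /metrics in the Prometheus text exposition format. It uses only the standard library; the metric names (`app_requests_total`, `app_errors_total`) and the handler layout are assumptions for illustration, and the repository's app/app.py may be built differently (for example with a Prometheus client library).

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Thread-safe, monotonically increasing counter."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


# Hypothetical metric objects; the names are illustrative only.
REQUESTS = Counter()
ERRORS = Counter()


def render_metrics() -> str:
    """Render both counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Total HTTP requests handled.",
        "# TYPE app_requests_total counter",
        f"app_requests_total {REQUESTS.value}",
        "# HELP app_errors_total Total requests that ended in an error.",
        "# TYPE app_errors_total counter",
        f"app_errors_total {ERRORS.value}",
    ]
    return "\n".join(lines) + "\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # Scrapes are not counted as application traffic.
            body, status = render_metrics(), 200
        elif self.path == "/simulate-error":
            REQUESTS.inc()
            ERRORS.inc()
            body, status = "simulated error\n", 500
        else:
            REQUESTS.inc()
            body, status = "ok\n", 200
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main(port: int = 8080):
    """Serve until interrupted; call main() to start the app."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `main()` starts the server on port 8080; Prometheus would then scrape `/metrics` and see the two counters grow as the other endpoints are hit.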

🏗️ Architecture

graph TD
    App["🐍 Python Sample App\n/metrics endpoint"] -->|scrape| Prom["📈 Prometheus\n:9090"]
    cAdv["🐳 cAdvisor\nContainer metrics"] -->|scrape| Prom
    Prom -->|evaluate rules| Alert["🚨 Alertmanager\n:9093"]
    Alert -->|HTTP POST| Telegram["📱 Telegram Bot"]
    Prom -->|data source| Grafana["📊 Grafana\n:3000"]

🛠️ Requirements

  • Docker with the Compose plugin (the stack is started with docker compose)
  • A Telegram bot token and chat ID for alert delivery


🚀 Quick Start

Step 1: Configure Telegram Alerts

Copy the environment template:

cp .env.example .env

Edit .env and fill in your Telegram credentials:

TELEGRAM_BOT_TOKEN=your_bot_token_here
TELEGRAM_CHAT_ID=your_chat_id_here

Then update alertmanager/alertmanager.yml with those values.
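
Copying the values by hand works, but the step can also be scripted. The sketch below parses `.env`-style text and substitutes the two variables into a template. The template is hypothetical: it assumes an Alertmanager version with the native Telegram receiver (`telegram_configs` with `bot_token` and `chat_id`), and the repository's actual alertmanager.yml may be structured differently.

```python
from string import Template


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical template, not the repository's file. chat_id is left
# unquoted because Alertmanager expects an integer there.
ALERTMANAGER_TEMPLATE = Template("""\
route:
  receiver: telegram
receivers:
  - name: telegram
    telegram_configs:
      - bot_token: "$TELEGRAM_BOT_TOKEN"
        chat_id: $TELEGRAM_CHAT_ID
""")


def render_config(env_text: str) -> str:
    """Fill the template; raises KeyError if a variable is missing."""
    return ALERTMANAGER_TEMPLATE.substitute(parse_env(env_text))
```

For example, `render_config(open(".env").read())` returns the YAML text, which can then be written to alertmanager/alertmanager.yml. Using `substitute` rather than `safe_substitute` means a missing variable fails loudly instead of leaving a placeholder in the config.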

Step 2: Start the full stack

docker compose up -d

Step 3: Access services

| Service | URL | Login |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| Prometheus | http://localhost:9090 | |
| cAdvisor | http://localhost:8082 | |
| Sample App | http://localhost:8080 | |
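
Before opening each URL by hand, a quick reachability check can confirm the containers are answering. This is a minimal sketch using only the standard library; it treats any HTTP response (even a 4xx or 5xx) as "reachable" and does not verify that a service is healthy. The `opener` parameter is an illustrative hook that makes the function testable without a network.

```python
import urllib.error
import urllib.request

# URLs taken from the service table above.
SERVICES = {
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
    "cAdvisor": "http://localhost:8082",
    "Sample App": "http://localhost:8080",
}


def is_reachable(url: str, timeout: float = 3.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the server answers with any HTTP response."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx/3xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


def check_all(opener=urllib.request.urlopen) -> dict:
    """Map each service name to its reachability."""
    return {name: is_reachable(url, opener=opener) for name, url in SERVICES.items()}
```

Running `print(check_all())` after `docker compose up -d` gives a one-line summary; a `False` entry usually means the container is still starting or the port mapping differs.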

Step 4: Trigger a test alert

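# Hit the error endpoint repeatedly so the error rate stays above the threshold for over a minute
for i in $(seq 1 60); do curl -s -o /dev/null http://localhost:8080/simulate-error; sleep 2; done
# Optionally exercise the slow endpoint as well
curl http://localhost:8080/slow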

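Check Telegram for the alert. Allow at least the rule's hold time (1 minute for HighErrorRate, 2 minutes for SlowResponseTime) plus a short delivery delay; the condition must stay true for that whole window before the alert fires.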
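
A small script can do the same job as repeated curl calls and also report what the client observed. The sketch below sends a fixed number of requests and summarizes the status codes. It assumes /simulate-error answers with a 5xx status, which the README does not state, and the `fetch` parameter is an illustrative hook for testing.

```python
import urllib.error
import urllib.request
from collections import Counter


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return 0


def send_requests(url: str, count: int, fetch=fetch_status) -> Counter:
    """Send `count` requests and tally the status codes seen."""
    return Counter(fetch(url) for _ in range(count))


def error_ratio(statuses: Counter) -> float:
    """Fraction of requests that failed (5xx, or no response at all)."""
    total = sum(statuses.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in statuses.items() if code == 0 or code >= 500)
    return errors / total
```

For example, `error_ratio(send_requests("http://localhost:8080/simulate-error", 50))` shows the client-side failure fraction. This is only an approximation of what Prometheus computes, since the alert is evaluated from the app's own metrics over a time window.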


📁 Project Structure

15_cloud_monitoring_stack/
├── .github/workflows/ci.yml         # Config validation CI
├── docker-compose.yml               # Orchestrates all 5 services
├── prometheus/
│   ├── prometheus.yml               # Scrape targets configuration
│   └── alert_rules.yml              # Alerting rules (CPU, errors)
├── grafana/
│   └── provisioning/
│       ├── datasources/             # Auto-connects Prometheus
│       └── dashboards/              # Pre-built dashboard JSON
├── alertmanager/
│   └── alertmanager.yml             # Telegram routing config
├── app/
│   ├── app.py                       # Python metrics app
│   ├── requirements.txt
│   └── Dockerfile
├── .env.example                     # Environment variable template
└── README_VI.md                     # Vietnamese documentation

⚙️ Alert Rules (prometheus/alert_rules.yml)

Pre-configured alerts trigger when:

  • HighErrorRate: App error rate > 20% for 1 minute
  • SlowResponseTime: p95 latency > 2 seconds for 2 minutes
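
The two conditions above can be worked through numerically. The sketch below is a simplified illustration, not the contents of alert_rules.yml: it assumes the error rate is errors divided by requests over the evaluation window, and that p95 latency comes from cumulative histogram buckets with linear interpolation inside the matching bucket, which is how PromQL's `histogram_quantile` estimates quantiles. The bucket boundaries and counts are made-up sample data.

```python
import math


def error_rate(errors_delta: float, requests_delta: float) -> float:
    """Errors as a fraction of requests over the same time window."""
    return errors_delta / requests_delta if requests_delta > 0 else 0.0


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from cumulative buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with (math.inf, total_count).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for i, (upper_bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(upper_bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite bound instead of infinity.
                return buckets[i - 1][0] if i > 0 else math.nan
            if count == lower_count:
                return upper_bound  # empty bucket, nothing to interpolate
            # Linear interpolation inside the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Made-up sample window: 30 errors out of 100 requests.
high_error_rate = error_rate(30, 100) > 0.20

# Made-up latency buckets in seconds: (upper bound, cumulative count).
sample_buckets = [(0.5, 60), (1.0, 80), (2.0, 90), (5.0, 98), (math.inf, 100)]
p95 = histogram_quantile(0.95, sample_buckets)  # 2.0 + 3.0 * (95 - 90) / (98 - 90) = 3.875
slow_response = p95 > 2.0
```

With this sample data both conditions are true. In Prometheus the alert would still wait for the condition to hold for the rule's full duration (1 minute or 2 minutes) before firing.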

🔒 Security Note

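Never commit real tokens to Git. Keep them in .env and add .env to .gitignore. The same caution applies to alertmanager/alertmanager.yml once real values have been filled in during Step 1: avoid committing that version of the file.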

