Prometheus Metrics Dashboard

Basic Metrics

Service Health

Shows whether Prometheus's most recent scrape of items-service succeeded (1) or failed (0)

up{job="items-service"}

Total HTTP Requests

Cumulative number of HTTP requests since the service last started (counters reset to zero on restart), one series per label combination

http_server_requests_total

Requests by Endpoint

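Total HTTP requests for a single route (here /v1/items); to group by route instead, use sum by(route) (http_server_requests_total)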

http_server_requests_total{route="/v1/items"}

Request Rate (RPS)

Overall Request Rate

Total requests per second across all series, averaged over the last 5 minutes

sum(rate(http_server_requests_total[5m]))
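
The arithmetic behind rate() can be sketched in shell with awk. This is a simplified illustration, not how Prometheus computes it: counter_rate is a hypothetical helper that takes only two samples, treats a drop in value as a counter reset (for example after a restart), and skips the extrapolation to the window edges that the real rate() performs, so real results will differ slightly.

```shell
# Hypothetical helper: simplified per-second rate between two counter samples.
# Usage: counter_rate T1 V1 T2 V2   (timestamps in seconds)
# Unlike PromQL rate(), this uses only two samples and does not
# extrapolate to the edges of the range window.
counter_rate() {
  awk -v t1="$1" -v v1="$2" -v t2="$3" -v v2="$4" 'BEGIN {
    # A lower second value means the counter reset and restarted from zero,
    # so the whole second value counts as the increase.
    increase = (v2 >= v1) ? v2 - v1 : v2
    printf "%.2f\n", increase / (t2 - t1)
  }'
}
```

For example, a counter that goes from 100 to 400 over 300 seconds gives counter_rate 0 100 300 400, which prints 1.00 (one request per second).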

Request Rate by Endpoint

RPS grouped by route and method

sum by(route, method) (rate(http_server_requests_total[5m]))
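
To show what sum by(route, method) does to the per-series rates, the hypothetical awk helper below groups input lines by their first two fields and adds up the rates, dropping the status code the same way the by clause drops labels it does not list. The input format (route, method, status code, rate on each line) is an assumption made for illustration.

```shell
# Hypothetical helper mimicking "sum by(route, method)".
# Input (stdin): "route method status_code rate" per line.
# The status_code field is ignored, so series differing only in it are added together.
sum_by_route_method() {
  awk '{ key = $1 " " $2; total[key] += $4 }
       END { for (k in total) printf "%s %.1f\n", k, total[k] }' | LC_ALL=C sort
}
```

Feeding it the lines /v1/items GET 200 2.0 and /v1/items GET 500 0.5 yields a single /v1/items GET 2.5 line, just as the query merges the 200 and 500 series for that route and method.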

Request Rate by Status Code

RPS grouped by HTTP status code

sum by(status_code) (rate(http_server_requests_total[5m]))

Latency Metrics

Average Response Time

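Mean request duration over the last 5 minutes, in milliseconds (assuming the histogram records milliseconds, as the OpenTelemetry http.server.duration instrument does)

sum(rate(http_server_duration_sum[5m])) / sum(rate(http_server_duration_count[5m]))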
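
Because both rates cover the same 5m window, the window length essentially cancels, and the mean is the increase in total recorded duration divided by the increase in request count. A minimal sketch with a hypothetical mean_latency helper:

```shell
# Hypothetical helper: mean latency from the increases of the _sum and _count
# series over one window. Usage: mean_latency SUM_INCREASE COUNT_INCREASE
mean_latency() {
  awk -v sum_inc="$1" -v count_inc="$2" 'BEGIN {
    if (count_inc == 0) { print "NaN"; exit }   # no requests in window
    printf "%.1f\n", sum_inc / count_inc
  }'
}
```

With 4500 ms of recorded duration across 300 requests in the window, mean_latency 4500 300 prints 15.0.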

P95 Latency

95th percentile response time, estimated from the histogram buckets aggregated across all series

histogram_quantile(0.95, sum by(le) (rate(http_server_duration_bucket[5m])))

P99 Latency

99th percentile response time, estimated from the histogram buckets aggregated across all series

histogram_quantile(0.99, sum by(le) (rate(http_server_duration_bucket[5m])))

P50 Latency (Median)

50th percentile response time, estimated from the histogram buckets aggregated across all series

histogram_quantile(0.50, sum by(le) (rate(http_server_duration_bucket[5m])))
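
histogram_quantile() estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it, assuming observations are spread evenly across that bucket. The sketch below (a hypothetical quantile_from_buckets helper) works on one already-aggregated set of cumulative bucket counts, which is what sum by(le) produces; it ignores edge cases the real function handles, such as negative bucket bounds and quantiles outside (0, 1].

```shell
# Hypothetical helper: linear-interpolation quantile estimate.
# Input (stdin): cumulative buckets as "upper_bound count", ascending,
# with the +Inf bucket last. Usage: ... | quantile_from_buckets 0.95
quantile_from_buckets() {
  awk -v q="$1" '
    { le[NR] = $1; cnt[NR] = $2 }
    END {
      n = NR
      if (cnt[n] == 0) { print "NaN"; exit }    # no observations
      rank = q * cnt[n]                         # target position among all observations
      for (i = 1; i < n; i++) {                 # search finite buckets only
        if (cnt[i] >= rank) {
          lower = (i == 1) ? 0 : le[i - 1]
          prev  = (i == 1) ? 0 : cnt[i - 1]
          printf "%.2f\n", lower + (le[i] - lower) * (rank - prev) / (cnt[i] - prev)
          exit
        }
      }
      printf "%.2f\n", le[n - 1]                # rank in +Inf bucket: highest finite bound
    }'
}
```

If the rank falls in the +Inf bucket, the estimate is capped at the highest finite bucket bound, so P99 cannot exceed that bound no matter how slow the slowest requests are.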

Error Monitoring

Error Rate (5xx)

Server errors (5xx) per second, summed across all series

sum(rate(http_server_requests_total{status_code=~"5.."}[5m]))
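
Prometheus regex matchers are fully anchored, so status_code=~"5.." matches only three-character values that start with 5. The hypothetical codes_matching helper below reproduces that behaviour with grep -E by wrapping the pattern in ^( )$; for a simple pattern like this one, POSIX extended regex and the RE2 syntax Prometheus uses agree.

```shell
# Hypothetical helper: filter status codes (one per line on stdin) the way an
# anchored PromQL =~ matcher would. Usage: ... | codes_matching '5..'
codes_matching() {
  grep -E "^($1)\$"
}
```

Piping 200, 404, 500, 503, and 5000 through codes_matching '5..' keeps only 500 and 503.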

Client Error Rate (4xx)

Client errors (4xx) per second, summed across all series

sum(rate(http_server_requests_total{status_code=~"4.."}[5m]))

Error Percentage

Percentage of requests that returned a 5xx status (client errors are not counted)

sum(rate(http_server_requests_total{status_code=~"5.."}[5m])) / sum(rate(http_server_requests_total[5m])) * 100
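
The error percentage is the sum of the 5xx rates divided by the sum of all rates, times 100. The hypothetical error_percentage helper below applies the same filter and sums to per-status-code rates read one per line (an assumed input format), printing NaN when there is no traffic, which mirrors the 0/0 case in the query.

```shell
# Hypothetical helper: 5xx share of traffic, in percent.
# Input (stdin): "status_code rate" per line.
error_percentage() {
  awk '
    { total += $2; if ($1 ~ /^5..$/) errors += $2 }   # anchored 5xx match
    END {
      if (total == 0) { print "NaN"; exit }           # 0/0 when there is no traffic
      printf "%.2f\n", errors / total * 100
    }'
}
```

With rates of 18.0 (200), 1.5 (404), 0.4 (500), and 0.1 (503) requests per second, it prints 2.50; the 404s count toward the total but not toward the errors.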

Generate Test Traffic

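Before viewing metrics, generate some traffic. The loop below needs bash (for the {1..50} brace expansion) and sends 100 requests over at least 25 seconds. Queries that use rate() need at least two scrapes inside their 5m window, so they may show no data until a scrape interval or two has passed.

# 50 rounds: one health check and one items request every 0.5 s
for i in {1..50}; do
  curl -s https://app.roussev.com/items/v1/health > /dev/null
  curl -s https://app.roussev.com/items/v1/items > /dev/null
  sleep 0.5
done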