🔍 Missing Metrics Troubleshooting

🚨 Symptom: Metrics do not appear in the dashboard

Quick Checklist

□ Is the metric collector initialized correctly?
□ Is the /metrics endpoint reachable?
□ Is a firewall blocking the port?
□ Is Prometheus scraping the endpoint?
□ Is there a typo in the metric name?
□ Are the labels correct?
□ Is the retention period long enough?

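🔧 Debug Steps

1. Verify the Local Endpoint

# Check that the application's metrics endpoint responds.
# Note: 9090 is also the default Prometheus port. If both run on this
# host, the application must listen on a different port; adjust the URL.
curl http://localhost:9090/metrics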

# Should return Prometheus format:
# HELP cpu_usage Current CPU usage
# TYPE cpu_usage gauge
# cpu_usage 45.2

# If empty or error:
# - Check if MetricsCollector initialized
# - Check port is not blocked
# - Check process is running

Expected output:

# HELP cpu_usage Current CPU usage percentage
# TYPE cpu_usage gauge
cpu_usage 45.2

# HELP audio_latency_ms Audio processing latency
# TYPE audio_latency_ms histogram
audio_latency_ms_bucket{le="1"} 1234
audio_latency_ms_bucket{le="5"} 5678
audio_latency_ms_bucket{le="+Inf"} 10000
audio_latency_ms_sum 12345
audio_latency_ms_count 10000
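
When checking this output from a test rather than by eye, it can help to look up a sample by its exact metric name. The following is a minimal sketch, not part of the application: `findSample` is a hypothetical helper, and it assumes label values contain no `}` character, which the real exposition format does not guarantee.

```cpp
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the value of the first sample whose metric
// name (the text before any '{' or space) equals `name`, or std::nullopt
// if no such sample exists. Comment lines (# HELP, # TYPE) are skipped.
std::optional<double> findSample(const std::string& exposition,
                                 const std::string& name) {
    std::istringstream in(exposition);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        // The metric name ends at the first '{' (labels) or ' ' (value).
        const std::size_t nameEnd = line.find_first_of("{ ");
        if (nameEnd == std::string::npos) continue;
        if (line.compare(0, nameEnd, name) != 0) continue;
        // With labels present, the value follows the closing '}'.
        std::size_t valueStart = nameEnd;
        if (line[nameEnd] == '{') {
            valueStart = line.find('}', nameEnd);
            if (valueStart == std::string::npos) continue;
            ++valueStart;
        }
        try {
            return std::stod(line.substr(valueStart));
        } catch (const std::exception&) {
            continue;  // Malformed value; keep scanning.
        }
    }
    return std::nullopt;
}
```

Because the name is compared in full, a lookup for `cpu` does not match `cpu_usage`, which avoids the false positives a plain substring search would give.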

2. Enable Debug Logging

// In main() or init():
MetricsCollector::setLogLevel(LogLevel::DEBUG);

// Should log every metric recorded:
// [DEBUG] Recording metric: cpu_usage = 45.2
// [DEBUG] Recording metric: audio_latency = 5.3ms
// [DEBUG] Exporting 15 metrics

3. Check Prometheus Targets

# Open Prometheus UI
http://localhost:9090

# Navigate to: Status → Targets
# Should show:
┌────────────────────────┬───────┬──────────┐
│ Endpoint               │ State │ Last     │
├────────────────────────┼───────┼──────────┤
│ localhost:9090/metrics │ UP    │ 2s ago   │
└────────────────────────┴───────┴──────────┘

# If DOWN:
# - Check endpoint URL
# - Check scrape_interval config
# - Check network connectivity

4. Verify Metric Registration

// Check that the metric is registered by recording a test value
auto& collector = MetricsCollector::instance();
collector.recordCpuUsage(50.0f);

# Then, from a shell, check that the value appears in /metrics
curl -s http://localhost:9090/metrics | grep cpu_usage
# Should show: cpu_usage 50.0

5. Check Prometheus Query

# In Prometheus UI → Graph

# Try simple query
cpu_usage

# If returns "No data":
# - Check time range (last 5m, 1h, etc.)
# - Check if metric exists: cpu_usage{job="audiolab"}
# - Check scrape config: scrape_configs → job_name
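
One reason a query can return "No data" even though the metric was scraped earlier: an instant query only considers samples inside a limited lookback window before the evaluation time (5 minutes by default, configurable with `--query.lookback-delta`). The sketch below models that rule in simplified form. The function and parameter names are hypothetical, and it ignores staleness markers, which can make a series disappear sooner.

```cpp
// Simplified model of the instant-query lookback rule.
// Timestamps are Unix seconds. Returns true if a sample taken at
// sampleTime would still be considered by an instant query evaluated
// at queryTime.
bool isWithinLookback(double sampleTime, double queryTime,
                      double lookbackSeconds = 300.0) {
    if (sampleTime > queryTime) return false;  // Sample lies in the future.
    return (queryTime - sampleTime) < lookbackSeconds;
}
```

If the last successful scrape is older than the window, widen the graph's time range to confirm that older data exists before investigating the application.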

🐛 Common Causes & Fixes

╔══════════════════════════╤══════════════════════╤═════════════════╗
║ Cause                    │ Symptom              │ Fix             ║
╠══════════════════════════╪══════════════════════╪═════════════════╣
║ Not initialized          │ No /metrics endpoint │ Call init()     ║
║ Port conflict            │ Connection refused   │ Change port     ║
║ Metrics not registered   │ Empty response       │ Register all    ║
║ Name collision           │ Wrong values         │ Unique names    ║
║ Label mismatch           │ Series not found     │ Check labels    ║
║ Time range issue         │ "No data"            │ Adjust range    ║
║ Scrape interval too long │ Delayed updates      │ Reduce interval ║
║ Firewall blocking        │ Timeout              │ Open port       ║
╚══════════════════════════╧══════════════════════╧═════════════════╝

📊 Diagnostic Commands

Check if metrics are being recorded

# Watch metrics endpoint live
watch -n 1 'curl -s http://localhost:9090/metrics | grep cpu_usage'

# Should update every second with new values

Check Prometheus scrape status

# Check last scrape time (first active target only)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[0].lastScrape'

# Check scrape errors (first active target only)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[0].lastError'

Verify time series exists

# Query Prometheus API
curl -s 'http://localhost:9090/api/v1/query?query=cpu_usage' | jq .

# Should return:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "cpu_usage", "instance": "localhost:9090"},
        "value": [1234567890, "45.2"]
      }
    ]
  }
}

🔍 Advanced Debugging

Metric Not Appearing in Prometheus

Possible causes:

Metric name format issue

// ⚠️ Valid but discouraged: mixed case. Names are case-sensitive,
// so a query for cpu_usage will not match CPU_Usage.
metrics.gauge("CPU_Usage").set(50.0);

// ✅ Recommended: lowercase snake_case
metrics.gauge("cpu_usage").set(50.0);
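
Name problems can be caught before they reach Prometheus. The sketch below checks a name against the classic Prometheus metric name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*` and, separately, against the lowercase naming convention. Both function names are hypothetical and not part of `MetricsCollector`; newer Prometheus versions can also accept a wider character set, which this sketch does not model.

```cpp
#include <string>

static bool isAsciiLetter(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// True if `name` matches [a-zA-Z_:][a-zA-Z0-9_:]*
bool isValidMetricName(const std::string& name) {
    if (name.empty()) return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c = name[i];
        // Digits are allowed everywhere except the first position.
        const bool ok = isAsciiLetter(c) || c == '_' || c == ':' ||
                        (i > 0 && isAsciiDigit(c));
        if (!ok) return false;
    }
    return true;
}

// True if `name` is valid and contains no uppercase letters.
bool followsLowercaseConvention(const std::string& name) {
    if (!isValidMetricName(name)) return false;
    for (const char c : name) {
        if (c >= 'A' && c <= 'Z') return false;
    }
    return true;
}
```

With these checks, `CPU_Usage` passes validation but fails the convention check, while a name such as `cpu-usage` fails validation outright.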

Label cardinality explosion

// ❌ Too many unique label values
metrics.counter("requests")
    .labels({{"user_id", "12345"}});  // Millions of series

// ✅ Limited label values
metrics.counter("requests")
    .labels({{"endpoint", "/api"}});  // Few series
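
The number of series a metric can produce is bounded by the product of the distinct values of each of its labels, which is why a per-user label is so costly. A small sketch of that arithmetic follows; `maxSeriesCount` is a hypothetical helper, the counts are made up, and overflow is not handled.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Upper bound on the number of series for one metric: the product of the
// number of distinct values each label can take.
std::uint64_t maxSeriesCount(
    const std::map<std::string, std::uint64_t>& distinctValuesPerLabel) {
    std::uint64_t total = 1;
    for (const auto& entry : distinctValuesPerLabel) {
        total *= entry.second;
    }
    return total;
}
```

For example, 5 endpoints and 4 methods give at most 20 series; adding a `user_id` label with 1,000,000 values raises the bound to 20,000,000.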

Metric type mismatch

# ❌ Trying to query counter as gauge
cpu_usage_total  # Counter, needs rate()

# ✅ Correct query
rate(cpu_usage_total[5m])
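
A counter only increases, or resets to zero when the process restarts, so its raw value is rarely what a dashboard needs; the per-second rate of increase is. The sketch below shows the idea with two samples. It is a deliberate simplification: Prometheus's `rate()` uses every sample in the range and extrapolates to the window edges, so real results differ slightly. `Sample` and `simpleRate` are hypothetical names.

```cpp
// One scraped counter sample.
struct Sample {
    double timestampSeconds;
    double value;
};

// Per-second increase between two counter samples. If the counter went
// backwards, it is treated as a reset and the increase is the new value.
double simpleRate(const Sample& first, const Sample& last) {
    const double elapsed = last.timestampSeconds - first.timestampSeconds;
    if (elapsed <= 0.0) return 0.0;  // No usable time span.
    const double increase = (last.value >= first.value)
                                ? last.value - first.value
                                : last.value;
    return increase / elapsed;
}
```

A counter going from 100 to 400 over 60 seconds gives 5 per second; a counter that drops from 100 to 30 over 60 seconds is read as a reset and gives 0.5 per second.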

Histogram Not Showing Percentiles

# ❌ The base name has no le buckets, so no quantile can be computed
histogram_quantile(0.95, audio_latency_ms)

# ✅ Correct query over the _bucket series
histogram_quantile(0.95, rate(audio_latency_ms_bucket[5m]))

# Check buckets exist:
audio_latency_ms_bucket
# Should show multiple le (less-than-or-equal) labels
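
To see why the bucket series are required, it helps to look at how a quantile is estimated from cumulative buckets: find the bucket that contains the target rank, then interpolate linearly inside it. The sketch below follows that approach on raw counts, assuming buckets sorted by upper bound with `+Inf` last. `Bucket` and `estimateQuantile` are hypothetical names, and Prometheus applies the same idea to bucket rates rather than raw counts.

```cpp
#include <limits>
#include <vector>

// One cumulative histogram bucket: samples with value <= upperBound.
struct Bucket {
    double upperBound;
    double cumulativeCount;
};

// Estimates quantile q (0..1) by linear interpolation inside the bucket
// that contains the target rank. Returns NaN for unusable input.
double estimateQuantile(double q, const std::vector<Bucket>& buckets) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    if (buckets.size() < 2 || q < 0.0 || q > 1.0) return nan;
    const double total = buckets.back().cumulativeCount;
    if (total <= 0.0) return nan;
    const double rank = q * total;

    // Find the first bucket whose cumulative count reaches the rank.
    std::size_t i = 0;
    while (i < buckets.size() - 1 && buckets[i].cumulativeCount < rank) ++i;

    // Rank falls in the +Inf bucket: report the highest finite bound.
    if (i == buckets.size() - 1) return buckets[i - 1].upperBound;

    const double lower = (i == 0) ? 0.0 : buckets[i - 1].upperBound;
    const double below = (i == 0) ? 0.0 : buckets[i - 1].cumulativeCount;
    const double inBucket = buckets[i].cumulativeCount - below;
    if (inBucket <= 0.0) return buckets[i].upperBound;
    return lower + (buckets[i].upperBound - lower) * (rank - below) / inBucket;
}
```

With buckets `le=10` (50 samples), `le=20` (100 samples) and `le=+Inf` (100 samples), the estimate for q = 0.75 is 15. When the target rank lands in the `+Inf` bucket, the result is capped at the highest finite bound, which is a sign that the bucket boundaries are too coarse for the latencies being measured.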

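Missing Data for Specific Time Range

Causes:
- Prometheus was down during that period
- Retention period expired
- Scrape failed for that interval

Fix:

# Increase retention
# (many Prometheus versions set this with the command-line flag
#  --storage.tsdb.retention.time=30d rather than in prometheus.yml;
#  check which form your version supports)
storage:
  tsdb:
    retention:
      time: 30d  # Was 7d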
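
Longer retention needs more disk. The Prometheus storage documentation gives a rough capacity formula: needed disk space = retention time in seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average. A minimal sketch of that arithmetic follows; the function name and the example ingestion rate are illustrative assumptions.

```cpp
// Rough disk estimate in bytes for a given retention period.
// bytesPerSample defaults to 2.0, the upper end of the typical range.
double estimateDiskBytes(double retentionDays, double samplesPerSecond,
                         double bytesPerSample = 2.0) {
    const double secondsPerDay = 86400.0;
    return retentionDays * secondsPerDay * samplesPerSecond * bytesPerSample;
}
```

At 1,000 samples per second, 30 days of retention comes to about 5.2 GB, compared with about 1.2 GB for 7 days. Check that the data volume has that much headroom before raising the setting.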

🛠️ Troubleshooting Tools

1. Prometheus Console

# Check if metric exists
{__name__=~"cpu.*"}

# Check all labels for metric
cpu_usage{job!=""}

# Check scrape targets
up{job="audiolab"}

2. Metrics Exporter Test

#include <iostream>

// Record test values and print the Prometheus export to the console
void testMetrics() {
    auto& collector = MetricsCollector::instance();

    // Record test values
    collector.recordCpuUsage(50.0f);
    collector.recordMemoryUsage(1024 * 1024 * 100);  // 100 MB
    collector.recordBufferUnderrun();

    // Export to console
    std::cout << collector.exportPrometheus() << std::endl;
}

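3. Network Diagnostics

# Check if port is open
netstat -an | grep 9090

# Check if service responds (HTTP/1.1 requires a Host header;
# finish the request with an empty line)
telnet localhost 9090
GET /metrics HTTP/1.1
Host: localhost

# Check firewall (-n prints numeric ports so grep can match 9090)
sudo iptables -L -n | grep 9090

# On Windows:
netsh advfirewall firewall show rule name=all | findstr 9090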

📋 Debugging Checklist

Application Side

□ MetricsCollector::instance() called
□ Metrics registered in constructor
□ recordMetric() called correctly
□ No exceptions thrown
□ /metrics endpoint implemented
□ Port not in use by other process
□ Metrics export working (test with curl)

Prometheus Side

□ prometheus.yml configured correctly
□ scrape_configs has correct job
□ targets show as UP in Status page
□ scrape_interval reasonable (15s)
□ no authentication/TLS issues
□ retention period long enough
□ storage not full

Query Side

□ Metric name spelled correctly
□ Labels match exactly
□ Time range includes data
□ Query syntax valid
□ Aggregation functions used correctly
□ No label cardinality issues

🚨 Emergency Fixes

Metrics Stopped Working Suddenly

# 1. Restart Prometheus
sudo systemctl restart prometheus

# 2. Check logs
sudo journalctl -u prometheus -n 100

# 3. Verify config
promtool check config /etc/prometheus/prometheus.yml

# 4. Clear bad data (last resort, only if the TSDB is corrupted)
# WARNING: This deletes all historical data!
# Stop Prometheus first; the data path depends on --storage.tsdb.path.
sudo systemctl stop prometheus
sudo rm -rf /var/lib/prometheus/data/*
sudo systemctl start prometheus

Metrics Delayed by Minutes

Cause: Long scrape interval or slow network

Fix:

# Reduce scrape interval
scrape_configs:
  - job_name: 'audiolab'
    scrape_interval: 5s  # Was 60s
    scrape_timeout: 3s

Missing Metrics After Deployment

Cause: Metric names changed or not registered

Fix:

// Add migration logic: report the value under both names until
// dashboards and alerts have moved to the new one
void migrateMetrics(float cpu) {
    // Keep old metric for compatibility
    recordMetric("old_cpu_usage", cpu);

    // Add new metric
    recordMetric("cpu_usage_percent", cpu);
}

📞 Getting Help

Information to Collect

# 1. Application logs
tail -n 100 logs/audiolab.log

# 2. Prometheus config
cat /etc/prometheus/prometheus.yml

# 3. Prometheus targets status
curl -s http://localhost:9090/api/v1/targets | jq .

# 4. Recent metrics
curl -s http://localhost:9090/metrics | tail -n 50

# 5. System info
uname -a
free -h
df -h

Support Checklist

When asking for help, provide:
- Application version
- Prometheus version
- Metric name and query
- Screenshots of Prometheus UI
- Relevant logs
- Config files (redacted)
- Steps to reproduce
