🔍 Missing Metrics Troubleshooting

🚨 Symptom: Metrics do not appear in the dashboard

Quick Checklist

□ Is the metric collector initialized correctly?
□ Is the /metrics endpoint reachable?
□ Is a firewall blocking the port?
□ Is Prometheus scraping the endpoint?
□ Is there a typo in the metric name?
□ Are the labels correct?
□ Is the retention time adequate?

🔧 Debug Steps

1. Verify the Local Endpoint

# Check if metrics endpoint responds
curl http://localhost:9090/metrics

# Should return Prometheus format:
# HELP cpu_usage Current CPU usage
# TYPE cpu_usage gauge
# cpu_usage 45.2

# If empty or error:
# - Check if MetricsCollector initialized
# - Check port is not blocked
# - Check process is running

Expected output:

# HELP cpu_usage Current CPU usage percentage
# TYPE cpu_usage gauge
cpu_usage 45.2

# HELP audio_latency_ms Audio processing latency
# TYPE audio_latency_ms histogram
audio_latency_ms_bucket{le="1"} 1234
audio_latency_ms_bucket{le="5"} 5678
audio_latency_ms_bucket{le="+Inf"} 10000
audio_latency_ms_sum 12345
audio_latency_ms_count 10000
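
To make the histogram layout above concrete, here is a minimal C++ sketch of how such a block could be rendered. `renderHistogram` is a hypothetical helper written for illustration, not part of `MetricsCollector`. It assumes the bucket bounds are sorted in ascending order. The key points are that bucket counts are cumulative and that the text format requires a final `le="+Inf"` bucket equal to the total count.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of MetricsCollector): renders one histogram
// in the Prometheus text exposition format.
// `upperBounds` must be sorted ascending; `observations` are raw values.
std::string renderHistogram(const std::string& name,
                            const std::string& help,
                            const std::vector<double>& upperBounds,
                            const std::vector<double>& observations) {
    std::vector<std::uint64_t> cumulative(upperBounds.size(), 0);
    double sum = 0.0;
    for (double v : observations) {
        sum += v;
        // Buckets are cumulative: a value counts toward every bucket
        // whose upper bound is greater than or equal to the value.
        for (std::size_t i = 0; i < upperBounds.size(); ++i) {
            if (v <= upperBounds[i]) ++cumulative[i];
        }
    }

    std::ostringstream out;
    out << "# HELP " << name << ' ' << help << '\n';
    out << "# TYPE " << name << " histogram\n";
    for (std::size_t i = 0; i < upperBounds.size(); ++i) {
        out << name << "_bucket{le=\"" << upperBounds[i] << "\"} "
            << cumulative[i] << '\n';
    }
    // The +Inf bucket always equals the total number of observations.
    out << name << "_bucket{le=\"+Inf\"} " << observations.size() << '\n';
    out << name << "_sum " << sum << '\n';
    out << name << "_count " << observations.size() << '\n';
    return out.str();
}
```

Calling `renderHistogram("audio_latency_ms", "Audio processing latency", {1.0, 5.0}, samples)` and comparing the result with what `curl` returns is one way to spot a missing `_bucket`, `_sum`, or `_count` line in the real exporter.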

2. Enable Debug Logging

// In main() or init():
MetricsCollector::setLogLevel(LogLevel::DEBUG);

// Should log every metric recorded:
// [DEBUG] Recording metric: cpu_usage = 45.2
// [DEBUG] Recording metric: audio_latency = 5.3ms
// [DEBUG] Exporting 15 metrics

3. Check Prometheus Targets

# Open the Prometheus UI (9090 is Prometheus's default port)
http://localhost:9090

# Note: Prometheus and the application cannot both listen on 9090 on the
# same host. The target listed must be the application's /metrics address.

# Navigate to: Status → Targets
# Should show:
┌───────────────────────┬───────┬──────────┐
│ Endpoint              │ State │ Last     │
├───────────────────────┼───────┼──────────┤
│ localhost:9090/metrics│ UP    │ 2s ago   │
└───────────────────────┴───────┴──────────┘

# If DOWN:
# - Check endpoint URL
# - Check scrape config (metrics_path, scheme, scrape_timeout)
# - Check network connectivity

4. Verify Metric Registration

// Check if metric is registered
auto& collector = MetricsCollector::instance();

// Try recording a test value
collector.recordCpuUsage(50.0f);

# Then, from a shell, check that the value appears in /metrics
curl -s http://localhost:9090/metrics | grep cpu_usage
# Should show: cpu_usage 50.0
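
The same check can be done in-process, for example in a unit test, by searching the exported text for the sample. The sketch below is a hypothetical helper (`findSample` is not part of `MetricsCollector`) and assumes C++17 for `std::optional`. It only matches samples without labels, which is enough for a gauge like `cpu_usage`.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: returns the value of the first un-labelled sample
// named `metric` in a Prometheus text-format payload, or std::nullopt.
std::optional<double> findSample(const std::string& payload,
                                 const std::string& metric) {
    std::istringstream lines(payload);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip HELP/TYPE lines
        std::istringstream fields(line);
        std::string name;
        double value = 0.0;
        // A simple sample line is "<name> <value>". Labelled samples
        // ("name{...} value") are deliberately not matched here.
        if (fields >> name >> value && name == metric) return value;
    }
    return std::nullopt;
}
```

If the collector's export string is passed in as `payload`, an empty result for `cpu_usage` points at registration or recording rather than at the network or Prometheus.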

5. Check Prometheus Query

# In Prometheus UI → Graph

# Try simple query
cpu_usage

# If returns "No data":
# - Check time range (last 5m, 1h, etc.)
# - Check if metric exists: cpu_usage{job="audiolab"}
# - Check scrape config: scrape_configs β†’ job_name

🐛 Common Causes & Fixes

╔══════════════════════════╤══════════════════════╤═════════════════╗
║ Cause                    │ Symptom              │ Fix             ║
╠══════════════════════════╪══════════════════════╪═════════════════╣
║ Not initialized          │ No /metrics endpoint │ Call init()     ║
║ Port conflict            │ Connection refused   │ Change port     ║
║ Metrics not registered   │ Empty response       │ Register all    ║
║ Name collision           │ Wrong values         │ Unique names    ║
║ Label mismatch           │ Series not found     │ Check labels    ║
║ Time range issue         │ "No data"            │ Adjust range    ║
║ Scrape interval too long │ Delayed updates      │ Reduce interval ║
║ Firewall blocking        │ Timeout              │ Open port       ║
╚══════════════════════════╧══════════════════════╧═════════════════╝

📊 Diagnostic Commands

Check if metrics are being recorded

# Watch metrics endpoint live
watch -n 1 'curl -s http://localhost:9090/metrics | grep cpu_usage'

# Should update every second with new values

Check Prometheus scrape status

# Check last scrape time
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[0].lastScrape'

# Check scrape errors
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[0].lastError'

Verify time series exists

# Query Prometheus API
curl 'http://localhost:9090/api/v1/query?query=cpu_usage' | jq

# Should return:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "cpu_usage", "instance": "localhost:9090"},
        "value": [1234567890, "45.2"]
      }
    ]
  }
}

🔍 Advanced Debugging

Metric Not Appearing in Prometheus

Possible causes:

  1. Metric name format issue

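    // ❌ Discouraged: mixed case. The name is accepted, but names are
    // case-sensitive, so a query for cpu_usage will not match it.
    metrics.gauge("CPU_Usage").set(50.0);
    
    // ✅ Conventional: lowercase snake_case
    metrics.gauge("cpu_usage").set(50.0);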
    

  2. Label cardinality explosion

    // ❌ Too many unique label values
    metrics.counter("requests")
        .labels({{"user_id", "12345"}});  // Millions of series
    
    // ✅ Limited label values
    metrics.counter("requests")
        .labels({{"endpoint", "/api"}});  // Few series
    

  3. Metric type mismatch

    # ❌ Trying to query counter as gauge
    cpu_usage_total  # Counter, needs rate()
    
    # ✅ Correct query
    rate(cpu_usage_total[5m])
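
For the naming issue in item 1, a small validation helper can catch problems before a metric is registered. The sketch below is illustrative and not part of the library. It checks names against the traditional Prometheus metric-name pattern `[a-zA-Z_:][a-zA-Z0-9_:]*`, under which uppercase letters are accepted. The second function adds the stricter lowercase convention, which avoids case mismatches between code and queries.

```cpp
#include <cstddef>
#include <string>

// True if `name` matches the traditional Prometheus metric-name pattern
// [a-zA-Z_:][a-zA-Z0-9_:]*.
bool isValidMetricName(const std::string& name) {
    if (name.empty()) return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c = name[i];
        const bool letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        const bool digit = (c >= '0' && c <= '9');
        const bool punct = (c == '_' || c == ':');
        // Digits are allowed everywhere except the first position.
        if (!(letter || punct || (digit && i > 0))) return false;
    }
    return true;
}

// Stricter, convention-only check: valid and free of uppercase letters.
bool isConventionalMetricName(const std::string& name) {
    if (!isValidMetricName(name)) return false;
    for (char c : name) {
        if (c >= 'A' && c <= 'Z') return false;
    }
    return true;
}
```

Running both checks at registration time separates names that an exporter would reject (`cpu-usage`) from names that are accepted but easy to mistype in a query (`CPU_Usage`).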
    

Histogram Not Showing Percentiles

# ❌ Wrong: passes the base metric name instead of the _bucket series
histogram_quantile(0.95, audio_latency_ms)

# ✅ Correct: rate() over the _bucket series, which carry the le label
histogram_quantile(0.95, rate(audio_latency_ms_bucket[5m]))

# Check buckets exist:
audio_latency_ms_bucket
# Should show multiple le (less-than-or-equal) labels

Missing Data for Specific Time Range

Causes:
- Prometheus was down during that period
- Retention period expired
- Scrape failed for that interval

Fix:

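# Increase retention
# (retention can also be set with the command-line flag
#  --storage.tsdb.retention.time=30d; check which form your version supports)
storage:
  tsdb:
    retention:
      time: 30d  # Was 7d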

🛠️ Troubleshooting Tools

1. Prometheus Console

# Check if metric exists
{__name__=~"cpu.*"}

# Check all labels for metric
cpu_usage{job!=""}

# Check scrape targets
up{job="audiolab"}

2. Metrics Exporter Test

#include <iostream>
// Also include the header that declares MetricsCollector.

// Standalone smoke test: record known values and print the export
void testMetrics() {
    auto& collector = MetricsCollector::instance();

    // Record test values
    collector.recordCpuUsage(50.0f);
    collector.recordMemoryUsage(1024 * 1024 * 100);  // 100 MB
    collector.recordBufferUnderrun();

    // Export to console in Prometheus text format
    std::cout << collector.exportPrometheus() << std::endl;
}

3. Network Diagnostics

# Check if port is open
netstat -an | grep 9090

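# Check if service responds (type the request, then press Enter twice)
telnet localhost 9090
GET /metrics HTTP/1.1
Host: localhost

# Check firewall (-n keeps ports numeric so grep can match them)
sudo iptables -L -n | grep 9090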

# On Windows:
netsh advfirewall firewall show rule name=all | findstr 9090

📋 Debugging Checklist

Application Side

□ MetricsCollector::instance() called
□ Metrics registered in constructor
□ recordMetric() called correctly
□ No exceptions thrown
□ /metrics endpoint implemented
□ Port not in use by other process
□ Metrics export working (test with curl)

Prometheus Side

□ prometheus.yml configured correctly
□ scrape_configs has correct job
□ Targets show as UP in Status page
□ scrape_interval reasonable (15s)
□ No authentication/TLS issues
□ Retention period long enough
□ Storage not full

Query Side

□ Metric name spelled correctly
□ Labels match exactly
□ Time range includes data
□ Query syntax valid
□ Aggregation functions used correctly
□ No label cardinality issues

🚨 Emergency Fixes

Metrics Stopped Working Suddenly

# 1. Restart Prometheus
sudo systemctl restart prometheus

# 2. Check logs
sudo journalctl -u prometheus -n 100

# 3. Verify config
promtool check config /etc/prometheus/prometheus.yml

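# 4. Last resort: clear bad data (only if the stored data is corrupted)
# WARNING: This deletes all historical data! Stop Prometheus first.
sudo systemctl stop prometheus
rm -rf /var/lib/prometheus/data/*
sudo systemctl start prometheus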

Metrics Delayed by Minutes

Cause: Long scrape interval or slow network

Fix:

# Reduce scrape interval
scrape_configs:
  - job_name: 'audiolab'
    scrape_interval: 5s  # Was 60s
    scrape_timeout: 3s

Missing Metrics After Deployment

Cause: Metric names changed or not registered

Fix:

// Add migration logic: publish under both names during the transition
void migrateMetrics(double cpu) {
    // Keep old metric for compatibility
    recordMetric("old_cpu_usage", cpu);

    // Add new metric
    recordMetric("cpu_usage_percent", cpu);
}

📞 Getting Help

Information to Collect

# 1. Application logs
tail -n 100 logs/audiolab.log

# 2. Prometheus config
cat /etc/prometheus/prometheus.yml

# 3. Prometheus targets status
curl http://localhost:9090/api/v1/targets | jq

# 4. Recent metrics
curl http://localhost:9090/metrics | tail -n 50

# 5. System info
uname -a
free -h
df -h

Support Checklist

When asking for help, provide:
- Application version
- Prometheus version
- Metric name and query
- Screenshots of Prometheus UI
- Relevant logs
- Config files (redacted)
- Steps to reproduce