Missing Metrics Troubleshooting
Symptom: Metrics do not appear in the dashboard
Quick Checklist
□ Is the metric collector initialized correctly?
□ Is the /metrics endpoint reachable?
□ Is a firewall blocking the port?
□ Is Prometheus scraping the endpoint?
□ Is there a typo in the metric name?
□ Are the labels correct?
□ Is the retention time adequate?
Debug Steps
1. Verify the Local Endpoint
# Check if metrics endpoint responds
curl http://localhost:9090/metrics
# Should return Prometheus format:
# HELP cpu_usage Current CPU usage
# TYPE cpu_usage gauge
# cpu_usage 45.2
# If empty or error:
# - Check if MetricsCollector initialized
# - Check port is not blocked
# - Check process is running
Expected output:
# HELP cpu_usage Current CPU usage percentage
# TYPE cpu_usage gauge
cpu_usage 45.2
# HELP audio_latency_ms Audio processing latency
# TYPE audio_latency_ms histogram
audio_latency_ms_bucket{le="1"} 1234
audio_latency_ms_bucket{le="5"} 5678
audio_latency_ms_bucket{le="+Inf"} 10000
audio_latency_ms_sum 12345
audio_latency_ms_count 10000
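For reference, the text format above can be produced with very little code. The following is a minimal sketch, not the MetricsCollector implementation: `Histogram`, `renderGauge`, and `renderHistogram` are hypothetical names introduced here for illustration. It shows the parts a scraper expects from a histogram: cumulative bucket counts, a final `le="+Inf"` bucket, and the `_sum` and `_count` samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper type; not part of the MetricsCollector API.
struct Histogram {
    std::vector<double> upperBounds;    // finite bucket bounds, ascending
    std::vector<std::uint64_t> counts;  // per-bucket counts, plus one extra entry for +Inf
    double sum;                         // sum of all observed values
};

// Render a gauge as its HELP line, TYPE line, and a single sample line.
std::string renderGauge(const std::string& name, const std::string& help, double value) {
    std::ostringstream out;
    out << "# HELP " << name << ' ' << help << '\n'
        << "# TYPE " << name << " gauge\n"
        << name << ' ' << value << '\n';
    return out.str();
}

// Render a histogram: cumulative buckets, the +Inf bucket, then _sum and _count.
std::string renderHistogram(const std::string& name, const std::string& help,
                            const Histogram& h) {
    assert(h.counts.size() == h.upperBounds.size() + 1);
    std::ostringstream out;
    out << "# HELP " << name << ' ' << help << '\n'
        << "# TYPE " << name << " histogram\n";
    std::uint64_t cumulative = 0;
    for (std::size_t i = 0; i < h.upperBounds.size(); ++i) {
        cumulative += h.counts[i];  // bucket samples are cumulative, not per-bucket
        out << name << "_bucket{le=\"" << h.upperBounds[i] << "\"} " << cumulative << '\n';
    }
    cumulative += h.counts.back();  // observations above the highest finite bound
    out << name << "_bucket{le=\"+Inf\"} " << cumulative << '\n'
        << name << "_sum " << h.sum << '\n'
        << name << "_count " << cumulative << '\n';
    return out.str();
}
```

If the real exporter's output differs structurally from what this sketch produces (for example, non-cumulative buckets or a missing `+Inf` bucket), Prometheus may reject the scrape or histogram queries may return nothing.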
2. Enable Debug Logging
// In main() or init():
MetricsCollector::setLogLevel(LogLevel::DEBUG);
// Should log every metric recorded:
// [DEBUG] Recording metric: cpu_usage = 45.2
// [DEBUG] Recording metric: audio_latency = 5.3ms
// [DEBUG] Exporting 15 metrics
3. Check Prometheus Targets
# Open Prometheus UI
http://localhost:9090
# Navigate to: Status → Targets
# Should show:
┌────────────────────────┬───────┬──────────┐
│ Endpoint               │ State │ Last     │
├────────────────────────┼───────┼──────────┤
│ localhost:9090/metrics │ UP    │ 2s ago   │
└────────────────────────┴───────┴──────────┘
# If DOWN:
# - Check endpoint URL
# - Check scrape_interval config
# - Check network connectivity
4. Verify Metric Registration
// Check if metric is registered
auto& collector = MetricsCollector::instance();
// Try recording a test value
collector.recordCpuUsage(50.0f);
# Then, from a shell, check that the test value appears in /metrics
curl http://localhost:9090/metrics | grep cpu_usage
# Should show: cpu_usage 50.0
5. Check Prometheus Query
# In Prometheus UI → Graph
# Try a simple query
cpu_usage
# If it returns "No data":
# - Check time range (last 5m, 1h, etc.)
# - Check if metric exists: cpu_usage{job="audiolab"}
# - Check scrape config: scrape_configs → job_name
Common Causes & Fixes
╔══════════════════════════╤══════════════════════╤═════════════════╗
║ Cause                    │ Symptom              │ Fix             ║
╠══════════════════════════╪══════════════════════╪═════════════════╣
║ Not initialized          │ No /metrics endpoint │ Call init()     ║
║ Port conflict            │ Connection refused   │ Change port     ║
║ Metrics not registered   │ Empty response       │ Register all    ║
║ Name collision           │ Wrong values         │ Unique names    ║
║ Label mismatch           │ Series not found     │ Check labels    ║
║ Time range issue         │ "No data"            │ Adjust range    ║
║ Scrape interval too long │ Delayed updates      │ Reduce interval ║
║ Firewall blocking        │ Timeout              │ Open port       ║
╚══════════════════════════╧══════════════════════╧═════════════════╝
Diagnostic Commands
Check if metrics are being recorded
# Watch metrics endpoint live
watch -n 1 'curl -s http://localhost:9090/metrics | grep cpu_usage'
# Should update every second with new values
Check Prometheus scrape status
# Check last scrape time
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[0].lastScrape'
# Check scrape errors
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[0].lastError'
Verify time series exists
# Query Prometheus API
curl 'http://localhost:9090/api/v1/query?query=cpu_usage' | jq
# Should return:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {"__name__": "cpu_usage", "instance": "localhost:9090"},
        "value": [1234567890, "45.2"]
      }
    ]
  }
}
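Advanced Debugging
Metric Not Appearing in Prometheus
Possible causes:
- Metric name format issue: names may contain only ASCII letters, digits, underscores, and colons, and must not start with a digit.
- Label cardinality explosion: a label with unbounded values (user IDs, timestamps) creates a new time series for every value.
- Metric type mismatch: the same metric name is exposed with different types (for example, as a gauge in one place and a counter in another).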
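The first cause can be caught before a metric is ever registered. The sketch below checks names against the classic Prometheus character rules (`[a-zA-Z_:][a-zA-Z0-9_:]*` for metric names, `[a-zA-Z_][a-zA-Z0-9_]*` for label names). The function names are hypothetical and not part of MetricsCollector; newer Prometheus versions can be configured to accept a wider character set, so treat this as the conservative check.

```cpp
#include <cstddef>
#include <string>

// ASCII-only checks; std::isalpha is avoided because it depends on the locale.
bool isAsciiLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
bool isAsciiDigit(char c) { return c >= '0' && c <= '9'; }

// Metric names: letters, digits, '_' and ':'; the first character must not be a digit.
bool isValidMetricName(const std::string& name) {
    if (name.empty()) return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c = name[i];
        const bool ok = isAsciiLetter(c) || c == '_' || c == ':' || (i > 0 && isAsciiDigit(c));
        if (!ok) return false;
    }
    return true;
}

// Label names: same rule as metric names, but ':' is not allowed.
bool isValidLabelName(const std::string& name) {
    if (name.empty()) return false;
    for (std::size_t i = 0; i < name.size(); ++i) {
        const char c = name[i];
        const bool ok = isAsciiLetter(c) || c == '_' || (i > 0 && isAsciiDigit(c));
        if (!ok) return false;
    }
    return true;
}
```

Under these rules `audio_latency_ms` is accepted, while `audio-latency` (hyphen) and `2xx_total` (leading digit) are rejected. Calling such a check at registration time turns a silently missing series into an immediate, visible error.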
Histogram Not Showing Percentiles
# ❌ Wrong: queries the base name instead of the _bucket series
histogram_quantile(0.95, audio_latency_ms)
# ✅ Correct: query the rate of the _bucket series
histogram_quantile(0.95, rate(audio_latency_ms_bucket[5m]))
# Check buckets exist:
audio_latency_ms_bucket
# Should show multiple le (less-than-or-equal) labels
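It helps to know how a quantile is estimated from buckets, because coarse buckets produce misleading percentiles even when the query is correct. The sketch below is a simplified illustration of the usual approach (find the bucket containing the target rank, then interpolate linearly inside it). It is not Prometheus's actual code; `Bucket` and `estimateQuantile` are hypothetical names, and it assumes all finite bucket bounds are positive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical type: one cumulative bucket as exposed by a *_bucket sample.
struct Bucket {
    double upperBound;         // value of the le label (+infinity for le="+Inf")
    std::uint64_t cumulative;  // observations less than or equal to upperBound
};

// Estimate the q-quantile from buckets sorted by upperBound, ending with +Inf.
double estimateQuantile(double q, const std::vector<Bucket>& buckets) {
    assert(q >= 0.0 && q <= 1.0);
    assert(buckets.size() >= 2);  // at least one finite bucket plus +Inf

    const double total = static_cast<double>(buckets.back().cumulative);
    if (total == 0.0) return std::numeric_limits<double>::quiet_NaN();

    // Find the first bucket whose cumulative count reaches the target rank.
    const double rank = q * total;
    std::size_t i = 0;
    while (i + 1 < buckets.size() && static_cast<double>(buckets[i].cumulative) < rank) ++i;

    // Rank falls into +Inf: the best available answer is the highest finite bound.
    if (i + 1 == buckets.size()) return buckets[i - 1].upperBound;

    const double lower = (i == 0) ? 0.0 : buckets[i - 1].upperBound;
    const double below = (i == 0) ? 0.0 : static_cast<double>(buckets[i - 1].cumulative);
    const double inBucket = static_cast<double>(buckets[i].cumulative) - below;
    if (inBucket == 0.0) return lower;  // only reachable for q == 0 with an empty first bucket

    // Linear interpolation inside the selected bucket.
    return lower + (buckets[i].upperBound - lower) * (rank - below) / inBucket;
}
```

With cumulative counts of 1234 at `le="1"`, 5678 at `le="5"`, and 10000 at `le="+Inf"`, this sketch puts the median at about 4.39 but reports the 95th percentile as exactly 5, because more than 40% of the observations lie above the highest finite bound. A percentile that sits flat on the top bucket bound usually means the bucket layout needs larger bounds.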
Missing Data for Specific Time Range
Causes:
- Prometheus was down during that period
- Retention period expired
- Scrape failed for that interval
Fix:
Troubleshooting Tools
1. Prometheus Console
# Check if metric exists
{__name__=~"cpu.*"}
# Check all labels for metric
cpu_usage{job!=""}
# Check scrape targets
up{job="audiolab"}
2. Metrics Exporter Test
#include <iostream>
// Smoke test: record known values and print the exported text
void testMetrics() {
    auto& collector = MetricsCollector::instance();
    // Record test values
    collector.recordCpuUsage(50.0f);
    collector.recordMemoryUsage(1024 * 1024 * 100); // 100 MB
    collector.recordBufferUnderrun();
    // Export to console
    std::cout << collector.exportPrometheus() << std::endl;
}
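Reading the console output by eye works once, but the same check can be automated. The sketch below scans exported text for an unlabeled sample and returns its value. `findSample` is a hypothetical helper, and feeding it the result of `exportPrometheus()` assumes that function returns a `std::string`, which the code above suggests but does not confirm.

```cpp
#include <sstream>
#include <string>

// Look for a sample line of the form "<name> <value>" and store its value.
// Comment lines (# HELP, # TYPE) are skipped, so a metric that is declared
// but never exported as a sample is reported as missing.
// Limitation: a labeled sample only matches if `name` includes the exact
// label text, and label values containing spaces are not handled.
bool findSample(const std::string& exposition, const std::string& name, double& value) {
    std::istringstream lines(exposition);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string sampleName;
        double sampleValue = 0.0;
        if (fields >> sampleName >> sampleValue && sampleName == name) {
            value = sampleValue;
            return true;
        }
    }
    return false;
}
```

Unlike `grep cpu_usage`, which also matches the `# HELP` and `# TYPE` lines, this only succeeds when an actual sample line is present, which is the condition Prometheus needs in order to store a value.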
3. Network Diagnostics
# Check if port is open
netstat -an | grep 9090
# Check if service responds (type the request, then press Enter twice)
telnet localhost 9090
GET /metrics HTTP/1.1
Host: localhost
# Check firewall (-n prints numeric ports so grep can match them)
sudo iptables -L -n | grep 9090
# On Windows:
netsh advfirewall firewall show rule name=all | findstr 9090
Debugging Checklist
Application Side
□ MetricsCollector::instance() called
□ Metrics registered in constructor
□ recordMetric() called correctly
□ No exceptions thrown
□ /metrics endpoint implemented
□ Port not in use by other process
□ Metrics export working (test with curl)
Prometheus Side
□ prometheus.yml configured correctly
□ scrape_configs has correct job
□ targets show as UP in Status page
□ scrape_interval reasonable (15s)
□ no authentication/TLS issues
□ retention period long enough
□ storage not full
Query Side
□ Metric name spelled correctly
□ Labels match exactly
□ Time range includes data
□ Query syntax valid
□ Aggregation functions used correctly
□ No label cardinality issues
Emergency Fixes
Metrics Stopped Working Suddenly
# 1. Restart Prometheus
sudo systemctl restart prometheus
# 2. Check logs
sudo journalctl -u prometheus -n 100
# 3. Verify config
promtool check config /etc/prometheus/prometheus.yml
# 4. Clear bad data (last resort, only if storage is corrupted)
# WARNING: This deletes all historical data!
sudo systemctl stop prometheus
rm -rf /var/lib/prometheus/data/*
sudo systemctl start prometheus
Metrics Delayed by Minutes
Cause: Long scrape interval or slow network
Fix:
# Reduce scrape interval
scrape_configs:
  - job_name: 'audiolab'
    scrape_interval: 5s # Was 60s
    scrape_timeout: 3s
Missing Metrics After Deployment
Cause: Metric names changed or not registered
Fix:
// Add migration logic: publish the value under both names during the transition
void migrateMetrics(double cpu) {
    // Keep old metric for compatibility
    recordMetric("old_cpu_usage", cpu);
    // Add new metric
    recordMetric("cpu_usage_percent", cpu);
}
Getting Help
Information to Collect
# 1. Application logs
tail -n 100 logs/audiolab.log
# 2. Prometheus config
cat /etc/prometheus/prometheus.yml
# 3. Prometheus targets status
curl http://localhost:9090/api/v1/targets | jq
# 4. Recent metrics
curl http://localhost:9090/metrics | tail -n 50
# 5. System info
uname -a
free -h
df -h
Support Checklist
When asking for help, provide:
- Application version
- Prometheus version
- Metric name and query
- Screenshots of Prometheus UI
- Relevant logs
- Config files (redacted)
- Steps to reproduce