📊 Dashboard Design Guidelines¶
🎯 Core Principles¶
Golden Rules¶
╔════════════════════════════════════════════════════════════╗
║ DO │ DON'T ║
╠═════════════════════════════╪══════════════════════════════╣
║ Max 9 panels per dashboard │ Overload con 20+ panels ║
║ Consistent color scheme │ Rainbow vomit ║
║ Meaningful titles │ "Graph 1", "Panel 2" ║
║ Show units (ms, %, MB) │ Unitless numbers ║
║ Mobile-responsive │ Assume desktop only ║
║ Focus on actionable data │ Vanity metrics ║
║ Update every 5-10s │ Real-time (every 1s) ║
╚════════════════════════════════════════════════════════════╝
🎨 Color Scheme¶
Traffic Light Pattern¶
Green (Good):
- CPU < 50%
- Memory < 60%
- Error rate < 0.1%
- Latency < 10ms
- Status: Operational
Yellow (Warning):
- CPU 50-80%
- Memory 60-80%
- Error rate 0.1-1%
- Latency 10-50ms
- Status: Degraded
Red (Critical):
- CPU > 80%
- Memory > 80%
- Error rate > 1%
- Latency > 50ms
- Status: Outage
Standard Colors¶
Primary: #3274D9 (Blue) - Graphs, lines
Success: #73BF69 (Green) - Good metrics
Warning: #FADE2A (Yellow) - Warning thresholds
Critical: #F2495C (Red) - Critical alerts
Info: #5794F2 (Light Blue) - Info panels
Background: #111217 (Dark) - Panel background
📐 Layout Guidelines¶
Standard Dashboard Layout¶
┌─────────────────────────────────────────────────────────┐
│ DASHBOARD TITLE [Time Range] [⟳] │
├─────────────────────────────────────────────────────────┤
│ │
│ [Key Metric 1] [Key Metric 2] [Key Metric 3] │
│ (Stat Panel) (Stat Panel) (Stat Panel) │
│ 50% 128MB 0.01% │
│ │
├─────────────────────────────────────────────────────────┤
│ │
│ [Trend Graph - Full Width] │
│ ┌─────────────────────────────────────────────┐ │
│ │ │ │
│ │ ╱╲ ╱╲ │ │
│ │ ╱╲ ╱ ╲ ╱ ╲ ╱╲ │ │
│ │ ╱ ╲╱ ╲╱ ╲ ╱ ╲ │ │
│ │ ╱ ╲╱ ╲ │ │
│ └─────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────┤
│ │
│ [Detail Panel 1] [Detail Panel 2] [Detail Panel 3] │
│ (Table/Graph) (Heatmap) (Logs) │
│ │
└─────────────────────────────────────────────────────────┘
Grid System¶
- 12-column grid: Divide width into 12 units
- Panel heights: 4, 8, 12, or 16 units
- Standard sizes:
- Full width: 12 units
- Half width: 6 units
- Third width: 4 units
- Quarter width: 3 units
Panel Positioning¶
{
"gridPos": {
"x": 0, // Horizontal position (0-12)
"y": 0, // Vertical position (starts at 0)
"w": 6, // Width (1-12)
"h": 8 // Height (typically 4, 8, 12, 16)
}
}
⏰ Time Ranges¶
Standard Presets¶
Real-time Monitoring:
- Last 5 minutes (for live issues)
- Last 15 minutes (active incident)
- Last 30 minutes (incident trend)
Investigation:
- Last 1 hour (recent behavior)
- Last 3 hours (extended pattern)
- Last 6 hours (shift analysis)
Trend Analysis:
- Last 24 hours (daily pattern)
- Last 7 days (weekly trend)
- Last 30 days (monthly overview)
Custom:
- From/To selector
- Relative: now-6h to now
Refresh Intervals¶
Production (Live):
- 5s (active incident)
- 10s (normal monitoring)
- 30s (overview dashboard)
Development:
- 30s (local testing)
- 1m (integration testing)
Historical:
- None (static analysis)
📊 Panel Types & Usage¶
1. Stat Panels (Single Values)¶
Use for: - Current value of key metrics - Percentage indicators - Binary status (Up/Down)
Configuration:
{
"type": "stat",
"options": {
"textMode": "value_and_name",
"graphMode": "area",
"colorMode": "background",
"orientation": "auto",
"reduceOptions": {
"values": false,
"calcs": ["lastNotNull"]
}
},
"fieldConfig": {
"defaults": {
"thresholds": {
"mode": "absolute",
"steps": [
{"color": "green", "value": null},
{"color": "yellow", "value": 50},
{"color": "red", "value": 80}
]
},
"unit": "percent"
}
}
}
2. Time Series Graphs¶
Use for: - Trends over time - Multiple metrics comparison - Identifying patterns
Best Practices: - Max 5 series per graph - Use legend at bottom - Enable tooltips - Add threshold lines
3. Tables¶
Use for: - Top N queries (slowest, errors) - Detailed logs - Current state of resources
Configuration: - Sort by most critical column - Limit to 10-20 rows - Color-code critical values
4. Heatmaps¶
Use for: - Latency distribution - Load patterns over time - Bucket visualization
5. Gauges¶
Use for: - Percentage metrics with clear thresholds - Capacity indicators - Progress tracking
🏷️ Naming Conventions¶
Dashboard Names¶
Format: [System] - [Purpose]
Examples:
- AudioLab - System Overview
- AudioLab - Performance Monitoring
- AudioLab - Error Analysis
- AudioLab - Infrastructure Health
Panel Titles¶
Format: [Metric] ([Unit])
Examples:
- CPU Usage (%)
- Memory Allocated (MB)
- Request Latency (ms)
- Error Rate (errors/sec)
- Active Connections (count)
Query Labels¶
Format: {service="name", environment="prod", instance="host1"}
Examples:
- {service="audio-engine", env="prod"}
- {component="reverb", type="plugin"}
📱 Mobile Responsiveness¶
Design Considerations¶
- Vertical stacking on mobile (width < 768px)
- Hide less critical panels on small screens
- Larger touch targets for interactive elements
- Simplified legends on mobile
Panel Visibility¶
{
"hideWhenNoData": true,
"transparent": false,
"options": {
"legend": {
"displayMode": "list",
"placement": "bottom",
"showLegend": true
}
}
}
🎯 Dashboard Categories¶
1. Overview Dashboard¶
Purpose: High-level system health Audience: Everyone Update: Every 10s Panels: - System status (UP/DOWN) - Key metrics (CPU, Memory, Errors) - Active alerts count - Recent events timeline
2. Performance Dashboard¶
Purpose: Detailed performance analysis Audience: Engineers Update: Every 5s Panels: - Latency percentiles (p50, p95, p99) - Throughput graphs - Resource utilization - Slow query analysis
3. Error Dashboard¶
Purpose: Error tracking and debugging Audience: On-call engineers Update: Every 5s Panels: - Error rate trends - Error breakdown by type - Error logs table - Stack traces (recent)
4. Infrastructure Dashboard¶
Purpose: System resources monitoring Audience: DevOps Update: Every 30s Panels: - Host metrics (CPU, RAM, Disk) - Network I/O - Container/Pod status - Disk usage
✅ Quality Checklist¶
Before publishing a dashboard:
- Clear purpose: Title explains dashboard goal
- Max 9 panels: Not overcrowded
- Consistent units: All metrics properly labeled
- Color coding: Thresholds match severity
- Mobile tested: Works on small screens
- Time range: Appropriate default selected
- Refresh rate: Set to 5-30s (not 1s)
- No orphan queries: All queries have data
- Legend placement: Bottom or right (not top)
- Tooltips enabled: Hover shows details
- Variables used: For filtering (env, host, service)
- Documentation: Description field filled
🔧 Advanced Features¶
Variables¶
{
"templating": {
"list": [
{
"name": "environment",
"type": "custom",
"options": ["prod", "staging", "dev"],
"current": {"value": "prod"}
},
{
"name": "host",
"type": "query",
"query": "label_values(up, instance)",
"refresh": 1
}
]
}
}
Links¶
{
"links": [
{
"title": "Related Dashboard",
"url": "/d/performance-dashboard",
"type": "dashboards"
}
]
}
Annotations¶
{
"annotations": {
"list": [
{
"datasource": "Prometheus",
"enable": true,
"expr": "ALERTS{alertstate=\"firing\"}",
"name": "Active Alerts",
"tagKeys": "alertname"
}
]
}
}
📚 Resources¶
Last Updated: 2024-10-03 Version: 1.0 Maintainer: AudioLab Infrastructure Team