Skip to content

📊 Dashboard Design Guidelines

🎯 Core Principles

Golden Rules

╔════════════════════════════════════════════════════════════╗
║ DO                          │ DON'T                        ║
╠═════════════════════════════╪══════════════════════════════╣
║ Max 9 panels per dashboard  │ Overload con 20+ panels      ║
║ Consistent color scheme     │ Rainbow vomit                ║
║ Meaningful titles           │ "Graph 1", "Panel 2"         ║
║ Show units (ms, %, MB)      │ Unitless numbers             ║
║ Mobile-responsive           │ Assume desktop only          ║
║ Focus on actionable data    │ Vanity metrics               ║
║ Update every 5-10s          │ Real-time (every 1s)         ║
╚════════════════════════════════════════════════════════════╝

🎨 Color Scheme

Traffic Light Pattern

Green (Good):
  - CPU < 50%
  - Memory < 60%
  - Error rate < 0.1%
  - Latency < 10ms
  - Status: Operational

Yellow (Warning):
  - CPU 50-80%
  - Memory 60-80%
  - Error rate 0.1-1%
  - Latency 10-50ms
  - Status: Degraded

Red (Critical):
  - CPU > 80%
  - Memory > 80%
  - Error rate > 1%
  - Latency > 50ms
  - Status: Outage

Standard Colors

Primary:    #3274D9 (Blue)    - Graphs, lines
Success:    #73BF69 (Green)   - Good metrics
Warning:    #FADE2A (Yellow)  - Warning thresholds
Critical:   #F2495C (Red)     - Critical alerts
Info:       #5794F2 (Light Blue) - Info panels
Background: #111217 (Dark)    - Panel background

📐 Layout Guidelines

Standard Dashboard Layout

┌─────────────────────────────────────────────────────────┐
│  DASHBOARD TITLE                    [Time Range] [⟳]    │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  [Key Metric 1]  [Key Metric 2]  [Key Metric 3]        │
│   (Stat Panel)    (Stat Panel)    (Stat Panel)         │
│       50%             128MB           0.01%             │
│                                                          │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  [Trend Graph - Full Width]                            │
│   ┌─────────────────────────────────────────────┐     │
│   │                                              │     │
│   │         ╱╲    ╱╲                            │     │
│   │    ╱╲  ╱  ╲  ╱  ╲   ╱╲                      │     │
│   │   ╱  ╲╱    ╲╱    ╲ ╱  ╲                     │     │
│   │  ╱                ╲╱    ╲                   │     │
│   └─────────────────────────────────────────────┘     │
│                                                          │
├─────────────────────────────────────────────────────────┤
│                                                          │
│  [Detail Panel 1]  [Detail Panel 2]  [Detail Panel 3]  │
│   (Table/Graph)     (Heatmap)        (Logs)           │
│                                                          │
└─────────────────────────────────────────────────────────┘

Grid System

  • 12-column grid: Divide width into 12 units
  • Panel heights: 4, 8, 12, or 16 units
  • Standard sizes:
  • Full width: 12 units
  • Half width: 6 units
  • Third width: 4 units
  • Quarter width: 3 units

Panel Positioning

{
  "gridPos": {
    "x": 0,     // Horizontal position (0-12)
    "y": 0,     // Vertical position (starts at 0)
    "w": 6,     // Width (1-12)
    "h": 8      // Height (typically 4, 8, 12, 16)
  }
}

⏰ Time Ranges

Standard Presets

Real-time Monitoring:
  - Last 5 minutes  (for live issues)
  - Last 15 minutes (active incident)
  - Last 30 minutes (incident trend)

Investigation:
  - Last 1 hour  (recent behavior)
  - Last 3 hours (extended pattern)
  - Last 6 hours (shift analysis)

Trend Analysis:
  - Last 24 hours (daily pattern)
  - Last 7 days   (weekly trend)
  - Last 30 days  (monthly overview)

Custom:
  - From/To selector
  - Relative: now-6h to now

Refresh Intervals

Production (Live):
  - 5s  (active incident)
  - 10s (normal monitoring)
  - 30s (overview dashboard)

Development:
  - 30s (local testing)
  - 1m  (integration testing)

Historical:
  - None (static analysis)

📊 Panel Types & Usage

1. Stat Panels (Single Values)

Use for: - Current value of key metrics - Percentage indicators - Binary status (Up/Down)

Configuration:

{
  "type": "stat",
  "options": {
    "textMode": "value_and_name",
    "graphMode": "area",
    "colorMode": "background",
    "orientation": "auto",
    "reduceOptions": {
      "values": false,
      "calcs": ["lastNotNull"]
    }
  },
  "fieldConfig": {
    "defaults": {
      "thresholds": {
        "mode": "absolute",
        "steps": [
          {"color": "green", "value": null},
          {"color": "yellow", "value": 50},
          {"color": "red", "value": 80}
        ]
      },
      "unit": "percent"
    }
  }
}

2. Time Series Graphs

Use for: - Trends over time - Multiple metrics comparison - Identifying patterns

Best Practices: - Max 5 series per graph - Use legend at bottom - Enable tooltips - Add threshold lines

3. Tables

Use for: - Top N queries (slowest, errors) - Detailed logs - Current state of resources

Configuration: - Sort by most critical column - Limit to 10-20 rows - Color-code critical values

4. Heatmaps

Use for: - Latency distribution - Load patterns over time - Bucket visualization

5. Gauges

Use for: - Percentage metrics with clear thresholds - Capacity indicators - Progress tracking

🏷️ Naming Conventions

Dashboard Names

Format: [System] - [Purpose]
Examples:
  - AudioLab - System Overview
  - AudioLab - Performance Monitoring
  - AudioLab - Error Analysis
  - AudioLab - Infrastructure Health

Panel Titles

Format: [Metric] ([Unit])
Examples:
  - CPU Usage (%)
  - Memory Allocated (MB)
  - Request Latency (ms)
  - Error Rate (errors/sec)
  - Active Connections (count)

Query Labels

Format: {service="name", environment="prod", instance="host1"}
Examples:
  - {service="audio-engine", env="prod"}
  - {component="reverb", type="plugin"}

📱 Mobile Responsiveness

Design Considerations

  • Vertical stacking on mobile (width < 768px)
  • Hide less critical panels on small screens
  • Larger touch targets for interactive elements
  • Simplified legends on mobile

Panel Visibility

{
  "hideWhenNoData": true,
  "transparent": false,
  "options": {
    "legend": {
      "displayMode": "list",
      "placement": "bottom",
      "showLegend": true
    }
  }
}

🎯 Dashboard Categories

1. Overview Dashboard

Purpose: High-level system health Audience: Everyone Update: Every 10s Panels: - System status (UP/DOWN) - Key metrics (CPU, Memory, Errors) - Active alerts count - Recent events timeline

2. Performance Dashboard

Purpose: Detailed performance analysis Audience: Engineers Update: Every 5s Panels: - Latency percentiles (p50, p95, p99) - Throughput graphs - Resource utilization - Slow query analysis

3. Error Dashboard

Purpose: Error tracking and debugging Audience: On-call engineers Update: Every 5s Panels: - Error rate trends - Error breakdown by type - Error logs table - Stack traces (recent)

4. Infrastructure Dashboard

Purpose: System resources monitoring Audience: DevOps Update: Every 30s Panels: - Host metrics (CPU, RAM, Disk) - Network I/O - Container/Pod status - Disk usage

✅ Quality Checklist

Before publishing a dashboard:

  • Clear purpose: Title explains dashboard goal
  • Max 9 panels: Not overcrowded
  • Consistent units: All metrics properly labeled
  • Color coding: Thresholds match severity
  • Mobile tested: Works on small screens
  • Time range: Appropriate default selected
  • Refresh rate: Set to 5-30s (not 1s)
  • No orphan queries: All queries have data
  • Legend placement: Bottom or right (not top)
  • Tooltips enabled: Hover shows details
  • Variables used: For filtering (env, host, service)
  • Documentation: Description field filled

🔧 Advanced Features

Variables

{
  "templating": {
    "list": [
      {
        "name": "environment",
        "type": "custom",
        "options": ["prod", "staging", "dev"],
        "current": {"value": "prod"}
      },
      {
        "name": "host",
        "type": "query",
        "query": "label_values(up, instance)",
        "refresh": 1
      }
    ]
  }
}
{
  "links": [
    {
      "title": "Related Dashboard",
      "url": "/d/performance-dashboard",
      "type": "dashboards"
    }
  ]
}

Annotations

{
  "annotations": {
    "list": [
      {
        "datasource": "Prometheus",
        "enable": true,
        "expr": "ALERTS{alertstate=\"firing\"}",
        "name": "Active Alerts",
        "tagKeys": "alertname"
      }
    ]
  }
}

📚 Resources


Last Updated: 2024-10-03 Version: 1.0 Maintainer: AudioLab Infrastructure Team