
⚑ Profiler Selection Guide

🎯 Tools by Platform

╔══════════════╤═══════════╤════════════════════╤══════════════════════╗
║ Profiler     │ Platform  │ Strength           │ When to use          ║
╠══════════════╪═══════════╪════════════════════╪══════════════════════╣
║ VTune        │ Win/Linux │ Intel CPU insight  │ Hotspot analysis     ║
║ Instruments  │ macOS     │ OS integration     │ Mac development      ║
║ perf         │ Linux     │ Lightweight        │ Production profiling ║
║ Superluminal │ Win/Linux │ Real-time          │ Live game/audio      ║
║ Tracy        │ All       │ Frame profiler     │ Real-time viz        ║
╚══════════════╧═══════════╧════════════════════╧══════════════════════╝

🔬 Types of Analysis

Hotspot Analysis

  • What: Functions that use the most CPU
  • Tool: VTune, Instruments Time Profiler
  • When: First optimization pass
  • Output: Function call tree with % CPU time

Memory Analysis

  • What: Allocations, leaks, cache misses
  • Tool: Valgrind, AddressSanitizer, Instruments Allocations
  • When: Memory issues are suspected
  • Output: Allocation timeline, leak report

Real-time Analysis

  • What: Execution timeline
  • Tool: Tracy, Superluminal
  • When: Latency spikes, frame drops
  • Output: Visual timeline with events

Cache Analysis

  • What: L1/L2/L3 cache misses, memory bandwidth
  • Tool: VTune Memory Access, perf stat
  • When: Performance is lower than expected and no hotspot explains it
  • Output: Cache hit/miss ratios

Microarchitecture Analysis

  • What: Branch prediction, pipeline stalls, IPC
  • Tool: VTune Microarchitecture Exploration
  • When: CPU-bound code is already optimized but still slow
  • Output: PMU counters, bottleneck identification
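Cache and microarchitecture reports largely come down to a few ratios over raw counter values. A minimal sketch of the two most commonly quoted ones follows; the struct and function names are hypothetical, and the values are assumed to be copied by hand from a tool such as perf stat (events cycles, instructions, cache-references, cache-misses).

```cpp
#include <cstdint>

// Hypothetical container for raw counter values, e.g. read off `perf stat` output.
struct CounterSample {
    std::uint64_t cycles;
    std::uint64_t instructions;
    std::uint64_t cacheReferences;
    std::uint64_t cacheMisses;
};

// Instructions per cycle; returns 0.0 if no cycles were counted.
inline double instructionsPerCycle(const CounterSample& s) {
    return s.cycles == 0
        ? 0.0
        : static_cast<double>(s.instructions) / static_cast<double>(s.cycles);
}

// Fraction of cache references that missed, in the range 0.0 to 1.0.
inline double cacheMissRatio(const CounterSample& s) {
    return s.cacheReferences == 0
        ? 0.0
        : static_cast<double>(s.cacheMisses) / static_cast<double>(s.cacheReferences);
}
```

Comparing these ratios before and after a change is usually more informative than reading either value in isolation.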

📋 Profiling Workflow

1. Reproduce issue/scenario
   ↓
2. Run profiler (appropriate type)
   ↓
3. Analyze results (identify hotspots)
   ↓
4. Hypothesize bottleneck (root cause)
   ↓
5. Implement fix
   ↓
6. Re-profile to validate (compare before/after)
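For step 6, a quick in-process timing check can complement a full profiler run. The sketch below is one possible way to do it, not a replacement for re-profiling; medianMicroseconds and speedup are hypothetical helper names.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock duration in
// microseconds (the upper middle value for even counts). Returns 0.0 if runs <= 0.
template <typename Fn>
double medianMicroseconds(Fn&& work, int runs) {
    if (runs <= 0) return 0.0;
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Speedup of the "after" measurement relative to "before" (> 1.0 means faster).
inline double speedup(double beforeMicros, double afterMicros) {
    return beforeMicros / afterMicros;
}
```

Calling medianMicroseconds once on the old code path and once on the new one, then passing both results to speedup, gives a single before/after figure. The median is used because it is less sensitive to occasional outliers than the mean.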

🛠️ Tool Details

Intel VTune Profiler (Windows/Linux)

Pros:

  • Deep Intel CPU insights
  • Hardware event sampling (PMU)
  • Call stack analysis
  • Source code attribution
  • Threading analysis

Cons:

  • Designed for Intel CPUs (only limited support on AMD)
  • Commercial license for full features
  • Heavy installation

Analysis Types:

  • Hotspots (CPU usage)
  • Memory Access (cache analysis)
  • Threading (locks, waits)
  • Microarchitecture Exploration
  • I/O

Usage:

# Command line
vtune -collect hotspots -result-dir ./vtune_results -- ./AudioPlugin.exe

# GUI
vtune-gui

Xcode Instruments (macOS)

Pros:

  • Deep macOS/iOS integration
  • Metal GPU profiling
  • Beautiful UI
  • Energy profiling
  • Network profiling

Cons:

  • macOS only
  • Requires Xcode
  • Can be slow with large traces

Instruments:

  • Time Profiler (CPU hotspots)
  • Allocations (memory allocations)
  • Leaks (memory leaks)
  • System Trace (kernel events)
  • Metal (GPU)
  • Audio (Core Audio events)

Usage:

# Command line (the target command follows "--launch --")
xctrace record --template 'Time Profiler' --launch -- ./AudioPlugin.app

# GUI
open -a Instruments

Linux perf (Linux)

Pros:

  • Built into kernel
  • Extremely lightweight
  • Production-safe
  • Rich event support

Cons:

  • Command-line focused
  • Visualization requires external tools
  • Requires debug symbols for useful output

Event Types:

  • Hardware: cycles, instructions, cache-misses
  • Software: context-switches, page-faults
  • Tracepoints: system calls, scheduler events

Usage:

# Record
perf record -g ./AudioPlugin

# Report
perf report

# Stat
perf stat ./AudioPlugin

Superluminal (Windows/Linux)

Pros:

  • Real-time profiling
  • Low overhead
  • Beautiful UI
  • Easy integration

Cons:

  • Commercial ($)
  • Sampling-based (may miss short events)

Usage:

// Instrument code
#include <Superluminal/PerformanceAPI.h>

void processAudio() {
    PERFORMANCEAPI_INSTRUMENT_FUNCTION();
    // ... audio processing
}

Tracy Profiler (All platforms)

Pros:

  • Open source
  • Frame profiler design (great for real-time)
  • Low overhead
  • Network-based (profile remote devices)
  • Visual timeline

Cons:

  • Requires code instrumentation
  • Setup can be complex

Usage:

// Instrument code
#include <tracy/Tracy.hpp>

void processAudio() {
    ZoneScoped;
    // ... audio processing
}

🎯 Use Case → Tool Mapping

"My audio callback is taking too long"

Tool: Tracy or Superluminal
Why: A real-time timeline shows exactly where the time is spent
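"Too long" is relative to the time the host allows for one buffer, which follows from buffer size and sample rate. The sketch below computes that budget and the fraction a measured callback consumes. The function names are hypothetical, and the formula assumes one callback per buffer with no extra host overhead.

```cpp
// Time available to process one buffer, in milliseconds.
inline double callbackBudgetMs(int bufferSizeFrames, double sampleRateHz) {
    return 1000.0 * static_cast<double>(bufferSizeFrames) / sampleRateHz;
}

// Fraction of the budget used by a measured callback duration
// (values approaching or exceeding 1.0 indicate a risk of dropouts).
inline double callbackLoad(double measuredMs, int bufferSizeFrames, double sampleRateHz) {
    return measuredMs / callbackBudgetMs(bufferSizeFrames, sampleRateHz);
}
```

For example, a 480-frame buffer at 48 kHz gives a 10 ms budget, so a callback measured at 5 ms on the timeline is running at a load of 0.5.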

"My plugin uses too much CPU but I don't know where"

Tool: VTune Hotspots or Instruments Time Profiler
Why: Statistical sampling finds hot functions

"Performance regressed but I don't know why"

Tool: VTune Microarchitecture or perf stat
Why: Hardware counters reveal cache misses, branch mispredictions

"Memory usage is growing"

Tool: Instruments Allocations or Valgrind Massif
Why: Allocation tracking finds leaks and shows which call sites drive heap growth

"Occasional glitches/dropouts"

Tool: Tracy with manual zones around critical sections
Why: Timeline shows when and where spikes occur
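Manual zones might look like the sketch below. ZoneScopedN and FrameMark are Tracy macros; the DSP functions and the HAVE_TRACY guard are hypothetical. The fallback definitions are an assumption added here so that the file still compiles when the Tracy header is not on the include path (Tracy's own header already compiles the macros away when TRACY_ENABLE is undefined).

```cpp
#include <cstddef>

// Use the real Tracy macros when the client is enabled and the header is found.
#if defined(TRACY_ENABLE) && defined(__has_include)
#  if __has_include(<tracy/Tracy.hpp>)
#    include <tracy/Tracy.hpp>
#    define HAVE_TRACY 1  // hypothetical guard name
#  endif
#endif
#ifndef HAVE_TRACY
#  define ZoneScopedN(name)  // no-op fallback
#  define FrameMark          // no-op fallback
#endif

// Hypothetical DSP stages standing in for real processing code.
inline void applyFilter(float* buffer, std::size_t n) {
    ZoneScopedN("applyFilter");
    for (std::size_t i = 0; i < n; ++i) buffer[i] *= 0.5f;
}

inline void applyGain(float* buffer, std::size_t n, float gain) {
    ZoneScopedN("applyGain");
    for (std::size_t i = 0; i < n; ++i) buffer[i] *= gain;
}

void processBlock(float* buffer, std::size_t n) {
    ZoneScopedN("processBlock");  // outer zone: total time for the block
    applyFilter(buffer, n);       // nested zones show which stage spiked
    applyGain(buffer, n, 2.0f);
    FrameMark;                    // treat each audio block as one "frame"
}
```

With one zone per stage, a dropout shows up on the timeline as a single oversized frame, and the nested zones indicate which stage was responsible.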

"Multi-threaded performance is poor"

Tool: VTune Threading or Instruments System Trace
Why: Shows lock contention, thread synchronization

📊 Profiling Best Practices

Before Profiling

  • Build in Release mode with debug symbols (-O2 -g)
  • Keep optimizations on unless you have a specific reason (-O0 profiles can mislead)
  • Use representative workload (real audio files, realistic settings)
  • Run multiple times (account for variance)
  • Close other applications (reduce noise)

During Profiling

  • Profile entire scenario (not just one function)
  • Use sufficient sample rate (1000 Hz typical)
  • Collect call stacks (essential for root cause)
  • Record enough data (30s minimum for statistical significance)
  • Note system state (CPU load, memory usage)
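The sample-rate and duration guidelines above can be sanity-checked with a little arithmetic. The sketch below estimates how many samples a function receives and how noisy its measured share is. It treats each sample as an independent Bernoulli trial, which is a simplifying assumption, and the function names are hypothetical.

```cpp
#include <cmath>

// Expected number of samples landing in a function that consumes
// `fraction` (0.0 to 1.0) of CPU time.
inline double expectedSamples(double sampleRateHz, double durationSec, double fraction) {
    return sampleRateHz * durationSec * fraction;
}

// Approximate relative standard error of the measured fraction under a
// binomial model. Requires fraction > 0 and a non-zero total sample count.
inline double relativeStdError(double sampleRateHz, double durationSec, double fraction) {
    const double totalSamples = sampleRateHz * durationSec;
    return std::sqrt((1.0 - fraction) / (fraction * totalSamples));
}
```

At 1000 Hz for 30 s, a function taking 1% of CPU time gets about 300 samples, and its measured share carries a relative error of roughly 6% under this model. Shorter recordings make small functions correspondingly noisier.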

After Profiling

  • Focus on hotspots (80/20 rule: roughly 20% of the code often accounts for 80% of the time)
  • Verify with multiple profilers (sanity check)
  • Profile before and after optimization (measure improvement)
  • Document findings (what, where, why)
  • Share results with team

🚀 Quick Start Commands

VTune (Windows/Linux)

# Hotspot analysis
vtune -collect hotspots -knob sampling-interval=1 -result-dir ./vtune_results -- ./AudioPlugin.exe

# Memory access
vtune -collect memory-access -result-dir ./vtune_mem -- ./AudioPlugin.exe

# Generate report
vtune -report summary -result-dir ./vtune_results -format html -report-output ./report.html

Instruments (macOS)

# Time profiler (the target command follows "--launch --")
xctrace record --template 'Time Profiler' --output ./trace.trace --launch -- ./AudioPlugin.app

# Allocations
xctrace record --template 'Allocations' --output ./alloc.trace --launch -- ./AudioPlugin.app

# Open the trace in Instruments.app
open ./trace.trace

perf (Linux)

# Record with call graph (--call-graph already enables stack capture, so -g is not needed)
perf record -F 1000 --call-graph dwarf ./AudioPlugin

# Report
perf report --stdio

# Annotate source (requires debug symbols)
perf annotate

# Flamegraph (requires flamegraph.pl)
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg

Tracy

# Server (GUI)
Tracy

# Instrumented app connects automatically
./AudioPlugin

Valgrind (Linux/macOS)

# Callgrind (call graph profiler)
valgrind --tool=callgrind --callgrind-out-file=callgrind.out ./AudioPlugin

# Visualize with kcachegrind
kcachegrind callgrind.out

# Massif (heap profiler)
valgrind --tool=massif --massif-out-file=massif.out ./AudioPlugin
ms_print massif.out

🔧 Integration with Build System

CMake Integration

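# Add profiling support
option(ENABLE_PROFILING "Enable profiling support" OFF)
option(USE_TRACY "Link the Tracy client" OFF)
option(USE_SUPERLUMINAL "Link the Superluminal PerformanceAPI" OFF)

if(ENABLE_PROFILING)
    # Optimized build with debug symbols (only affects single-config generators)
    set(CMAKE_BUILD_TYPE RelWithDebInfo)

    # Tracy integration (expects the Tracy::TracyClient target to exist already)
    if(USE_TRACY)
        target_compile_definitions(${PROJECT_NAME} PRIVATE TRACY_ENABLE)
        target_link_libraries(${PROJECT_NAME} PRIVATE Tracy::TracyClient)
    endif()

    # Superluminal integration (SUPERLUMINAL_INCLUDE_DIR and SUPERLUMINAL_LIB
    # must be set by the caller)
    if(USE_SUPERLUMINAL)
        target_include_directories(${PROJECT_NAME} PRIVATE ${SUPERLUMINAL_INCLUDE_DIR})
        target_link_libraries(${PROJECT_NAME} PRIVATE ${SUPERLUMINAL_LIB})
    endif()
endif()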

📚 Learning Resources

VTune

Instruments

  • WWDC Sessions: Search "Instruments" on developer.apple.com
  • Instruments Help: Built into Xcode

perf

Tracy

Minimal (Free)

  • Windows: perf (WSL) + Visual Studio Profiler
  • macOS: Instruments (free with Xcode)
  • Linux: perf + flamegraph

Professional (Audio Development)

  • Windows: VTune + Superluminal
  • macOS: Instruments + Superluminal
  • Linux: perf + Tracy
  • All: Tracy for cross-platform real-time profiling

Enterprise (Team)

  • All of the above
  • Continuous profiling in CI/CD
  • Automated regression detection
  • Shared profiling results repository
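Automated regression detection can be as simple as comparing a fresh measurement against a stored baseline with some tolerance for noise. The check below is one possible shape for such a gate; the function name and the idea of a fixed relative tolerance are assumptions, not something this guide prescribes.

```cpp
// Returns true when `currentMs` exceeds `baselineMs` by more than
// `tolerance`, expressed as a fraction (e.g. 0.05 for 5%).
inline bool isRegression(double baselineMs, double currentMs, double tolerance) {
    return currentMs > baselineMs * (1.0 + tolerance);
}
```

A CI job would typically fail the build when this returns true. The tolerance should be wider than the run-to-run variance observed on the CI machines, otherwise the gate will produce false alarms.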

Last Updated: [Date] Owner: Performance Team