Profiler Selection Guide¶
Tools by Platform¶
┌──────────────┬───────────┬────────────────────────┬─────────────────────────┐
│ Profiler     │ Platform  │ Strength               │ When to use             │
├──────────────┼───────────┼────────────────────────┼─────────────────────────┤
│ VTune        │ Win/Linux │ Deep Intel CPU insight │ Hotspot analysis        │
│ Instruments  │ macOS     │ OS integration         │ Mac development         │
│ perf         │ Linux     │ Lightweight            │ Production profiling    │
│ Superluminal │ Win/Linux │ Real-time              │ Live game/audio         │
│ Tracy        │ All       │ Frame profiler         │ Real-time visualization │
└──────────────┴───────────┴────────────────────────┴─────────────────────────┘
Analysis Types¶
Hotspot Analysis¶
- What: Functions that use the most CPU
- Tool: VTune, Instruments Time Profiler
- When: First optimization pass
- Output: Function call tree with % CPU time
Memory Analysis¶
- What: Allocations, leaks, cache misses
- Tool: Valgrind, AddressSanitizer, Instruments Allocations
- When: A memory issue is suspected
- Output: Allocation timeline, leak report
Real-time Analysis¶
- What: Execution timeline
- Tool: Tracy, Superluminal
- When: Latency spikes, frame drops
- Output: Visual timeline with events
Cache Analysis¶
- What: L1/L2/L3 cache misses, memory bandwidth
- Tool: VTune Memory Access, perf stat
- When: Performance is low and no single hotspot explains it
- Output: Cache hit/miss ratios
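As an illustration of what cache analysis tends to surface, here is a minimal sketch of the same sum computed with two traversal orders. The `Matrix` struct and both function names are hypothetical and not part of any profiler API. Both functions return the same value; the column-major walk strides through memory and would typically show a higher miss count under a tool such as `perf stat -e cache-misses`, though the size of the gap depends on matrix size and hardware.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: a rows x cols matrix stored contiguously, row by row.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // element (r, c) lives at data[r * cols + c]

    Matrix(std::size_t r, std::size_t c, float fill)
        : rows(r), cols(c), data(r * c, fill) {}
};

// Cache-friendly: the inner loop walks adjacent floats in memory.
double sumRowMajor(const Matrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    double total = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r)
        for (std::size_t c = 0; c < m.cols; ++c)
            total += m.data[r * m.cols + c];
    return total;
}

// Cache-unfriendly: the inner loop jumps `cols` floats on every access.
double sumColumnMajor(const Matrix& m) {
    assert(m.data.size() == m.rows * m.cols);
    double total = 0.0;
    for (std::size_t c = 0; c < m.cols; ++c)
        for (std::size_t r = 0; r < m.rows; ++r)
            total += m.data[r * m.cols + c];
    return total;
}
```

Running each variant separately under the profiler, with a matrix larger than the last-level cache, makes the difference in miss ratios easier to see.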
Microarchitecture Analysis¶
- What: Branch prediction, pipeline stalls, IPC
- Tool: VTune Microarchitecture Exploration
- When: CPU-bound code is already optimized but still slow
- Output: PMU counters, bottleneck identification
Profiling Workflow¶
1. Reproduce issue/scenario
↓
2. Run profiler (appropriate type)
↓
3. Analyze results (identify hotspots)
↓
4. Hypothesize bottleneck (root cause)
↓
5. Implement fix
↓
6. Re-profile to validate (compare before/after)
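For step 6, a small timing harness can back up the profiler's before/after comparison. The sketch below is one possible approach, not a replacement for a profiler: `medianMicros` and `relativeChange` are hypothetical names, and the median is used on the assumption that occasional outliers (scheduler noise, cold caches) should not dominate the result.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `work` `runs` times and returns the median wall-clock time in
// microseconds (the upper median when `runs` is even).
template <typename Work>
double medianMicros(Work&& work, std::size_t runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    // nth_element places the median value in the middle slot.
    auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Relative change of `after` against `before`; negative means faster.
double relativeChange(double beforeMicros, double afterMicros) {
    assert(beforeMicros > 0.0);
    return (afterMicros - beforeMicros) / beforeMicros;
}
```

Calling `medianMicros` once on the old code path and once on the new one, then passing both results to `relativeChange`, gives a single number to record next to the profiler output.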
Tool Details¶
Intel VTune Profiler (Windows/Linux)¶
Pros:
- Deep Intel CPU insights
- Hardware event sampling (PMU)
- Call stack analysis
- Source code attribution
- Threading analysis
Cons:
- Intel CPUs only (limited AMD support)
- Commercial license for full features
- Heavy installation
Analysis Types:
- Hotspots (CPU usage)
- Memory Access (cache analysis)
- Threading (locks, waits)
- Microarchitecture Exploration
- I/O
Usage:
# Command line
vtune -collect hotspots -result-dir ./vtune_results -- ./AudioPlugin.exe
# GUI
vtune-gui
Xcode Instruments (macOS)¶
Pros:
- Deep macOS/iOS integration
- Metal GPU profiling
- Beautiful UI
- Energy profiling
- Network profiling
Cons:
- macOS only
- Requires Xcode
- Can be slow with large traces
Instruments:
- Time Profiler (CPU hotspots)
- Allocations (memory allocations)
- Leaks (memory leaks)
- System Trace (kernel events)
- Metal (GPU)
- Audio (Core Audio events)
Usage:
# Command line
xcrun xctrace record --template 'Time Profiler' --launch -- ./AudioPlugin.app
# GUI (the standalone `instruments` command was removed from recent Xcode releases)
open -a Instruments
Linux perf (Linux)¶
Pros:
- Built into kernel
- Extremely lightweight
- Production-safe
- Rich event support
Cons:
- Command-line focused
- Visualization requires external tools
- Requires debug symbols for useful output
Event Types:
- Hardware: cycles, instructions, cache-misses
- Software: context-switches, page-faults
- Tracepoints: system calls, scheduler events
Usage:
# Record with call stacks, then view the report
perf record -g ./AudioPlugin
perf report
Superluminal (Windows/Linux)¶
Pros:
- Real-time profiling
- Low overhead
- Beautiful UI
- Easy integration
Cons:
- Commercial (paid license)
- Sampling-based (may miss short events)
Usage:
// Instrument code
#include <Superluminal/PerformanceAPI.h>

void processAudio() {
    PERFORMANCEAPI_INSTRUMENT_FUNCTION();
    // ... audio processing
}
Tracy Profiler (All platforms)¶
Pros:
- Open source
- Frame profiler design (great for real-time)
- Low overhead
- Network-based (profile remote devices)
- Visual timeline
Cons:
- Requires code instrumentation
- Setup can be complex
Usage:
// Instrument code
#include <tracy/Tracy.hpp>

void processAudio() {
    ZoneScoped;
    // ... audio processing
}
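Because Tracy requires code instrumentation, it can help to hide its macros behind a thin wrapper so the same source builds with or without the Tracy headers. The sketch below is one way to do that. `PROFILE_ZONE`, `PROFILE_FRAME`, `applyGain`, and `processBlock` are hypothetical names; `ZoneScopedN` and `FrameMark` are Tracy macros, and treating each audio callback as one "frame" is an assumption about how you want the timeline split.

```cpp
#include <cassert>
#include <cstddef>

// Use Tracy's macros when profiling is enabled and the header is present;
// otherwise fall back to no-ops so the code still compiles.
#if defined(TRACY_ENABLE) && defined(__has_include)
#  if __has_include(<tracy/Tracy.hpp>)
#    include <tracy/Tracy.hpp>
#    define PROFILE_ZONE(name) ZoneScopedN(name)
#    define PROFILE_FRAME()    FrameMark
#  endif
#endif
#ifndef PROFILE_ZONE
#  define PROFILE_ZONE(name) ((void)0)
#  define PROFILE_FRAME()    ((void)0)
#endif

// Hypothetical gain stage standing in for real audio processing.
void applyGain(float* samples, std::size_t count, float gain) {
    PROFILE_ZONE("applyGain");
    assert(samples != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i)
        samples[i] *= gain;
}

// Hypothetical audio callback: one named zone for the whole block and
// one frame mark at the end of each callback.
void processBlock(float* samples, std::size_t count) {
    PROFILE_ZONE("processBlock");
    applyGain(samples, count, 0.5f);
    PROFILE_FRAME();
}
```

Tracy's own header already turns its macros into no-ops when `TRACY_ENABLE` is undefined, so the wrapper mainly matters for builds where the Tracy sources are not available at all.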
Use Case → Tool Mapping¶
"My audio callback is taking too long"¶
Tool: Tracy or Superluminal
Why: A real-time timeline shows exactly where time is spent
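"Too long" is relative to the time one buffer represents. The sketch below computes that budget and the fraction of it a measured callback uses. The function names are hypothetical, and the model assumes the callback must finish within one buffer period, which ignores any extra latency the host or driver adds. For example, 512 frames at 48 kHz leave about 10.67 ms per callback.

```cpp
#include <cassert>

// Time available to process one buffer, in milliseconds.
double callbackBudgetMs(int bufferSizeFrames, double sampleRateHz) {
    assert(bufferSizeFrames > 0 && sampleRateHz > 0.0);
    return 1000.0 * bufferSizeFrames / sampleRateHz;
}

// Fraction of the budget a measured callback duration consumes.
// Values close to or above 1.0 mean the callback risks missing its deadline.
double callbackLoad(double measuredMs, int bufferSizeFrames, double sampleRateHz) {
    assert(measuredMs >= 0.0);
    return measuredMs / callbackBudgetMs(bufferSizeFrames, sampleRateHz);
}
```

Comparing the zone durations shown in the timeline against this budget tells you how much headroom is left at a given buffer size.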
"My plugin uses too much CPU but I don't know where"¶
Tool: VTune Hotspots or Instruments Time Profiler
Why: Statistical sampling finds hot functions
"Performance regressed but I don't know why"¶
Tool: VTune Microarchitecture or perf stat
Why: Hardware counters reveal cache misses and branch mispredictions
"Memory usage is growing"¶
Tool: Instruments Allocations or Valgrind Massif
Why: Allocation tracking finds leaks
"Occasional glitches/dropouts"¶
Tool: Tracy with manual zones around critical sections
Why: Timeline shows when and where spikes occur
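Before adding zones everywhere, a lightweight in-process counter can confirm that overruns really happen and how bad the worst one is. The following is a minimal sketch, assuming a single fixed budget per callback. `CallbackWatch` and `ScopedCallbackTimer` are hypothetical names, and the atomics are expected to be lock-free on typical 64-bit targets, which you should verify for your platform.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Tracks how often a callback exceeded its budget and the worst duration seen.
class CallbackWatch {
public:
    explicit CallbackWatch(std::int64_t budgetNanos) : budgetNanos_(budgetNanos) {
        assert(budgetNanos > 0);
    }

    // Call from the audio thread with the measured duration of one callback.
    void record(std::int64_t elapsedNanos) {
        if (elapsedNanos > budgetNanos_)
            overruns_.fetch_add(1, std::memory_order_relaxed);
        // Raise the stored maximum only if this measurement is larger.
        std::int64_t prev = worstNanos_.load(std::memory_order_relaxed);
        while (elapsedNanos > prev &&
               !worstNanos_.compare_exchange_weak(prev, elapsedNanos,
                                                  std::memory_order_relaxed)) {
        }
    }

    std::uint64_t overruns() const { return overruns_.load(std::memory_order_relaxed); }
    std::int64_t worstNanos() const { return worstNanos_.load(std::memory_order_relaxed); }

private:
    const std::int64_t budgetNanos_;
    std::atomic<std::uint64_t> overruns_{0};
    std::atomic<std::int64_t> worstNanos_{0};
};

// RAII timer: measures the enclosing scope and reports it to the watch.
class ScopedCallbackTimer {
public:
    explicit ScopedCallbackTimer(CallbackWatch& watch)
        : watch_(watch), start_(std::chrono::steady_clock::now()) {}
    ~ScopedCallbackTimer() {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        watch_.record(
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
    }

private:
    CallbackWatch& watch_;
    std::chrono::steady_clock::time_point start_;
};
```

Placing a `ScopedCallbackTimer` at the top of the audio callback and reading `overruns()` from a UI or logging thread gives a cheap signal for when to capture a full trace.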
"Multi-threaded performance is poor"¶
Tool: VTune Threading or Instruments System Trace
Why: Shows lock contention and time spent waiting on thread synchronization
Profiling Best Practices¶
Before Profiling¶
- Build in Release mode with debug symbols (-O2 -g)
- Disable optimizations only if needed (-O0 can mislead, because the profile no longer reflects the code you ship)
- Use a representative workload (real audio files, realistic settings)
- Run multiple times (account for variance)
- Close other applications (reduce noise)
During Profiling¶
- Profile entire scenario (not just one function)
- Use sufficient sample rate (1000 Hz typical)
- Collect call stacks (essential for root cause)
- Record enough data (30s minimum for statistical significance)
- Note system state (CPU load, memory usage)
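The sample-rate and duration guidance can be checked with simple arithmetic. The sketch below estimates how many samples land in a function and how noisy that count is. Both function names are hypothetical, and the 1/sqrt(n) figure assumes samples behave like independent Poisson counts, which is only a rough model. At 1000 Hz for 30 s, a function using 1% of CPU collects about 300 samples, or roughly 6% relative noise.

```cpp
#include <cassert>
#include <cmath>

// Expected number of samples attributed to code that uses `cpuFraction`
// of the CPU time during the recording.
double expectedSamples(double sampleRateHz, double durationSeconds, double cpuFraction) {
    assert(sampleRateHz > 0.0 && durationSeconds > 0.0);
    assert(cpuFraction >= 0.0 && cpuFraction <= 1.0);
    return sampleRateHz * durationSeconds * cpuFraction;
}

// Rough relative uncertainty of a sample count, assuming Poisson statistics.
double relativeUncertainty(double samples) {
    assert(samples > 0.0);
    return 1.0 / std::sqrt(samples);
}
```

If the functions you care about are small, this suggests recording longer rather than raising the sample rate, since a higher rate also increases profiling overhead.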
After Profiling¶
- Focus on hotspots (80/20 rule: 20% of code = 80% of time)
- Verify with multiple profilers (sanity check)
- Profile before and after optimization (measure improvement)
- Document findings (what, where, why)
- Share results with team
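The focus on hotspots follows from Amdahl's law: the overall gain is capped by the share of time the optimized code accounts for. The sketch below is a direct transcription of that formula; `overallSpeedup` is a hypothetical name. Doubling the speed of code that takes 80% of the time gives about 1.67x overall, while a 10x speedup of code that takes 5% gives under 1.05x.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when the part of the program that takes
// `hotFraction` of the runtime is made `localSpeedup` times faster.
double overallSpeedup(double hotFraction, double localSpeedup) {
    assert(hotFraction >= 0.0 && hotFraction <= 1.0);
    assert(localSpeedup > 0.0);
    return 1.0 / ((1.0 - hotFraction) + hotFraction / localSpeedup);
}
```

Comparing the predicted value with the measured before/after ratio is a quick sanity check that the optimization hit the code the profiler pointed at.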
Quick Start Commands¶
VTune (Windows/Linux)¶
# Hotspot analysis
vtune -collect hotspots -knob sampling-interval=1 -result-dir ./vtune_results -- ./AudioPlugin.exe
# Memory access
vtune -collect memory-access -result-dir ./vtune_mem -- ./AudioPlugin.exe
# Generate report
vtune -report summary -result-dir ./vtune_results -format html -report-output ./report.html
Instruments (macOS)¶
# Time profiler
xcrun xctrace record --template 'Time Profiler' --output ./trace.trace --launch -- ./AudioPlugin.app
# Allocations
xcrun xctrace record --template 'Allocations' --output ./alloc.trace --launch -- ./AudioPlugin.app
# Open the trace in Instruments.app
open ./trace.trace
perf (Linux)¶
# Record with call graph
perf record -F 1000 --call-graph dwarf -- ./AudioPlugin
# Report
perf report --stdio
# Annotate source (requires debug symbols)
perf annotate
# Flamegraph (requires flamegraph.pl)
perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
Tracy¶
Valgrind (Linux/macOS)¶
# Callgrind (call graph profiler)
valgrind --tool=callgrind --callgrind-out-file=callgrind.out ./AudioPlugin
# Visualize with kcachegrind
kcachegrind callgrind.out
# Massif (heap profiler)
valgrind --tool=massif --massif-out-file=massif.out ./AudioPlugin
ms_print massif.out
Integration with Build System¶
CMake Integration¶
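# Add profiling support
option(ENABLE_PROFILING "Enable profiling support" OFF)
option(USE_TRACY "Link the Tracy client" OFF)
option(USE_SUPERLUMINAL "Link the Superluminal Performance API" OFF)

if(ENABLE_PROFILING)
  # Optimized build with debug symbols (only affects single-config generators)
  set(CMAKE_BUILD_TYPE RelWithDebInfo)

  # Tracy integration (assumes the Tracy::TracyClient target already exists)
  if(USE_TRACY)
    target_compile_definitions(${PROJECT_NAME} PRIVATE TRACY_ENABLE)
    target_link_libraries(${PROJECT_NAME} PRIVATE Tracy::TracyClient)
  endif()

  # Superluminal integration (SUPERLUMINAL_INCLUDE_DIR and SUPERLUMINAL_LIB must be set by the caller)
  if(USE_SUPERLUMINAL)
    target_include_directories(${PROJECT_NAME} PRIVATE ${SUPERLUMINAL_INCLUDE_DIR})
    target_link_libraries(${PROJECT_NAME} PRIVATE ${SUPERLUMINAL_LIB})
  endif()
endif()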
Learning Resources¶
VTune¶
- Intel VTune Cookbook: https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/
- Video tutorials: https://software.intel.com/content/www/us/en/develop/videos/vtune-profiler.html
Instruments¶
- WWDC Sessions: Search "Instruments" on developer.apple.com
- Instruments Help: Built into Xcode
perf¶
- Brendan Gregg's perf page: https://www.brendangregg.com/perf.html
- Linux perf wiki: https://perf.wiki.kernel.org/
Tracy¶
- Manual: https://github.com/wolfpld/tracy/releases (tracy.pdf)
- Examples: https://github.com/wolfpld/tracy/tree/master/examples
Recommended Setup¶
Minimal (Free)¶
- Windows: perf (WSL) + Visual Studio Profiler
- macOS: Instruments (free with Xcode)
- Linux: perf + flamegraph
Professional (Audio Development)¶
- Windows: VTune + Superluminal
- macOS: Instruments + Tracy
- Linux: perf + Tracy
- All: Tracy for cross-platform real-time profiling
Enterprise (Team)¶
- All of the above
- Continuous profiling in CI/CD
- Automated regression detection
- Shared profiling results repository
Last Updated: [Date]
Owner: Performance Team