Hotspot Identification Module
Module: 08_09_02_hotspot_identification
Phase: 3 - Hotspot Identification
Status: ✅ Complete
Overview
The Hotspot Identification module provides runtime profiling tools that locate performance bottlenecks, analyze call graphs, and guide optimization work through profile-guided optimization (PGO).
Features
🎯 Core Capabilities
- Profile-Guided Optimization (PGO) - Runtime performance profiling with statistical analysis
- Bottleneck Detection - Automatic classification of performance issues
- Call Graph Analysis - Function relationship and critical path detection
- Hardware Counter Integration - Cache misses, branch mispredictions (platform-dependent)
- Optimization Recommendations - Actionable hints for performance improvements
Architecture
08_09_02_hotspot_identification/
├── include/
│ ├── ProfileGuidedOptimizer.hpp # PGO profiler
│ ├── BottleneckDetector.hpp # Bottleneck classifier
│ └── CallGraphAnalyzer.hpp # Call graph builder
├── src/
│ ├── ProfileGuidedOptimizer.cpp
│ ├── BottleneckDetector.cpp
│ └── CallGraphAnalyzer.cpp
├── tests/
│ └── test_hotspot_identification.cpp
├── examples/
│ └── hotspot_demo.cpp
└── CMakeLists.txt
Quick Start
1. Profile-Guided Optimization
#include <ProfileGuidedOptimizer.hpp>
#include <iostream>

ProfileGuidedOptimizer pgo;

// Profile a code region; timing ends when `profile` goes out of scope
{
    ScopedProfile profile(pgo, "processReverb");
    processReverb(); // Your code
}

// Analyze results
pgo.analyze();
auto hotspots = pgo.getHotspots();
for (const auto& hs : hotspots) {
    std::cout << hs.regionName << ": "
              << hs.percentageOfTotal << "% of total time\n";
}

// Get optimization hints
auto hints = pgo.getOptimizationHints();
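The scoped block above relies on RAII: a timer starts when the object is constructed and the elapsed time is recorded when it is destroyed. The following is a minimal sketch of that pattern, not the module's implementation. `SimpleProfiler` and `ScopedTimer` are hypothetical names introduced only for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a profiler: stores elapsed times per region.
class SimpleProfiler {
public:
    void record(const std::string& region, double microseconds) {
        samples_[region].push_back(microseconds);
    }

    // Number of samples recorded for a region (0 if it was never profiled).
    std::size_t sampleCount(const std::string& region) const {
        const auto it = samples_.find(region);
        return it == samples_.end() ? 0 : it->second.size();
    }

    // Sum of all samples for a region, in microseconds.
    double totalMicroseconds(const std::string& region) const {
        const auto it = samples_.find(region);
        if (it == samples_.end()) return 0.0;
        double total = 0.0;
        for (double sample : it->second) total += sample;
        return total;
    }

private:
    std::unordered_map<std::string, std::vector<double>> samples_;
};

// Hypothetical RAII timer: starts on construction, records on destruction.
class ScopedTimer {
public:
    ScopedTimer(SimpleProfiler& profiler, std::string region)
        : profiler_(profiler),
          region_(std::move(region)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::micro> elapsed = end - start_;
        profiler_.record(region_, elapsed.count());
    }

    // Copying would record the same interval twice, so it is disabled.
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    SimpleProfiler& profiler_;
    std::string region_;
    std::chrono::steady_clock::time_point start_;
};
```

`std::chrono::steady_clock` is used here because it is monotonic, so a system clock adjustment during the measurement cannot produce a negative interval.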
2. Bottleneck Detection
#include <BottleneckDetector.hpp>
#include <iostream>

BottleneckDetector detector;

// Describe the measured region
BottleneckDetector::ProfileData data;
data.regionName = "processAudio";
data.executionTime = 1000.0; // microseconds
data.cacheMisses = 5000;
data.instructions = 100000;
detector.analyzeRegion("processAudio", data);

auto bottlenecks = detector.getBottlenecks();
auto critical = detector.getCriticalBottlenecks(); // Severity >= 4

std::cout << detector.generateRecommendationReport();
3. Call Graph Analysis
#include <CallGraphAnalyzer.hpp>
#include <string>
CallGraphAnalyzer analyzer;
// Build call graph
analyzer.recordCall("main", "processAudio");
analyzer.recordCall("processAudio", "processReverb");
analyzer.recordCall("processReverb", "processAllpass");
// Record timings
analyzer.recordTiming("processReverb", 500.0, 450.0); // total, self
analyzer.analyze();
// Get critical path
auto criticalPath = analyzer.getCriticalPath();
// Export for visualization
std::string dot = analyzer.exportToDOT(); // GraphViz
std::string json = analyzer.exportToJSON();
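To illustrate what a GraphViz export of recorded calls could look like, here is a minimal sketch that stores caller/callee pairs and writes them as a DOT `digraph`. `CallEdgeSet` and `toDot` are hypothetical names, and the exact output of `exportToDOT()` in this module may differ.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical container of call edges; not the module's CallGraphAnalyzer.
class CallEdgeSet {
public:
    // Duplicate calls collapse into a single edge because a set is used.
    void recordCall(const std::string& caller, const std::string& callee) {
        edges_.insert(std::make_pair(caller, callee));
    }

    // Emits one "caller" -> "callee" line per edge, sorted by caller then callee.
    // Note: names are not escaped, so they must not contain double quotes.
    std::string toDot() const {
        std::ostringstream out;
        out << "digraph callgraph {\n";
        for (const auto& edge : edges_) {
            out << "  \"" << edge.first << "\" -> \"" << edge.second << "\";\n";
        }
        out << "}\n";
        return out.str();
    }

private:
    std::set<std::pair<std::string, std::string>> edges_;
};
```

The resulting string can be saved to a file and rendered with the GraphViz `dot` tool.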
Components
ProfileGuidedOptimizer
Features:
- Runtime profiling with microsecond precision
- Statistical analysis (min/max/avg/stddev)
- Automatic hotspot detection
- RAII scoped profiling
- Optimization hint generation
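The statistical analysis listed above (min/max/avg/stddev) can be sketched as follows. This is an illustration under assumptions: `TimingStats` and `computeStats` are hypothetical names, and the sketch uses the population standard deviation, which may not match the module's choice.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical summary of timing samples for one region.
struct TimingStats {
    double min;
    double max;
    double avg;
    double stddev;
};

// Computes min, max, mean, and population standard deviation.
// Returns all zeros when there are no samples.
TimingStats computeStats(const std::vector<double>& samples) {
    TimingStats stats{0.0, 0.0, 0.0, 0.0};
    if (samples.empty()) return stats;

    const auto bounds = std::minmax_element(samples.begin(), samples.end());
    stats.min = *bounds.first;
    stats.max = *bounds.second;

    double sum = 0.0;
    for (double sample : samples) sum += sample;
    stats.avg = sum / static_cast<double>(samples.size());

    double squaredDeviation = 0.0;
    for (double sample : samples) {
        const double delta = sample - stats.avg;
        squaredDeviation += delta * delta;
    }
    stats.stddev = std::sqrt(squaredDeviation / static_cast<double>(samples.size()));
    return stats;
}
```

For the samples 2, 4, 4, 4, 5, 5, 7, 9 this yields an average of 5 and a standard deviation of 2.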
Hotspot Detection:
PGOConfig config;
config.hotspotThresholdPercent = 5.0; // Regions taking >5% = hotspots
ProfileGuidedOptimizer pgo(config);
// ... profile code ...
pgo.analyze();
auto hotspots = pgo.getHotspots();
// Returns regions exceeding threshold
Performance Comparison:
PerformanceComparator comparator;
// Record baseline
pgo.analyze();
auto baselineProfile = pgo.getProfile("myFunction");
comparator.recordBaseline("myFunction", baselineProfile);
// Optimize code...
// Record optimized
pgo.reset();
// ... profile again ...
pgo.analyze();
auto optimizedProfile = pgo.getProfile("myFunction");
comparator.recordOptimized("myFunction", optimizedProfile);
// Compare
auto comparison = comparator.compare("myFunction");
std::cout << "Speedup: " << comparison.improvement << "x\n";
BottleneckDetector
Bottleneck Types:
- CPU - CPU-bound computation (high CPI)
- Memory - Memory bandwidth saturation
- Cache - High cache miss rate
- Branch - Branch mispredictions
- Synchronization - Lock contention
- IO - I/O bound
Classification:
auto type = detector.classifyBottleneck(profileData);
switch (type) {
    case BottleneckType::CPU:
        // Consider SIMD vectorization
        break;
    case BottleneckType::Cache:
        // Improve cache locality
        break;
    // ...
}
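As a rough illustration of how a classifier might map counter data to a bottleneck type, the sketch below derives per-instruction rates and cycles per instruction (CPI) and compares them with fixed cut-offs. The type names, the rule order, and every threshold value are assumptions made for this example. They are not taken from `BottleneckDetector`.

```cpp
#include <cstdint>

// Hypothetical subset of bottleneck categories for this sketch.
enum class SketchBottleneck { None, CPU, Cache, Branch };

// Hypothetical raw counter values for one profiled region.
struct SketchCounters {
    std::uint64_t cycles = 0;
    std::uint64_t instructions = 0;
    std::uint64_t cacheMisses = 0;
    std::uint64_t branchMisses = 0;
};

// Classifies a region using illustrative thresholds (all values are guesses):
// cache misses per instruction > 0.02, branch misses per instruction > 0.01,
// or CPI > 2.0. The first matching rule wins.
SketchBottleneck classify(const SketchCounters& c) {
    if (c.instructions == 0) return SketchBottleneck::None;

    const double instructions = static_cast<double>(c.instructions);
    const double cacheMissesPerInstruction = c.cacheMisses / instructions;
    const double branchMissesPerInstruction = c.branchMisses / instructions;
    const double cpi = c.cycles / instructions;

    if (cacheMissesPerInstruction > 0.02) return SketchBottleneck::Cache;
    if (branchMissesPerInstruction > 0.01) return SketchBottleneck::Branch;
    if (cpi > 2.0) return SketchBottleneck::CPU;
    return SketchBottleneck::None;
}
```

Real thresholds depend heavily on the target CPU and workload, so values like these would need calibration against measured data.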
Hardware Counters (optional):
HardwareCounters counters;
counters.enableCacheMissCounters();
counters.enableBranchMissCounters();
counters.start();
// ... code to measure ...
counters.stop();
std::cout << "Cache misses: " << counters.getCacheMisses() << "\n";
std::cout << "CPI: " << counters.getCPI() << "\n";
CallGraphAnalyzer
Features:
- Caller/callee relationship tracking
- Recursive function detection
- Critical path computation
- Self vs total time breakdown
- DOT/JSON export for visualization
Call Tree:
auto roots = analyzer.getRootFunctions(); // Entry points
auto leaves = analyzer.getLeafFunctions(); // Terminal functions
auto topByTotal = analyzer.getTopFunctionsByTotal(5);
auto topBySelf = analyzer.getTopFunctionsBySelf(5);
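Root and leaf queries follow directly from the recorded edges: a root is never a callee, and a leaf never calls anything. A minimal sketch of that idea, using a hypothetical `SketchCallGraph` class rather than the module's `CallGraphAnalyzer`:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph keyed by function name.
class SketchCallGraph {
public:
    void recordCall(const std::string& caller, const std::string& callee) {
        callees_[caller].insert(callee);
        callees_[callee];  // make sure the callee exists as a node
        hasCaller_.insert(callee);
    }

    // Functions that no recorded call targets (entry points).
    std::vector<std::string> roots() const {
        std::vector<std::string> result;
        for (const auto& node : callees_) {
            if (hasCaller_.count(node.first) == 0) result.push_back(node.first);
        }
        return result;
    }

    // Functions that make no recorded calls (terminal functions).
    std::vector<std::string> leaves() const {
        std::vector<std::string> result;
        for (const auto& node : callees_) {
            if (node.second.empty()) result.push_back(node.first);
        }
        return result;
    }

private:
    std::map<std::string, std::set<std::string>> callees_;
    std::set<std::string> hasCaller_;
};
```

Under this definition a directly self-recursive function is neither a root nor a leaf, because it appears as both caller and callee.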
Recursion Detection:
analyzer.analyze();
auto recursive = analyzer.getRecursiveFunctions();
for (const auto& func : recursive) {
    std::cout << func.name << " is recursive (depth: "
              << func.recursionDepth << ")\n";
}
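One way to detect recursion in a call graph is to check, for each function, whether following call edges can lead back to that function. The sketch below does this with an iterative depth-first search. It is a simple illustration with hypothetical names (`Graph`, `reaches`, `findRecursive`), it does not compute a recursion depth, and it is not necessarily how `getRecursiveFunctions()` works.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Adjacency list: function name -> set of functions it calls.
using Graph = std::map<std::string, std::set<std::string>>;

// True if `target` is reachable from `start` via one or more call edges.
bool reaches(const Graph& graph, const std::string& start, const std::string& target) {
    std::set<std::string> visited;
    std::vector<std::string> stack;
    stack.push_back(start);
    while (!stack.empty()) {
        const std::string current = stack.back();
        stack.pop_back();
        const auto it = graph.find(current);
        if (it == graph.end()) continue;  // function makes no recorded calls
        for (const auto& callee : it->second) {
            if (callee == target) return true;
            if (visited.insert(callee).second) stack.push_back(callee);
        }
    }
    return false;
}

// Returns every function that can reach itself, covering both direct
// recursion (f calls f) and mutual recursion (f calls g, g calls f).
std::vector<std::string> findRecursive(const Graph& graph) {
    std::vector<std::string> result;
    for (const auto& node : graph) {
        if (reaches(graph, node.first, node.first)) result.push_back(node.first);
    }
    return result;
}
```

This runs one search per function, which is fine for small graphs. A strongly connected components algorithm such as Tarjan's would find the same functions in a single pass.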
Testing
cmake -B build -DBUILD_TESTING=ON
cmake --build build --config Release
cd build
ctest --output-on-failure
Examples
The interactive demo is built from examples/hotspot_demo.cpp. It demonstrates:
1. PGO profiling with hotspot detection
2. Bottleneck classification
3. Call graph building and export
Integration with Tracy
The ProfileGuidedOptimizer supports integration with Tracy Profiler:
pgo.enableTracyIntegration();
// Profiles will also be sent to Tracy
{
    ScopedProfile p(pgo, "region");
    // ...
}
Best Practices
1. Profile in Release Builds
Debug builds disable compiler optimizations and add checking overhead, so their timings do not reflect production performance.
2. Sufficient Sample Size
3. Focus on Hotspots
4. Validate Optimizations
PerformanceComparator comp;
// Baseline
comp.recordBaseline("func", baselineProfile);
// After optimization
comp.recordOptimized("func", optimizedProfile);
// Verify improvement
auto cmp = comp.compare("func");
REQUIRE(cmp.improvement > 1.1); // At least 10% faster
Optimization Hints
The ProfileGuidedOptimizer generates optimization hints automatically:
| Hint Type | Trigger | Recommendation |
|---|---|---|
| SIMDVectorization | >10% total time | Use SSE/AVX/NEON |
| LoopUnrolling | >10k invocations | Unroll or inline |
| CacheOptimization | High variance | Fix cache misses |
| BranchPrediction | High branch misses | Reduce branches |
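The table can be read as a small rule set. The sketch below encodes it that way. The 10% and 10k triggers come from the table, while the cut-offs for "high variance" and "high branch misses" are placeholders chosen for this example. `SketchHint`, `RegionSummary`, and `suggestHints` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical hint categories mirroring the table rows.
enum class SketchHint {
    SIMDVectorization,
    LoopUnrolling,
    CacheOptimization,
    BranchPrediction
};

// Hypothetical per-region summary used as rule input.
struct RegionSummary {
    double percentOfTotal = 0.0;    // share of total profiled time
    std::uint64_t invocations = 0;  // number of times the region ran
    double avgMicros = 0.0;         // mean execution time
    double stddevMicros = 0.0;      // standard deviation of execution time
    double branchMissRate = 0.0;    // mispredicted / total branches
};

// Applies each rule independently, so one region can receive several hints.
std::vector<SketchHint> suggestHints(const RegionSummary& r) {
    std::vector<SketchHint> hints;
    if (r.percentOfTotal > 10.0) hints.push_back(SketchHint::SIMDVectorization);
    if (r.invocations > 10000) hints.push_back(SketchHint::LoopUnrolling);
    // "High variance": stddev above half the mean (placeholder cut-off).
    if (r.avgMicros > 0.0 && r.stddevMicros / r.avgMicros > 0.5) {
        hints.push_back(SketchHint::CacheOptimization);
    }
    // "High branch misses": miss rate above 5% (placeholder cut-off).
    if (r.branchMissRate > 0.05) hints.push_back(SketchHint::BranchPrediction);
    return hints;
}
```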
Dependencies
- C++20 - <chrono>, <functional>
- Catch2 - Testing
- Optional: Tracy Profiler
See Also
- 08_09_00_cpu_estimation - Static CPU cost
- 08_09_01_memory_prediction - Memory analysis
- 08_09_IMPLEMENTATION_PLAN.md - Full roadmap
Module Status: ✅ Phase 3 Complete
Next: Phase 4 - Bottleneck Warnings (08_09_03_bottleneck_warnings)