Hotspot Identification Module

Module: 08_09_02_hotspot_identification
Phase: Phase 3 (Hotspot Identification)
Status: ✅ Complete

Overview

The Hotspot Identification module provides profiling tools that locate performance bottlenecks, analyze call graphs, and guide optimization work through profile-guided optimization (PGO). Here PGO means runtime profiling that produces optimization hints, not compiler-driven PGO.

Features

🎯 Core Capabilities

- Profile-Guided Optimization (PGO): runtime performance profiling with statistical analysis
- Bottleneck Detection: automatic classification of performance issues
- Call Graph Analysis: function relationships and critical path detection
- Hardware Counter Integration: cache misses and branch mispredictions (platform-dependent)
- Optimization Recommendations: actionable hints for performance improvements

Architecture

```
08_09_02_hotspot_identification/
├── include/
│   ├── ProfileGuidedOptimizer.hpp    # PGO profiler
│   ├── BottleneckDetector.hpp        # Bottleneck classifier
│   └── CallGraphAnalyzer.hpp         # Call graph builder
├── src/
│   ├── ProfileGuidedOptimizer.cpp
│   ├── BottleneckDetector.cpp
│   └── CallGraphAnalyzer.cpp
├── tests/
│   └── test_hotspot_identification.cpp
├── examples/
│   └── hotspot_demo.cpp
└── CMakeLists.txt
```

Quick Start

1. Profile-Guided Optimization

```cpp
#include <ProfileGuidedOptimizer.hpp>
#include <iostream>

void processReverb();  // your code, defined elsewhere

int main() {
    ProfileGuidedOptimizer pgo;

    // Profile a code region; timing stops when `profile` leaves scope
    {
        ScopedProfile profile(pgo, "processReverb");
        processReverb();
    }

    // Analyze the collected samples
    pgo.analyze();

    // Print the regions classified as hotspots
    for (const auto& hs : pgo.getHotspots()) {
        std::cout << hs.regionName << ": "
                  << hs.percentageOfTotal << "% of total time\n";
    }

    // Get optimization hints
    auto hints = pgo.getOptimizationHints();
}
```
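`ScopedProfile` follows the RAII pattern: timing starts when the guard is constructed and is recorded when it is destroyed. The sketch below shows that idea with the standard library only, assuming wall-clock timing via `std::chrono::steady_clock` (this page does not say which clock the module uses). `MiniProfiler` and `ScopedTimer` are hypothetical names, not part of this module's API.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Hypothetical minimal profiler (not this module's API): accumulates
// wall-clock time and call counts per region name.
struct MiniProfiler {
    std::map<std::string, double> totalMicros;
    std::map<std::string, long long> calls;
};

// RAII guard: starts a steady_clock timer on construction and adds the
// elapsed microseconds to the profiler when the guard is destroyed.
class ScopedTimer {
public:
    ScopedTimer(MiniProfiler& profiler, std::string name)
        : profiler_(profiler), name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {
        assert(!name_.empty());  // the name selects the accumulation bucket
    }
    ~ScopedTimer() {
        const auto end = std::chrono::steady_clock::now();
        profiler_.totalMicros[name_] +=
            std::chrono::duration<double, std::micro>(end - start_).count();
        profiler_.calls[name_] += 1;
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    MiniProfiler& profiler_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Because the time is recorded in the destructor, early returns and exceptions inside the scope are still measured.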

2. Bottleneck Detection

```cpp
#include <BottleneckDetector.hpp>
#include <iostream>

int main() {
    BottleneckDetector detector;

    // Describe one measured region
    BottleneckDetector::ProfileData data;
    data.regionName = "processAudio";
    data.executionTime = 1000.0;  // microseconds
    data.cacheMisses = 5000;
    data.instructions = 100000;

    detector.analyzeRegion("processAudio", data);

    auto bottlenecks = detector.getBottlenecks();
    auto critical = detector.getCriticalBottlenecks();  // severity >= 4

    std::cout << detector.generateRecommendationReport();
}
```
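Whether the `ProfileData` above counts as cache-bound depends on how the detector normalizes its counters, which this page does not specify. One common normalization, used here as an assumption, is misses per thousand instructions (MPKI); for the example values it gives 1000 × 5000 / 100000 = 50. `missesPerKiloInstruction` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Cache misses per 1000 instructions (MPKI): normalizes raw miss counts so
// regions that execute different amounts of code can be compared.
double missesPerKiloInstruction(std::uint64_t cacheMisses, std::uint64_t instructions) {
    assert(instructions > 0);  // MPKI is undefined without an instruction count
    return 1000.0 * static_cast<double>(cacheMisses) / static_cast<double>(instructions);
}
```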

3. Call Graph Analysis

```cpp
#include <CallGraphAnalyzer.hpp>
#include <string>

int main() {
    CallGraphAnalyzer analyzer;

    // Build the call graph from caller -> callee edges
    analyzer.recordCall("main", "processAudio");
    analyzer.recordCall("processAudio", "processReverb");
    analyzer.recordCall("processReverb", "processAllpass");

    // Record timings: total (inclusive) time, then self time
    analyzer.recordTiming("processReverb", 500.0, 450.0);

    analyzer.analyze();

    // Get the critical path
    auto criticalPath = analyzer.getCriticalPath();

    // Export for visualization
    std::string dot = analyzer.exportToDOT();    // GraphViz
    std::string json = analyzer.exportToJSON();
}
```
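This page does not define how `getCriticalPath()` chooses its path. One common reading, used here purely as an assumption, is the "hot path": start at an entry point and repeatedly follow the callee with the largest total time. This greedy walk is a cheap approximation, not a guaranteed maximum-weight path. `CallEdges`, `TotalTimes`, and `heaviestPath` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical data model: caller -> callees, and total time per function.
using CallEdges = std::map<std::string, std::vector<std::string>>;
using TotalTimes = std::map<std::string, double>;

// Greedy hot path: from `root`, step to the callee with the largest total
// time until a function with no (unvisited) callees is reached. The visited
// set stops the walk on recursive cycles.
std::vector<std::string> heaviestPath(const CallEdges& edges, const TotalTimes& total,
                                      const std::string& root) {
    assert(!root.empty());
    std::vector<std::string> path{root};
    std::set<std::string> visited{root};
    std::string current = root;
    while (true) {
        auto it = edges.find(current);
        if (it == edges.end()) break;  // leaf: no recorded callees
        const std::string* best = nullptr;
        double bestTime = -1.0;
        for (const auto& callee : it->second) {
            if (visited.count(callee)) continue;  // skip back-edges
            auto t = total.find(callee);
            const double time = (t != total.end()) ? t->second : 0.0;
            if (time > bestTime) { bestTime = time; best = &callee; }
        }
        if (best == nullptr) break;
        path.push_back(*best);
        visited.insert(*best);
        current = *best;
    }
    return path;
}
```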

Components

ProfileGuidedOptimizer

Features:

- Runtime profiling with microsecond precision
- Statistical analysis (min/max/avg/stddev)
- Automatic hotspot detection
- RAII scoped profiling
- Optimization hint generation
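The module's statistics code is not shown on this page. A standard way to maintain min/max/mean/stddev without storing every sample is Welford's online algorithm, sketched below under the assumption that "stddev" means the sample standard deviation. `RunningStats` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Welford's online algorithm: updates mean and variance one sample at a
// time, so individual timings never need to be stored.
struct RunningStats {
    std::size_t count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean
    double min = std::numeric_limits<double>::infinity();
    double max = -std::numeric_limits<double>::infinity();

    void add(double x) {
        ++count;
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);  // uses the updated mean
        if (x < min) min = x;
        if (x > max) max = x;
    }

    // Sample standard deviation (n - 1 denominator); needs two samples.
    double stddev() const {
        assert(count >= 2);
        return std::sqrt(m2 / static_cast<double>(count - 1));
    }
};
```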

Hotspot Detection:

```cpp
PGOConfig config;
config.hotspotThresholdPercent = 5.0;  // regions above 5% of total time are hotspots

ProfileGuidedOptimizer pgo(config);

// ... profile code ...

pgo.analyze();

auto hotspots = pgo.getHotspots();
// Returns the regions above the threshold
```
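The threshold rule can be sketched as follows, assuming a region's share is its accumulated time divided by the sum over all regions. Under that assumption nested regions would be double-counted, and the module may normalize differently. `RegionTime` and `selectHotspots` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-region summary.
struct RegionTime {
    std::string name;
    double totalMicros;
};

// Returns regions whose share of the summed time exceeds thresholdPercent,
// sorted from most to least expensive.
std::vector<RegionTime> selectHotspots(const std::vector<RegionTime>& regions,
                                       double thresholdPercent) {
    assert(thresholdPercent >= 0.0 && thresholdPercent <= 100.0);
    double sum = 0.0;
    for (const auto& r : regions) sum += r.totalMicros;
    std::vector<RegionTime> hot;
    if (sum <= 0.0) return hot;
    for (const auto& r : regions)
        if (100.0 * r.totalMicros / sum > thresholdPercent) hot.push_back(r);
    std::sort(hot.begin(), hot.end(), [](const RegionTime& a, const RegionTime& b) {
        return a.totalMicros > b.totalMicros;
    });
    return hot;
}
```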

Performance Comparison:

```cpp
PerformanceComparator comparator;

// Record the baseline (assumes `pgo` has already profiled myFunction)
pgo.analyze();
auto baselineProfile = pgo.getProfile("myFunction");
comparator.recordBaseline("myFunction", baselineProfile);

// Optimize code...

// Record the optimized version
pgo.reset();
// ... profile again ...
pgo.analyze();
auto optimizedProfile = pgo.getProfile("myFunction");
comparator.recordOptimized("myFunction", optimizedProfile);

// Compare
auto comparison = comparator.compare("myFunction");
std::cout << "Speedup: " << comparison.improvement << "x\n";
```
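The example prints `improvement` with an `x` suffix, which suggests a ratio of baseline time to optimized time, though the page does not define it. Under that assumption the arithmetic is below; `speedup` and `percentTimeSaved` are hypothetical helpers.

```cpp
#include <cassert>

// Speedup as a ratio of average times: values above 1.0 mean the optimized
// version is faster.
double speedup(double baselineAvgMicros, double optimizedAvgMicros) {
    assert(optimizedAvgMicros > 0.0);
    return baselineAvgMicros / optimizedAvgMicros;
}

// The same change expressed as a percentage reduction in time.
double percentTimeSaved(double baselineAvgMicros, double optimizedAvgMicros) {
    assert(baselineAvgMicros > 0.0);
    return 100.0 * (baselineAvgMicros - optimizedAvgMicros) / baselineAvgMicros;
}
```

The two measures are not interchangeable: a 2x speedup saves 50% of the time, and a 1.1x speedup saves about 9.1% (1 - 1/1.1).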

BottleneckDetector

Bottleneck types:

- CPU: CPU-bound computation (high CPI)
- Memory: memory bandwidth saturation
- Cache: high cache miss rate
- Branch: branch mispredictions
- Synchronization: lock contention
- IO: I/O-bound

Classification:

```cpp
auto type = detector.classifyBottleneck(profileData);

switch (type) {
    case BottleneckType::CPU:
        // Consider SIMD vectorization
        break;
    case BottleneckType::Cache:
        // Improve cache locality
        break;
    // ... Memory, Branch, Synchronization, IO ...
    default:
        break;
}
```
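The rules behind `classifyBottleneck()` are not documented on this page. As an illustration only, a rule-based classifier over normalized hardware counters might look like this. `Kind`, `Counters`, `classify`, and both thresholds are hypothetical; real cut-offs depend on the CPU and workload.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of the bottleneck categories listed above.
enum class Kind { CPU, Cache, Branch };

// Hypothetical counter snapshot for one region.
struct Counters {
    std::uint64_t instructions;
    std::uint64_t cacheMisses;
    std::uint64_t branchMisses;
};

// Classifies on per-1000-instruction rates. The 10 and 5 MPKI thresholds
// are illustrative assumptions, not values taken from the module.
Kind classify(const Counters& c) {
    assert(c.instructions > 0);
    const double kilo = static_cast<double>(c.instructions) / 1000.0;
    const double cacheMpki = static_cast<double>(c.cacheMisses) / kilo;
    const double branchMpki = static_cast<double>(c.branchMisses) / kilo;
    if (cacheMpki > 10.0) return Kind::Cache;
    if (branchMpki > 5.0) return Kind::Branch;
    return Kind::CPU;  // no memory or branch signal: treat as compute-bound
}
```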

Hardware Counters (optional):

```cpp
// Counter support is platform-dependent
HardwareCounters counters;
counters.enableCacheMissCounters();
counters.enableBranchMissCounters();

counters.start();
// ... code to measure ...
counters.stop();

std::cout << "Cache misses: " << counters.getCacheMisses() << "\n";
std::cout << "CPI: " << counters.getCPI() << "\n";  // cycles per instruction
```

CallGraphAnalyzer

Features:

- Caller/callee relationship tracking
- Recursive function detection
- Critical path computation
- Self vs. total time breakdown
- DOT/JSON export for visualization
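In the Quick Start, `recordTiming("processReverb", 500.0, 450.0)` passes self time explicitly. When only inclusive totals are available, self time can be derived by subtracting the callees' totals. This sketch assumes each listed callee's total was spent entirely under this caller; real profilers track time per call edge because a shared callee's time is split among its callers. `selfTime` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Self time = total (inclusive) time minus the total time of the callees.
// Assumes every callee's total is attributable to `fn` (single caller).
double selfTime(const std::string& fn,
                const std::map<std::string, double>& totalTime,
                const std::map<std::string, std::vector<std::string>>& callees) {
    auto t = totalTime.find(fn);
    assert(t != totalTime.end());
    double self = t->second;
    auto c = callees.find(fn);
    if (c != callees.end()) {
        for (const auto& child : c->second) {
            auto ct = totalTime.find(child);
            if (ct != totalTime.end()) self -= ct->second;
        }
    }
    return self;
}
```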

Call Tree:

```cpp
auto roots = analyzer.getRootFunctions();   // Entry points
auto leaves = analyzer.getLeafFunctions();  // Terminal functions

auto topByTotal = analyzer.getTopFunctionsByTotal(5);
auto topBySelf = analyzer.getTopFunctionsBySelf(5);
```
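Roots and leaves follow directly from the recorded edges: a root makes calls but is never called, and a leaf is called but never calls anything. A minimal sketch over a hypothetical `(caller, callee)` edge list; `Edge`, `rootFunctions`, and `leafFunctions` are illustrative names, not the analyzer's implementation.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical edge list: (caller, callee) pairs, as passed to recordCall().
using Edge = std::pair<std::string, std::string>;

struct Roles {
    std::set<std::string> callers;
    std::set<std::string> callees;
};

Roles collectRoles(const std::vector<Edge>& edges) {
    Roles r;
    for (const auto& [caller, callee] : edges) {
        r.callers.insert(caller);
        r.callees.insert(callee);
    }
    return r;
}

// Roots: functions that call others but are never called themselves.
std::set<std::string> rootFunctions(const std::vector<Edge>& edges) {
    const Roles r = collectRoles(edges);
    std::set<std::string> roots;
    for (const auto& f : r.callers)
        if (r.callees.count(f) == 0) roots.insert(f);
    return roots;
}

// Leaves: functions that are called but never call anything.
std::set<std::string> leafFunctions(const std::vector<Edge>& edges) {
    const Roles r = collectRoles(edges);
    std::set<std::string> leaves;
    for (const auto& f : r.callees)
        if (r.callers.count(f) == 0) leaves.insert(f);
    return leaves;
}
```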

Recursion Detection:

```cpp
analyzer.analyze();

auto recursive = analyzer.getRecursiveFunctions();
for (const auto& func : recursive) {
    std::cout << func.name << " is recursive (depth: "
              << func.recursionDepth << ")\n";
}
```
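A static call graph can show whether a function is recursive, either directly or through a cycle such as `a -> b -> a`, but not how deep it recursed at run time; `recursionDepth` presumably comes from runtime tracking that this page does not describe. A minimal reachability-based sketch, with hypothetical names `CallGraph`, `reaches`, and `recursiveFunctions`:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical adjacency list: caller -> callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// True if `target` is reachable from `from` by following call edges.
bool reaches(const CallGraph& g, const std::string& from, const std::string& target,
             std::set<std::string>& visited) {
    if (from == target) return true;
    if (!visited.insert(from).second) return false;  // already explored
    auto it = g.find(from);
    if (it == g.end()) return false;
    for (const auto& next : it->second)
        if (reaches(g, next, target, visited)) return true;
    return false;
}

// A function is recursive (directly or mutually) if one of its callees
// can lead back to it.
std::set<std::string> recursiveFunctions(const CallGraph& g) {
    std::set<std::string> result;
    for (const auto& [fn, callees] : g) {
        std::set<std::string> visited;
        for (const auto& c : callees) {
            if (reaches(g, c, fn, visited)) {
                result.insert(fn);
                break;
            }
        }
    }
    return result;
}
```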

Testing

```bash
cmake -B build -DBUILD_TESTING=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
cd build
ctest -C Release --output-on-failure
```

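Examples

Run the interactive demo. The path below is for a multi-config generator such as Visual Studio; the exact location depends on the generator, and single-config generators typically place the binary directly under `build/`.

```bash
./build/Release/hotspot_demo.exe
```

The demo covers:

1. PGO profiling with hotspot detection
2. Bottleneck classification
3. Call graph building and export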

Integration with Tracy

The ProfileGuidedOptimizer supports integration with Tracy Profiler:

```cpp
pgo.enableTracyIntegration();

// Profiled regions are also reported to Tracy
{
    ScopedProfile p(pgo, "region");
    // ...
}
```

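Best Practices

1. Profile in Release Builds

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
```

Debug builds disable compiler optimizations (and, with some toolchains, enable extra runtime checks), so their timings do not reflect the shipped code and can point at the wrong hotspots.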

2. Sufficient Sample Size

```cpp
PGOConfig config;
config.minInvocations = 100;  // Require 100+ calls for statistics
```

3. Focus on Hotspots

```cpp
auto hotspots = pgo.getHotspots();
// A small fraction of regions usually accounts for most of the time
// (the 80/20 rule); optimize those first
```
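The 80/20 guideline can be made concrete: sort regions by time and take the smallest prefix whose combined time reaches the target coverage. A minimal sketch with a hypothetical `paretoSet` helper; it works on plain `(name, time)` pairs rather than the module's hotspot type.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Returns the smallest set of regions (most expensive first) whose combined
// time reaches `coverage` (e.g. 0.8 for 80%) of the total.
std::vector<std::string> paretoSet(std::vector<std::pair<std::string, double>> regions,
                                   double coverage) {
    std::sort(regions.begin(), regions.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    double total = 0.0;
    for (const auto& r : regions) total += r.second;
    std::vector<std::string> picked;
    double accumulated = 0.0;
    for (const auto& r : regions) {
        if (accumulated >= coverage * total) break;  // target reached
        picked.push_back(r.first);
        accumulated += r.second;
    }
    return picked;
}
```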

4. Validate Optimizations

```cpp
PerformanceComparator comp;

// Baseline (profile captured before the change)
comp.recordBaseline("func", baselineProfile);

// After optimization
comp.recordOptimized("func", optimizedProfile);

// Verify improvement (Catch2 assertion inside a TEST_CASE)
auto cmp = comp.compare("func");
REQUIRE(cmp.improvement > 1.1);  // Require a speedup above 1.1x
```

Optimization Hints

The PGO profiler generates hints automatically:

| Hint type | Trigger | Recommendation |
|---|---|---|
| `SIMDVectorization` | > 10% of total time | Use SSE/AVX/NEON |
| `LoopUnrolling` | > 10k invocations | Unroll or inline |
| `CacheOptimization` | High variance | Reduce cache misses |
| `BranchPrediction` | High branch-miss count | Reduce branches |
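The first three rows of the hint table can be expressed as simple rules. In this sketch the 10% and 10k thresholds come from the table. Interpreting "high variance" as a coefficient of variation above 0.5 is an assumption, and branch hints are left out because they need hardware counters. `RegionSummary` and `hintsFor` are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical per-region inputs.
struct RegionSummary {
    double percentOfTotal;  // share of total profiled time
    long long invocations;
    double meanMicros;
    double stddevMicros;
};

// Rules modeled on the hint table. The 0.5 coefficient-of-variation cut-off
// for "high variance" is an assumption, not a documented value.
std::vector<std::string> hintsFor(const RegionSummary& r) {
    std::vector<std::string> hints;
    if (r.percentOfTotal > 10.0) hints.push_back("SIMDVectorization");
    if (r.invocations > 10000) hints.push_back("LoopUnrolling");
    if (r.meanMicros > 0.0 && r.stddevMicros / r.meanMicros > 0.5)
        hints.push_back("CacheOptimization");
    return hints;
}
```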

Dependencies

- C++20 (`<chrono>`, `<functional>`)
- Catch2 (tests)
- Tracy Profiler (optional)