📏 Measurement-First Philosophy

🎯 Mantra

╔════════════════════════════════════════════════════════╗
║                                                        ║
║   1. MEASURE                                           ║
║   2. OPTIMIZE                                          ║
║   3. MEASURE AGAIN                                     ║
║                                                        ║
║   Optimizing without measuring = Guessing              ║
║                                                        ║
╚════════════════════════════════════════════════════════╝

📊 Workflow

┌────────────────────────────────────────┐
│ 1. Identify performance goal           │
│    "Plugin uses 15% CPU, target 10%"   │
│    ↓                                   │
│ 2. Profile current state               │
│    Hotspot: Reverb uses 8% CPU         │
│    ↓                                   │
│ 3. Hypothesize optimization            │
│    "SIMD vectorization could help"     │
│    ↓                                   │
│ 4. Implement change                    │
│    Vectorize inner loop                │
│    ↓                                   │
│ 5. Measure again                       │
│    New: Reverb uses 5% CPU (3% saved)  │
│    ↓                                   │
│ 6. Validate correctness                │
│    Audio output identical?             │
│    ↓                                   │
│ 7. Keep or revert                      │
│    Keep! Total now 12% CPU             │
└────────────────────────────────────────┘

⚠️ Common Mistakes

Mistake 1: Optimize Wrong Thing

What happened:

Developer spent 2 days optimizing reverb algorithm
Cut reverb processing time by 50%
Overall plugin performance: 15.0% CPU → 14.8% CPU

Why it failed:
- Reverb used only 0.4% CPU (about 2.7% of the total)
- Oscillator used 10% CPU (about 67% of the total) but wasn't touched
- Lesson: Optimize the hottest 20% of code first

Should have done:
1. Profile: Find that the oscillator is the bottleneck
2. Optimize the oscillator
3. Measure: Total CPU drops to 8%
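The arithmetic behind this mistake can be sketched in a few lines. The helper name `cpuAfterOptimizing` is hypothetical, and the sketch assumes the rest of the plugin is unaffected by the change, which may not hold when components share caches or memory bandwidth.

```cpp
#include <cassert>

// Hypothetical helper: total CPU load (in percent) after one component is
// made `speedup` times faster, assuming everything else stays unchanged.
double cpuAfterOptimizing(double totalCpu, double componentCpu, double speedup) {
    assert(speedup > 0.0 && componentCpu <= totalCpu);
    const double untouched = totalCpu - componentCpu;  // load of all other code
    return untouched + componentCpu / speedup;
}
```

Under these assumptions, `cpuAfterOptimizing(15.0, 0.4, 2.0)` gives roughly 14.8, while the same 2× speedup applied to the oscillator, `cpuAfterOptimizing(15.0, 10.0, 2.0)`, gives 10.0.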

Mistake 2: Assume Optimization Helped

What happened:

// Developer added SIMD
for (int i = 0; i < size; i += 4) {
    __m128 v = _mm_loadu_ps(&buffer[i]);
    v = _mm_mul_ps(v, gain);
    _mm_storeu_ps(&buffer[i], v);
}

"I added SIMD, must be faster!"

Reality:
- Before: 1000 cycles
- After: 1020 cycles (slower!)
- The compiler was already auto-vectorizing the loop
- The manual SIMD prevented other compiler optimizations

Lesson: Always measure before/after

Mistake 3: Break Correctness for Speed

What happened:

// "Optimized" version
float fastDiv(float a, float b) {
    return a * (1.0f / b);  // Faster than a/b, right?
}

// But it is not equivalent to a / b: the extra rounding step can change
// the result, and 1.0f / b overflows to infinity when b is denormal.

Results:
- 5% faster in benchmark
- Crashes with certain input values
- Users report audio glitches

Lesson: Speed is meaningless if broken. Test correctness after every optimization.
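One way to test correctness is to compare the optimized function against the reference over a set of edge cases. The sketch below is illustrative only: `closeEnough` and `countMismatches` are hypothetical helpers, the tolerance is an assumed value, and the results assume default IEEE 754 float behavior (no fast-math, no flush-to-zero).

```cpp
#include <cmath>
#include <limits>

float fastDiv(float a, float b) { return a * (1.0f / b); }

// True when both values are NaN, are equal, or agree within relTol.
bool closeEnough(float x, float y, float relTol) {
    if (std::isnan(x) || std::isnan(y)) return std::isnan(x) && std::isnan(y);
    if (x == y) return true;  // covers matching infinities and zeros
    if (std::isinf(x) || std::isinf(y)) return false;
    return std::fabs(x - y) <= relTol * std::fmax(std::fabs(x), std::fabs(y));
}

// Counts edge-case inputs where fastDiv disagrees with plain division.
int countMismatches() {
    const float tiny = std::numeric_limits<float>::denorm_min();
    const float inf = std::numeric_limits<float>::infinity();
    const float nanValue = std::numeric_limits<float>::quiet_NaN();
    const float cases[][2] = {
        {1.0f, 3.0f}, {1.0f, 0.0f}, {0.0f, 0.0f}, {1.0f, inf}, {nanValue, 1.0f},
        {tiny, tiny},  // exact answer is 1, but 1.0f / tiny overflows to infinity
    };
    int mismatches = 0;
    for (const auto& c : cases) {
        if (!closeEnough(fastDiv(c[0], c[1]), c[0] / c[1], 1e-6f)) ++mismatches;
    }
    return mismatches;
}
```

With default floating-point settings, the denormal case should be flagged, because the reciprocal overflows before the multiplication happens. A real test suite would add the input ranges the plugin actually sees.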

Mistake 4: Micro-optimize Too Early

What happened:

// Day 1 of project: "Let's make this super optimized!"
class PluginState {
    // Custom memory pool
    // Hand-rolled SIMD everywhere
    // Inline assembly
    // Lock-free data structures
};

Results:
- 3 weeks spent on "optimization"
- Code unmaintainable
- Performance same as simple version
- Bugs everywhere

Should have done:
1. Write simple, correct version
2. Profile
3. Optimize only proven bottlenecks

✅ Statistical Significance

Example: Is Optimization Real?

Before optimization: Run 100 times: 10.5ms ± 0.3ms

After optimization: Run 100 times: 10.4ms ± 0.3ms

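Question: Is the 0.1ms improvement real?

Answer: Not on this evidence.
- Difference: 0.1ms
- Standard deviation: 0.3ms
- Difference < 1 standard deviation
- Any single before/after comparison is within measurement noise

Proper interpretation:

Difference: 0.1ms, against a run-to-run spread of ±0.42ms (√(0.3² + 0.3²))
The spread dwarfs the difference → No demonstrated improvement

Averaging 100 runs narrows the uncertainty of each mean to σ/√n ≈ 0.03ms, so a t-test might still detect a small shift. A change this far below the run-to-run noise should be confirmed that way, and in a repeated session, before it is trusted.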

Statistical Guidelines

Minimum detectable improvement:
- Should be > 2× standard deviation
- Example: If σ = 0.3ms, need improvement > 0.6ms

Sample size:
- Run at least 30 times (preferably 100+)
- Calculate mean and standard deviation
- Use t-test for significance
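The mean and standard deviation of a set of timings can be computed as in the following minimal sketch. The `Summary` struct and `summarize` function are hypothetical names, and the sketch uses the sample standard deviation (n - 1 denominator).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Summary {
    double mean;
    double stddev;
};

// Sample mean and sample standard deviation of the recorded timings.
Summary summarize(const std::vector<double>& samples) {
    assert(samples.size() > 1);
    double sum = 0.0;
    for (double s : samples) sum += s;
    const double mean = sum / samples.size();

    double squaredDeviations = 0.0;
    for (double s : samples) squaredDeviations += (s - mean) * (s - mean);
    return {mean, std::sqrt(squaredDeviations / (samples.size() - 1))};
}
```

Feeding it one timing per benchmark run yields the two numbers used in the reports in this section.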

Reporting:

Before:  10.5ms ± 0.3ms (n=100)
After:   9.2ms ± 0.2ms (n=100)
Change:  -1.3ms ± 0.36ms
p-value: < 0.001
Result:  Statistically significant improvement
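A t statistic for such a report can be sketched from the summary numbers alone. The function below is a hypothetical helper implementing Welch's t statistic, which assumes the two sets of runs are independent. Converting t to a p-value needs the t distribution, which the C++ standard library does not provide; for sample sizes like these, |t| above roughly 2 corresponds to p < 0.05.

```cpp
#include <cassert>
#include <cmath>

// Welch's t statistic for two independent samples, given each sample's
// mean, standard deviation, and number of runs.
double welchT(double meanA, double sdA, int nA,
              double meanB, double sdB, int nB) {
    assert(nA > 1 && nB > 1);
    // Standard error of the difference between the two means.
    const double standardError = std::sqrt(sdA * sdA / nA + sdB * sdB / nB);
    return (meanA - meanB) / standardError;
}
```

For the report above, `welchT(10.5, 0.3, 100, 9.2, 0.2, 100)` gives t ≈ 36, far beyond any common significance threshold.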

🎯 Measurement Best Practices

1. Isolate What You're Measuring

// BAD: Measures too much
void testProcessing() {
    auto start = now();

    loadPlugin();          // Initialization
    processAudio();        // What we want to measure
    unloadPlugin();        // Cleanup

    auto duration = now() - start;  // Includes everything!
}

// GOOD: Measure only processing
void testProcessing() {
    loadPlugin();  // Outside measurement

    auto start = now();
    processAudio();        // Only this
    auto duration = now() - start;

    unloadPlugin();
}

2. Warm Up

// Run once to warm up caches
processAudio(buffer, size);

// Then measure
auto start = now();
for (int i = 0; i < 100; i++) {
    processAudio(buffer, size);
}
auto duration = now() - start;
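The snippets above leave `now()` undefined. A minimal sketch of a harness that combines warm-up with per-run timing might look like this, assuming `std::chrono::steady_clock` as the time source. The name `timeRunsMs` and its parameters are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical harness: calls fn() `warmups` times without measuring, then
// records the wall time of each of `runs` measured calls in milliseconds.
template <typename Fn>
std::vector<double> timeRunsMs(Fn&& fn, int warmups, int runs) {
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes

    for (int i = 0; i < warmups; ++i) fn();  // warm caches and branch predictors

    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = clock::now();
        fn();
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samples;
}
```

Keeping one sample per run, rather than one total for the whole loop, makes it possible to compute a standard deviation afterwards. For very short workloads, each measured call may need to process many buffers so the duration stays well above the clock's resolution.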

3. Control Variables

Keep constant:
- CPU frequency (disable turbo boost)
- Other running processes
- Input data
- Compiler flags
- Optimization level

Script example:

# Disable CPU frequency scaling
sudo cpupower frequency-set --governor performance

# Pin to specific CPU
taskset -c 0 ./benchmark

# Re-enable scaling
sudo cpupower frequency-set --governor powersave

4. Multiple Metrics

Don't just measure time:

Metric	Tool	What it tells you
Time	Stopwatch	Overall performance
CPU cycles	perf/RDTSC	Actual computation
Cache misses	perf	Memory access patterns
Branch misses	perf	Branch prediction
Instructions	perf	Work done

5. Relative Performance

// Compare to baseline
Baseline (unoptimized):     10.0ms
After SIMD:                 7.5ms   (1.33× faster)
After SIMD + SoA:           5.0ms   (2.00× faster)
After all optimizations:    3.2ms   (3.12× faster)

📋 Optimization Checklist

Before optimizing:
- [ ] Have clear performance goal
- [ ] Profiled and identified bottleneck
- [ ] Bottleneck is significant (>10% of runtime)
- [ ] Hypothesis for why optimization will help

After optimizing:
- [ ] Measured performance improvement
- [ ] Improvement is statistically significant
- [ ] Verified correctness (output unchanged)
- [ ] Tested edge cases
- [ ] Code remains maintainable
- [ ] Documented why optimization was made

🎓 Case Study: Real Optimization

Initial state:
- Plugin uses 25% CPU
- Goal: Reduce to <15%

Step 1: Profile

process() total:        25%
├─ oscillator:          18%  ← Hotspot!
├─ filter:              5%
├─ reverb:              1.5%
└─ other:               0.5%

Step 2: Profile oscillator deeper

oscillator::process():  18%
├─ std::sin():          15%  ← Bottleneck!
├─ phase update:        2%
└─ output:              1%

Step 3: Optimize sin()

// Before
output[i] = std::sin(phase);

// After: Lookup table
output[i] = sin_table[phase_index];
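The snippet does not show how `sin_table` and `phase_index` are built. The following is one possible sketch, not the case study's actual implementation. It assumes a phase normalized to [0, 1), a 4096-entry table, and linear interpolation between entries; all names are hypothetical.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

constexpr std::size_t kTableSize = 4096;  // assumed size; trades memory for accuracy
constexpr double kTwoPi = 6.283185307179586;

using SineTable = std::array<float, kTableSize + 1>;

// Builds one cycle of a sine wave, plus a guard sample so that the
// interpolation below can always read index + 1.
SineTable makeSineTable() {
    SineTable table{};
    for (std::size_t i = 0; i <= kTableSize; ++i) {
        table[i] = static_cast<float>(std::sin(kTwoPi * i / kTableSize));
    }
    return table;
}

// Approximates sin(2*pi*phase) for phase in [0, 1) by linear interpolation.
float tableSin(const SineTable& table, float phase) {
    const float position = phase * kTableSize;
    const std::size_t index = static_cast<std::size_t>(position);
    const float fraction = position - static_cast<float>(index);
    return table[index] + fraction * (table[index + 1] - table[index]);
}
```

The table should be built once, outside the audio callback. Table size and interpolation order determine the distortion the lookup adds, so they belong in the validation step alongside the CPU measurement.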

Step 4: Measure

Before: oscillator 18%
After:  oscillator 3%
Total:  10% CPU ✓ Goal achieved!

Step 5: Validate
- THD: 0.0008% (was 0.0005%), acceptable
- Listening test: No audible difference
- Output spectrum: Identical within 0.1dB

Result: Success!
- CPU usage cut by 15 percentage points (25% → 10%)
- Correctness maintained
- Goal achieved

💡 Key Takeaways

Profile before optimizing - Measure, don't guess
Focus on hotspots - 80/20 rule applies
Measure improvement - Verify it actually helps
Statistical rigor - Account for measurement noise
Validate correctness - Fast but wrong is worthless
Document decisions - Why was this optimized?

"Premature optimization is the root of all evil, but measured optimization is the path to enlightenment." - Adapted from Knuth