BUILD GUIDE - 05_16_PERFORMANCE_VARIANTS
Complete guide for building, testing, and validating the Performance Variants subsystem.
Version: 1.0.0 | Last Updated: 2025-10-15
Table of Contents
- Prerequisites
- Quick Start
- Building Variant Framework
- Building SIMD Variants
- Running Tests
- Running Examples
- Validation Checklist
- Troubleshooting
- Platform-Specific Notes
Prerequisites
Required
- C++ compiler with C++17 support:
  - Windows: MSVC 2019+ (Visual Studio 16.0+)
  - Linux: GCC 8+ or Clang 9+
  - macOS: Xcode 11+ (Apple Clang 11+)
- CMake 3.15 or later
  - Download: https://cmake.org/download/
- CPU with SIMD support (for SIMD variants):
  - SSE4.1 or later (Intel Penryn-generation Core 2 or later, AMD Bulldozer or later)
  - AVX2 (Intel Haswell or later, AMD Excavator or later)
  - Recommended: AVX2 + FMA for maximum performance
Optional
- Catch2 - For unit tests (auto-detected by CMake)
  - Windows: vcpkg install catch2:x64-windows
  - Linux: sudo apt install catch2 (or build from source)
  - macOS: brew install catch2
- Git - For version control
Quick Start
1. Clone or Navigate to the Project
2. Build Variant Framework
cd 05_16_00_variant_framework
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
3. Run an Example
4. Build SIMD Variants
cd ../../05_16_01_simd_variants
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX2=ON -DENABLE_FMA=ON
cmake --build . --config Release
5. Run SIMD Example
Building Variant Framework
Step 1: Configure with CMake
cd 05_16_00_variant_framework
mkdir build && cd build
# Basic configuration
cmake .. -DCMAKE_BUILD_TYPE=Release
# Advanced configuration
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_EXAMPLES=ON \
-DBUILD_TESTS=ON \
-DENABLE_SSE=ON \
-DENABLE_AVX=ON \
-DENABLE_AVX2=ON
Step 2: Build
# Windows (MSVC)
cmake --build . --config Release -j 8
# Linux/macOS (Make/Ninja)
cmake --build . -j 8
Step 3: Verify Build
Expected output:
Building Custom Rule...
variant_framework.lib (or .a)
basic_dispatcher_example.exe
cpu_detection_example.exe
hot_swap_example.exe
Build Options
| Option | Default | Description |
|---|---|---|
| BUILD_EXAMPLES | ON | Build example programs |
| BUILD_TESTS | ON | Build unit tests (requires Catch2) |
| ENABLE_SSE | ON | Enable SSE optimizations |
| ENABLE_AVX | ON | Enable AVX optimizations |
| ENABLE_AVX2 | ON | Enable AVX2 optimizations |
Building SIMD Variants
Step 1: Ensure Variant Framework is Built
SIMD Variants depends on Variant Framework, so build the framework first.
Step 2: Configure with CMake
cd 05_16_01_simd_variants
mkdir build && cd build
# Recommended configuration for maximum performance
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DENABLE_SSE4=ON \
-DENABLE_AVX2=ON \
-DENABLE_FMA=ON \
-DBUILD_EXAMPLES=ON \
-DBUILD_TESTS=ON \
-DBUILD_BENCHMARKS=ON
Step 3: Build
Step 4: Verify Build
Expected output:
simd_variants.lib (or .a)
simd_comparison_example.exe
simd_quality_integration_example.exe (if Quality Metrics available)
test_simd_variants (if Catch2 available)
Build Options
| Option | Default | Description |
|---|---|---|
| BUILD_EXAMPLES | ON | Build example programs |
| BUILD_TESTS | ON | Build unit tests (requires Catch2) |
| BUILD_BENCHMARKS | ON | Build benchmarking suite |
| ENABLE_SSE4 | ON | Enable SSE4.1 variants |
| ENABLE_AVX2 | ON | Enable AVX2 variants |
| ENABLE_AVX512 | OFF | Enable AVX-512 variants (experimental) |
| ENABLE_NEON | ON | Enable NEON variants (ARM only) |
| ENABLE_FMA | ON | Enable FMA instructions |
Compiler Flag Reference
MSVC (Windows):
GCC/Clang (Linux/macOS):
-mavx2 # Enable AVX2
-mfma # Enable FMA (must be explicit)
-msse4.1 # Enable SSE4.1
-mavx512f # Enable AVX-512 Foundation
-mfpu=neon # Enable NEON (32-bit ARM only; AArch64 enables NEON by default)
Running Tests
Variant Framework Tests
Currently, Variant Framework uses examples as functional tests.
cd 05_16_00_variant_framework/build
# Run all examples
./Release/basic_dispatcher_example.exe
./Release/cpu_detection_example.exe
./Release/hot_swap_example.exe
Expected Results:
- ✓ CPU features detected correctly
- ✓ Variants registered successfully
- ✓ Optimal variant selected
- ✓ Hot-swapping works without errors
- ✓ Statistics displayed correctly
SIMD Variants Tests
cd 05_16_01_simd_variants/build
# Run validation tests (if Catch2 available)
./test_simd_variants
# Run comparison example
./Release/simd_comparison_example.exe
# Run Quality Metrics integration (if 05_18 available)
./Release/simd_quality_integration_example.exe
Expected Results:
- ✓ All validation tests PASS
- ✓ Max error < 1e-6 for gain/mix
- ✓ Max error < 1e-5 for IIR filters
- ✓ Speedups 4-10x demonstrated
- ✓ No crashes or exceptions
Manual Validation
Test 1: CPU Detection
Verify your CPU features are correctly detected.
Test 2: SIMD Variants
Verify speedups and accuracy.
Test 3: Dispatcher
Verify variant selection works.
Running Examples
Variant Framework Examples
1. Basic Dispatcher Example
What it demonstrates:
- CPU feature detection
- Variant registration
- Multi-factor scoring
- Context-aware selection (battery mode, quality mode)
- Manual variant selection
- Performance statistics
Expected Output:
=== CPU Information ===
Vendor: AuthenticAMD
Features: ✓ SSE ✓ AVX ✓ AVX2 ✓ FMA
...
Selected variant: AVX2_Gain
Speedup: 6.7x
2. CPU Detection Example
What it demonstrates:
- Comprehensive CPU feature enumeration
- Cache topology
- Core counts
- Frequency information
3. Hot-Swap Example
What it demonstrates:
- Glitch-free variant switching
- Crossfade mechanism
- Audio continuity during switch
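One way to picture the crossfade mechanism is as a per-sample blend between the output of the outgoing variant and the output of the incoming one. The sketch below illustrates that idea only; `crossfadeBlock` is a hypothetical name, not part of the Variant Framework API, and the linear ramp is an assumption about the fade shape.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: blends one block so that it starts on the old
// variant's output and ends on the new variant's output.
inline std::vector<float> crossfadeBlock(const std::vector<float>& oldOut,
                                         const std::vector<float>& newOut) {
    assert(oldOut.size() == newOut.size());
    const std::size_t n = oldOut.size();
    std::vector<float> mixed(n);
    for (std::size_t i = 0; i < n; ++i) {
        // t ramps linearly from 0 to 1 across the block; a one-sample
        // block switches straight to the new variant.
        const float t = (n > 1)
            ? static_cast<float>(i) / static_cast<float>(n - 1)
            : 1.0f;
        mixed[i] = (1.0f - t) * oldOut[i] + t * newOut[i];
    }
    return mixed;
}
```

Both variants process the same input during the transition block, and only the blended result is sent to the output. A real-time audio path would typically write into a preallocated buffer rather than return a new vector.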
SIMD Variants Examples
1. SIMD Comparison Example
What it demonstrates:
- Accuracy validation vs scalar reference
- Performance benchmarking
- Speedup calculations
- CPU savings
- Real-time performance analysis
Expected Output:
=== SIMD Variants Performance Comparison ===
CPU: AMD Ryzen 9 7950X3D
✓ SSE4.1 ✓ AVX2 ✓ FMA
Variant           | Time (µs) | Speedup | CPU Savings
------------------+-----------+---------+-------------
Scalar_Reference  | 85.23     | 1.00x   | 0%
SSE4_Gain         | 21.46     | 3.97x   | 75%
AVX2_Gain         | 12.79     | 6.66x   | 85%
✓ All variants validated for correctness
✓ Max error < 1e-6
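The Speedup and CPU Savings columns in this output are consistent with two simple ratios of the measured times. The sketch below assumes that is how they are derived; the function names are illustrative and are not taken from the benchmark code.

```cpp
#include <cassert>

// Assumed definition: how many times faster the variant is than scalar.
inline double speedup(double scalarMicros, double variantMicros) {
    assert(scalarMicros > 0.0 && variantMicros > 0.0);
    return scalarMicros / variantMicros;
}

// Assumed definition: share of the scalar processing time that the
// variant no longer needs, as a percentage.
inline double cpuSavingsPercent(double scalarMicros, double variantMicros) {
    assert(scalarMicros > 0.0 && variantMicros > 0.0);
    return 100.0 * (1.0 - variantMicros / scalarMicros);
}
```

With the times shown for SSE4_Gain, `speedup(85.23, 21.46)` is about 3.97 and `cpuSavingsPercent(85.23, 21.46)` is about 74.8, which rounds to the 75% in the table.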
2. Quality Metrics Integration Example
What it demonstrates:
- Integration with Quality Metrics subsystem
- Real-time metric collection
- Performance monitoring
- Accuracy validation
- Comprehensive report generation
Validation Checklist
Use this checklist to ensure your build is correct and functional.
✓ Variant Framework
- [ ] Project configures with CMake without errors
- [ ] Project builds without errors (warnings OK)
- [ ] basic_dispatcher_example runs without crashes
- [ ] CPU features are detected correctly for your CPU
- [ ] Variants are registered successfully
- [ ] Optimal variant is selected
- [ ] Hot-swapping works (crossfade example)
- [ ] Statistics are displayed correctly
- [ ] No memory leaks (check with valgrind/sanitizers)
✓ SIMD Variants
- [ ] Project configures with CMake without errors
- [ ] Project builds without errors (warnings OK)
- [ ] simd_comparison_example runs without crashes
- [ ] Validation tests PASS (all variants)
- [ ] Max error < 1e-6 for gain/mix operations
- [ ] Max error < 1e-5 for IIR filters
- [ ] Speedups 4-10x are demonstrated
- [ ] CPU savings 75-90% are shown
- [ ] Real-time performance is verified
- [ ] No audio artifacts (if testing with audio)
- [ ] Quality Metrics integration works (if available)
✓ Platform-Specific
Windows:
- [ ] MSVC compiler version ≥ 19.20
- [ ] /arch:AVX2 flag is applied
- [ ] Builds in Release configuration
- [ ] Examples run from command prompt
Linux:
- [ ] GCC/Clang version is sufficient
- [ ] -mavx2 -mfma flags are applied
- [ ] ldd shows correct library dependencies
- [ ] No missing shared libraries
macOS:
- [ ] Xcode command line tools installed
- [ ] Apple Clang supports AVX2 (Intel Macs only)
- [ ] M1/M2 Macs: NEON variants (future)
- [ ] Code signing not required for local builds
Troubleshooting
Problem: CMake can't find Variant Framework
Error:
Solution:
1. Build Variant Framework first
2. Ensure it's in the expected location: ../05_16_00_variant_framework
3. Or set CMAKE_PREFIX_PATH:
Problem: Compilation error - cpuid.h not found
Error:
Solution:
Already fixed in code - uses <intrin.h> on Windows. If you still see this:
1. Make sure you have the latest code
2. On Windows, ensure you're using MSVC
3. On Linux, install GCC development files: sudo apt install build-essential
Problem: Undefined reference to __builtin_cpu_supports
Error:
Solution: __builtin_cpu_supports is a GCC builtin that Clang also implements, so it should work with both compilers. If the error persists:
- Try GCC instead of Clang
- Or have CMake detect the compiler and select the appropriate code path
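As an alternative to switching code paths in CMake, the builtin can be guarded with preprocessor checks so that other compilers and non-x86 targets never reference it. The sketch below shows that guard; `cpuHasAvx2` is a hypothetical helper and is not the detection code used by the Variant Framework.

```cpp
// Hypothetical helper: reports AVX2 support through the GCC/Clang builtin
// when it is available, and falls back to "not supported" elsewhere.
inline bool cpuHasAvx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  // populate the CPU model data before querying it
    return __builtin_cpu_supports("avx2") != 0;
#else
    // MSVC or a non-x86 target: a different detection path is needed here
    // (for example __cpuid from <intrin.h> on MSVC).
    return false;
#endif
}
```

Returning false in the fallback branch keeps the build working and selects the scalar path until a dedicated detection routine is added for that platform.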
Problem: Crash on _mm256_load_ps
Error:
Cause: Unaligned memory access
Solution:
1. Use AlignedBuffer<T> for allocations
2. Or use _mm256_loadu_ps (unaligned load) instead
3. Verify buffer alignment with isAligned() helper
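_mm256_load_ps requires a 32-byte-aligned address, which is why the steps above focus on how the buffer is allocated. This guide names AlignedBuffer<T> and isAligned() but does not show their signatures, so the sketch below uses stand-ins written with standard C++17 facilities; the names and signatures are assumptions, not the library's own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

// Stand-in for the isAligned() helper: true when ptr is a multiple of
// alignment, which must be a power of two.
inline bool isAligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}

// Frees memory obtained from the 32-byte-aligned operator new[] below.
struct AlignedDeleter {
    void operator()(float* p) const noexcept {
        ::operator delete[](p, std::align_val_t{32});
    }
};
using AlignedFloats = std::unique_ptr<float[], AlignedDeleter>;

// Allocates count floats on a 32-byte boundary (uninitialized storage).
inline AlignedFloats makeAlignedFloats(std::size_t count) {
    void* raw = ::operator new[](count * sizeof(float), std::align_val_t{32});
    return AlignedFloats(static_cast<float*>(raw));
}
```

A buffer from `makeAlignedFloats(256)` satisfies `isAligned(buf.get(), 32)`. A pointer offset into the middle of it, such as `buf.get() + 1`, does not, so it needs _mm256_loadu_ps.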
Problem: Performance is slower than expected
Possible Causes:
1. Debug build: Always use Release build for benchmarking
2. Thermal throttling: CPU is overheating
3. Wrong variant selected: Check getActiveVariant()
4. Small buffers: SIMD works best with ≥256 samples
5. Overhead dominates: Measure on longer runs (1000+ iterations)
Solution:
# Ensure Release build
cmake .. -DCMAKE_BUILD_TYPE=Release
# Check CPU frequency
# Windows: Get-WmiObject Win32_Processor | Select-Object MaxClockSpeed
# Linux: cat /proc/cpuinfo | grep MHz
# Verify optimal variant
./basic_dispatcher_example
# Look for "Selected variant: AVX2_..." if AVX2 supported
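To rule out measurement overhead (cause 5), it helps to time many iterations and report the average. The harness below is a minimal sketch of that approach using only std::chrono; `averageMicros` is a hypothetical helper and is not part of the shipped examples.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs process() `iterations` times and returns the
// mean duration of one call in microseconds.
template <typename Fn>
double averageMicros(Fn&& process, std::size_t iterations) {
    assert(iterations > 0);
    process();  // untimed warm-up call so caches and branch predictors settle
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        process();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Pass a callable that processes one buffer, with 1000 or more iterations as suggested above. Make sure the processed buffer is used afterwards so that an optimizing Release build does not remove the work being measured.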
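Problem: Numerical differences vs scalar
Error:
Solution: Small differences are expected for IIR filters because the SIMD variants evaluate floating-point operations in a different order than the scalar reference. Interpret the maximum error as follows:
- < 1e-6: Effectively exact (gain, mix)
- < 1e-5: Acceptable (IIR filters like biquad)
- > 1e-5: Investigate (possible bug)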
For IIR filters, use relaxed tolerance:
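A tolerance check along these lines can be sketched as follows. The thresholds are the ones listed above; `maxAbsError`, `classifyError`, and the `Verdict` enum are hypothetical names introduced for illustration and are not the test suite's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Largest absolute per-sample difference between two equal-length buffers.
inline float maxAbsError(const std::vector<float>& reference,
                         const std::vector<float>& candidate) {
    assert(reference.size() == candidate.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        worst = std::max(worst, std::fabs(reference[i] - candidate[i]));
    }
    return worst;
}

enum class Verdict { WithinGainTolerance, WithinIirTolerance, Investigate };

// Maps a maximum error onto the three bands described in this section.
inline Verdict classifyError(float maxError) {
    if (maxError < 1e-6f) return Verdict::WithinGainTolerance;
    if (maxError < 1e-5f) return Verdict::WithinIirTolerance;
    return Verdict::Investigate;
}
```

A gain or mix test would require `Verdict::WithinGainTolerance`, while an IIR filter test would also accept `Verdict::WithinIirTolerance`.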
Problem: Build succeeds but example crashes immediately
Possible Causes:
1. Using SIMD instruction not supported by CPU
2. Missing DLL/shared library
3. Stack corruption
Solution:
1. Check CPU features with cpu_detection_example
2. On Windows: Ensure Release DLLs are in PATH
3. On Linux: Check ldd ./example for missing libraries
4. Run under debugger to get stack trace
Platform-Specific Notes
Windows (MSVC)
Recommended Setup:
# Open x64 Native Tools Command Prompt
cmake -S . -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release -j 8
Notes:
- MSVC automatically enables SSE2 on x64
- /arch:AVX2 enables AVX, AVX2, and FMA
- Use Release configuration for benchmarking
- Debug builds are ~10x slower
Common Issues:
- vcpkg manifest warning: Safe to ignore or disable vcpkg
- #include <intrin.h> already handles MSVC intrinsics
- PowerShell: Use quotes for paths with spaces
Linux (GCC/Clang)
Recommended Setup:
Notes:
- Explicit -mavx2 -mfma required (unlike MSVC)
- GCC 8+ or Clang 9+ for full C++17 support
- Use -march=native for automatic SIMD detection (not portable!)
- Install build-essential package
Common Issues:
- Missing <cpuid.h>: Install gcc or build-essential
- Thread library: CMake automatically links pthread
- Library path: May need LD_LIBRARY_PATH for shared libs
macOS (Clang)
Recommended Setup:
Notes:
- Xcode Command Line Tools required: xcode-select --install
- Apple Clang uses its own version numbering, which does not match upstream LLVM Clang releases
- Intel Macs: Full AVX2/FMA support
- M1/M2 Macs: NEON variants (future work)
Common Issues:
- Missing Xcode: Install from App Store or command line tools
- Code signing: Not required for local development builds
- M1/M2: Currently no SIMD variants (NEON coming soon)
Advanced Build Options
Cross-Compilation
# For ARM Linux (from x86 host)
cmake .. \
-DCMAKE_TOOLCHAIN_FILE=arm-linux-gnueabihf.cmake \
-DENABLE_NEON=ON \
-DENABLE_AVX2=OFF
Static Linking
Custom Compiler
Install to Custom Location
Sanitizers (Debug Only)
Performance Tuning
CPU Governor (Linux)
For accurate benchmarking:
# Set to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# Verify
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
Disable Turbo Boost (for consistent results)
Intel (Linux):
AMD (Linux):
Isolate CPU Cores
Next Steps
After successful build and validation:
- Experiment with examples - Modify buffer sizes, gain values, frequencies
- Integrate with your code - See INTEGRATION_GUIDE.md
- Profile your application - Use perf (Linux), VTune (Intel), or VS Profiler (Windows)
- Report issues - GitHub Issues or AudioLab support
- Contribute - NEON variants, AVX-512 variants, additional examples welcome!
References
- CMake Documentation: https://cmake.org/documentation/
- Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
- ARM NEON Intrinsics: https://developer.arm.com/architectures/instruction-sets/intrinsics/
- SIMD Tutorial: https://www.kernel.org/doc/html/latest/x86/x86-simd.html
Last Updated: 2025-10-15 | Maintainer: AudioLab Performance Team
Questions? See README.md or TROUBLESHOOTING.md
Happy Building!