
BUILD GUIDE - 05_16_PERFORMANCE_VARIANTS

Complete guide for building, testing, and validating the Performance Variants subsystem.

Version: 1.0.0 Last Updated: 2025-10-15


📋 Table of Contents

  1. Prerequisites
  2. Quick Start
  3. Building Variant Framework
  4. Building SIMD Variants
  5. Running Tests
  6. Running Examples
  7. Validation Checklist
  8. Troubleshooting
  9. Platform-Specific Notes

Prerequisites

Required

  • C++ compiler with C++17 support:
      • Windows: MSVC 2019+ (Visual Studio 16.0+)
      • Linux: GCC 8+ or Clang 9+
      • macOS: Xcode 11+ (Apple Clang 11+)

  • CMake 3.15 or later
      • Download: https://cmake.org/download/

  • CPU with SIMD support (for SIMD variants):
      • SSE4.1 or later (Intel Penryn, i.e. 45 nm Core 2, or later; AMD Bulldozer or later)
      • AVX2 (Intel Haswell or later, AMD Excavator or later)
      • Recommended: AVX2 + FMA for maximum performance

Optional

  • Catch2 - For unit tests (auto-detected by CMake)
      • Windows: vcpkg install catch2:x64-windows
      • Linux: sudo apt install catch2 or build from source
      • macOS: brew install catch2

  • Git - For version control


Quick Start

1. Clone or Navigate to the Project

cd "c:\AudioDev\audio-lab\3 - COMPONENTS\05_MODULES\05_16_PERFORMANCE_VARIANTS"

2. Build Variant Framework

cd 05_16_00_variant_framework
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release

3. Run an Example

# Windows
./Release/basic_dispatcher_example.exe

# Linux/macOS
./basic_dispatcher_example

4. Build SIMD Variants

cd ../../05_16_01_simd_variants
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX2=ON -DENABLE_FMA=ON
cmake --build . --config Release

5. Run SIMD Example

# Windows
./Release/simd_comparison_example.exe

# Linux/macOS
./simd_comparison_example

Building Variant Framework

Step 1: Configure with CMake

cd 05_16_00_variant_framework
mkdir build && cd build

# Basic configuration
cmake .. -DCMAKE_BUILD_TYPE=Release

# Advanced configuration
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DBUILD_EXAMPLES=ON \
  -DBUILD_TESTS=ON \
  -DENABLE_SSE=ON \
  -DENABLE_AVX=ON \
  -DENABLE_AVX2=ON

Step 2: Build

# Windows (MSVC)
cmake --build . --config Release -j 8

# Linux/macOS (Make/Ninja)
cmake --build . -j 8

Step 3: Verify Build

Expected output:

Building Custom Rule...
variant_framework.lib (or .a)
basic_dispatcher_example.exe
cpu_detection_example.exe
hot_swap_example.exe

Build Options

Option          | Default | Description
----------------+---------+------------------------------------
BUILD_EXAMPLES  | ON      | Build example programs
BUILD_TESTS     | ON      | Build unit tests (requires Catch2)
ENABLE_SSE      | ON      | Enable SSE optimizations
ENABLE_AVX      | ON      | Enable AVX optimizations
ENABLE_AVX2     | ON      | Enable AVX2 optimizations

Building SIMD Variants

Step 1: Ensure Variant Framework is Built

SIMD Variants depends on Variant Framework. Make sure it's built first.

Step 2: Configure with CMake

cd 05_16_01_simd_variants
mkdir build && cd build

# Recommended configuration for maximum performance
cmake .. \
  -DCMAKE_BUILD_TYPE=Release \
  -DENABLE_SSE4=ON \
  -DENABLE_AVX2=ON \
  -DENABLE_FMA=ON \
  -DBUILD_EXAMPLES=ON \
  -DBUILD_TESTS=ON \
  -DBUILD_BENCHMARKS=ON

Step 3: Build

# Windows (MSVC)
cmake --build . --config Release -j 8

# Linux/macOS
cmake --build . -j 8

Step 4: Verify Build

Expected output:

simd_variants.lib (or .a)
simd_comparison_example.exe
simd_quality_integration_example.exe (if Quality Metrics available)
test_simd_variants (if Catch2 available)

Build Options

Option            | Default | Description
------------------+---------+----------------------------------------
BUILD_EXAMPLES    | ON      | Build example programs
BUILD_TESTS       | ON      | Build unit tests (requires Catch2)
BUILD_BENCHMARKS  | ON      | Build benchmarking suite
ENABLE_SSE4       | ON      | Enable SSE4.1 variants
ENABLE_AVX2       | ON      | Enable AVX2 variants
ENABLE_AVX512     | OFF     | Enable AVX-512 variants (experimental)
ENABLE_NEON       | ON      | Enable NEON variants (ARM only)
ENABLE_FMA        | ON      | Enable FMA instructions

Compiler Flag Reference

MSVC (Windows):

/arch:AVX2        # Enable AVX2 (includes FMA)
/arch:AVX512      # Enable AVX-512
/W4               # Warning level 4

GCC/Clang (Linux/macOS):

-mavx2            # Enable AVX2
-mfma             # Enable FMA (must be explicit)
-msse4.1          # Enable SSE4.1
-mavx512f         # Enable AVX-512 Foundation
-mfpu=neon        # Enable NEON (32-bit ARM only; NEON is always available on AArch64)
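To confirm that the flags above actually reached a given translation unit, one option is to inspect the macros the compiler predefines when an instruction set is enabled. The sketch below is not part of the subsystem; the function name compiled_simd_features is hypothetical, and only the macros themselves (__SSE4_1__, __AVX__, __AVX2__, __FMA__, __AVX512F__, __ARM_NEON) come from the compilers. Not every compiler defines every macro, so a missing entry is a hint rather than proof.

```cpp
#include <string>

// Hypothetical helper: lists the SIMD instruction sets this translation unit
// was compiled for, based on macros predefined by the compiler.
std::string compiled_simd_features() {
    std::string features;
#if defined(__SSE4_1__)
    features += "SSE4.1 ";    // GCC/Clang: -msse4.1
#endif
#if defined(__AVX__)
    features += "AVX ";       // GCC/Clang: -mavx, MSVC: /arch:AVX or higher
#endif
#if defined(__AVX2__)
    features += "AVX2 ";      // GCC/Clang: -mavx2, MSVC: /arch:AVX2
#endif
#if defined(__FMA__)
    features += "FMA ";       // GCC/Clang: -mfma
#endif
#if defined(__AVX512F__)
    features += "AVX-512F ";  // GCC/Clang: -mavx512f, MSVC: /arch:AVX512
#endif
#if defined(__ARM_NEON)
    features += "NEON ";      // ARM targets with NEON enabled
#endif
    if (features.empty()) {
        features = "baseline ";  // no SIMD macro was defined
    }
    features.pop_back();  // drop the trailing space
    return features;
}
```

Printing the result from one of the example programs is a quick way to spot a build where -mavx2 or -mfma was silently dropped. This reports what the compiler was allowed to emit, not what the CPU running the binary supports.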


Running Tests

Variant Framework Tests

Currently, Variant Framework uses examples as functional tests.

cd 05_16_00_variant_framework/build

# Run all examples (Windows paths shown; on Linux/macOS run ./basic_dispatcher_example, etc.)
./Release/basic_dispatcher_example.exe
./Release/cpu_detection_example.exe
./Release/hot_swap_example.exe

Expected Results:

  • ✅ CPU features detected correctly
  • ✅ Variants registered successfully
  • ✅ Optimal variant selected
  • ✅ Hot-swapping works without errors
  • ✅ Statistics displayed correctly

SIMD Variants Tests

cd 05_16_01_simd_variants/build

# Run validation tests (if Catch2 available)
./test_simd_variants

# Run comparison example
./Release/simd_comparison_example.exe

# Run Quality Metrics integration (if 05_18 available)
./Release/simd_quality_integration_example.exe

Expected Results:

  • ✅ All validation tests PASS
  • ✅ Max error < 1e-6 for gain/mix
  • ✅ Max error < 1e-5 for IIR filters
  • ✅ Speedups 4-10x demonstrated
  • ✅ No crashes or exceptions

Manual Validation

Test 1: CPU Detection

./cpu_detection_example
Verify your CPU features are correctly detected.

Test 2: SIMD Variants

./simd_comparison_example
Verify speedups and accuracy.

Test 3: Dispatcher

./basic_dispatcher_example
Verify variant selection works.


Running Examples

Variant Framework Examples

1. Basic Dispatcher Example

./Release/basic_dispatcher_example.exe

What it demonstrates:

  • CPU feature detection
  • Variant registration
  • Multi-factor scoring
  • Context-aware selection (battery mode, quality mode)
  • Manual variant selection
  • Performance statistics

Expected Output:

=== CPU Information ===
Vendor: AuthenticAMD
Features: ✓ SSE ✓ AVX ✓ AVX2 ✓ FMA
...
Selected variant: AVX2_Gain
Speedup: 6.7x
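The registration and selection steps this example demonstrates can be pictured with a much smaller model. The sketch below is an illustration only, not the Variant Framework API: the Variant struct, GainFn alias, scalar_gain, and select_variant are all hypothetical names, and a single integer score stands in for the multi-factor scoring.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical signature shared by all gain variants.
using GainFn = void (*)(float* samples, std::size_t count, float gain);

// Hypothetical registry entry; the real framework's types may differ.
struct Variant {
    std::string name;
    GainFn process;
    bool supported;  // would come from runtime CPU detection
    int score;       // higher is better; stands in for multi-factor scoring
};

// Scalar reference implementation used as the fallback.
inline void scalar_gain(float* samples, std::size_t count, float gain) {
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] *= gain;
    }
}

// Returns the highest-scoring variant the CPU supports, or nullptr if none.
const Variant* select_variant(const std::vector<Variant>& variants) {
    const Variant* best = nullptr;
    for (const Variant& v : variants) {
        if (!v.supported) {
            continue;  // never pick a variant the CPU cannot execute
        }
        if (best == nullptr || v.score > best->score) {
            best = &v;
        }
    }
    return best;
}
```

With entries named Scalar_Reference, SSE4_Gain, and AVX2_Gain, marking AVX2_Gain as unsupported makes select_variant fall back to SSE4_Gain, which mirrors how a dispatcher avoids instructions the CPU lacks.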

2. CPU Detection Example

./Release/cpu_detection_example.exe

What it demonstrates:

  • Comprehensive CPU feature enumeration
  • Cache topology
  • Core counts
  • Frequency information

3. Hot-Swap Example

./Release/hot_swap_example.exe

What it demonstrates:

  • Glitch-free variant switching
  • Crossfade mechanism
  • Audio continuity during switch
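As a rough picture of why a crossfade avoids glitches, the sketch below blends one buffer produced by the outgoing variant with the same buffer produced by the incoming variant. It assumes a linear fade across a single buffer; the function name crossfade is hypothetical, and the framework's actual mechanism may use a different curve or span several buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear crossfade from old_out to new_out across one buffer.
// The first sample is entirely the old variant's output, the last entirely the new one's.
std::vector<float> crossfade(const std::vector<float>& old_out,
                             const std::vector<float>& new_out) {
    assert(old_out.size() == new_out.size());
    const std::size_t n = old_out.size();
    std::vector<float> mixed(n);
    for (std::size_t i = 0; i < n; ++i) {
        // t ramps from 0.0 to 1.0 over the buffer (or is 1.0 for a single sample).
        const float t = (n > 1)
            ? static_cast<float>(i) / static_cast<float>(n - 1)
            : 1.0f;
        mixed[i] = (1.0f - t) * old_out[i] + t * new_out[i];
    }
    return mixed;
}
```

Because both variants process the same input during the switch, any small numerical difference between them is spread over the whole buffer instead of appearing as a step at one sample.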

SIMD Variants Examples

1. SIMD Comparison Example

./Release/simd_comparison_example.exe

What it demonstrates:

  • Accuracy validation vs scalar reference
  • Performance benchmarking
  • Speedup calculations
  • CPU savings
  • Real-time performance analysis

Expected Output:

=== SIMD Variants Performance Comparison ===
CPU: AMD Ryzen 9 7950X3D
✓ SSE4.1 ✓ AVX2 ✓ FMA

Variant           | Time (µs) | Speedup | CPU Savings
------------------+-----------+---------+-------------
Scalar_Reference  |    85.23  |  1.00x  |      0%
SSE4_Gain         |    21.46  |  3.97x  |     75%
AVX2_Gain         |    12.79  |  6.66x  |     85%

✓ All variants validated for correctness
✓ Max error < 1e-6
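The Speedup and CPU Savings columns in the table above are consistent with two simple ratios of the measured times: speedup is scalar time divided by variant time, and savings is the fraction of scalar time no longer spent. The sketch below uses hypothetical function names and assumes that is how the example derives its figures.

```cpp
#include <cassert>

// Speedup of a variant relative to the scalar reference (e.g. 3.97 means 3.97x).
double speedup(double scalar_us, double variant_us) {
    assert(variant_us > 0.0);
    return scalar_us / variant_us;
}

// Percentage of the scalar reference's CPU time that the variant saves.
double cpu_savings_percent(double scalar_us, double variant_us) {
    assert(scalar_us > 0.0);
    return 100.0 * (1.0 - variant_us / scalar_us);
}
```

For the SSE4_Gain row, 85.23 / 21.46 gives roughly 3.97 and 100 * (1 - 21.46 / 85.23) gives roughly 75, matching the table. The AVX2_Gain row works out to roughly 6.66 and 85 in the same way.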

2. Quality Metrics Integration Example

./Release/simd_quality_integration_example.exe

What it demonstrates:

  • Integration with Quality Metrics subsystem
  • Real-time metric collection
  • Performance monitoring
  • Accuracy validation
  • Comprehensive report generation


Validation Checklist

Use this checklist to ensure your build is correct and functional.

✅ Variant Framework

  • Project configures with CMake without errors
  • Project builds without errors (warnings OK)
  • basic_dispatcher_example runs without crashes
  • CPU features are detected correctly for your CPU
  • Variants are registered successfully
  • Optimal variant is selected
  • Hot-swapping works (crossfade example)
  • Statistics are displayed correctly
  • No memory leaks (check with valgrind/sanitizers)

✅ SIMD Variants

  • Project configures with CMake without errors
  • Project builds without errors (warnings OK)
  • simd_comparison_example runs without crashes
  • Validation tests PASS (all variants)
  • Max error < 1e-6 for gain/mix operations
  • Max error < 1e-5 for IIR filters
  • Speedups 4-10x are demonstrated
  • CPU savings 75-90% are shown
  • Real-time performance is verified
  • No audio artifacts (if testing with audio)
  • Quality Metrics integration works (if available)

✅ Platform-Specific

Windows:

  • MSVC compiler version ≥ 19.20
  • /arch:AVX2 flag is applied
  • Builds in Release configuration
  • Examples run from command prompt

Linux:

  • GCC/Clang version is sufficient
  • -mavx2 -mfma flags are applied
  • ldd shows correct library dependencies
  • No missing shared libraries

macOS:

  • Xcode command line tools installed
  • Apple Clang supports AVX2 (Intel Macs only)
  • M1/M2 Macs: NEON variants (future)
  • Code signing not required for local builds


Troubleshooting

Problem: CMake can't find Variant Framework

Error:

CMake Error: Could not find Variant Framework headers

Solution:

  1. Build Variant Framework first
  2. Ensure it's in the expected location: ../05_16_00_variant_framework
  3. Or set CMAKE_PREFIX_PATH:

cmake .. -DCMAKE_PREFIX_PATH=/path/to/variant_framework/install

Problem: Compilation error - cpuid.h not found

Error:

fatal error: cpuid.h: No such file or directory

Solution: This is already fixed in the code, which uses <intrin.h> on Windows. If you still see the error:

  1. Make sure you have the latest code
  2. On Windows, ensure you're using MSVC
  3. On Linux, install the GCC development files: sudo apt install build-essential

Problem: Undefined reference to __builtin_cpu_supports

Error:

undefined reference to `__builtin_cpu_supports`

Solution: __builtin_cpu_supports is a GCC builtin that Clang also provides, so it should normally work with both compilers. If the link still fails:

  • Try GCC instead of Clang
  • Or use CMake to detect the compiler and select an appropriate code path
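A compiler-specific code path can also be selected in the source itself with preprocessor guards. The sketch below is an assumption about how such a guard might look, not the framework's detection code: CpuFeatures and detect_cpu_features are hypothetical names, and only the GCC/Clang builtins are real. Other compilers and architectures fall through to a conservative "nothing detected" result; an MSVC path based on the intrinsics in <intrin.h> would be needed there and is not shown.

```cpp
// Hypothetical result type for a minimal runtime feature query.
struct CpuFeatures {
    bool sse41 = false;
    bool avx2 = false;
    bool fma = false;
};

// Queries the running CPU where the compiler offers a builtin for it.
CpuFeatures detect_cpu_features() {
    CpuFeatures f;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // GCC and Clang on x86: use the compiler builtins.
    __builtin_cpu_init();  // safe to call more than once
    f.sse41 = __builtin_cpu_supports("sse4.1") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
    f.fma = __builtin_cpu_supports("fma") != 0;
#else
    // Other compilers or architectures: report no features so that
    // callers fall back to the scalar reference path.
#endif
    return f;
}
```

Defaulting every flag to false keeps the fallback safe: a dispatcher that trusts this result will pick the scalar path rather than execute an unsupported instruction.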

Problem: Crash on _mm256_load_ps

Error:

Segmentation fault (core dumped)

Cause: Unaligned memory access. _mm256_load_ps requires a 32-byte-aligned address.

Solution:

  1. Use AlignedBuffer<T> for allocations
  2. Or use _mm256_loadu_ps (unaligned load) instead
  3. Verify buffer alignment with the isAligned() helper
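The alignment requirement can be checked before choosing a load instruction. The sketch below is a stand-in, not the subsystem's isAligned() or AlignedBuffer<T>: is_aligned and sum8 are hypothetical names. The AVX branch is compiled only when the compiler defines __AVX__; otherwise a scalar copy is used so the code builds everywhere.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Sums 8 consecutive floats starting at p, for any alignment of p.
float sum8(const float* p) {
    float lanes[8];
#if defined(__AVX__)
    // _mm256_load_ps needs a 32-byte-aligned address;
    // _mm256_loadu_ps accepts any address.
    const __m256 v = is_aligned(p, 32) ? _mm256_load_ps(p)
                                       : _mm256_loadu_ps(p);
    _mm256_storeu_ps(lanes, v);
#else
    // Scalar fallback when AVX is not enabled at compile time.
    for (int i = 0; i < 8; ++i) {
        lanes[i] = p[i];
    }
#endif
    float total = 0.0f;
    for (float x : lanes) {
        total += x;
    }
    return total;
}
```

Declaring a buffer as alignas(32) float buf[16] makes buf itself pass is_aligned(buf, 32), while buf + 1 does not; sum8 handles both because it falls back to the unaligned load.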

Problem: Performance is slower than expected

Possible Causes:

  1. Debug build: Always use Release build for benchmarking
  2. Thermal throttling: CPU is overheating
  3. Wrong variant selected: Check getActiveVariant()
  4. Small buffers: SIMD works best with ≥256 samples
  5. Overhead dominates: Measure on longer runs (1000+ iterations)

Solution:

# Ensure Release build
cmake .. -DCMAKE_BUILD_TYPE=Release

# Check CPU frequency
# Windows: Get-WmiObject Win32_Processor | Select-Object MaxClockSpeed
# Linux: cat /proc/cpuinfo | grep MHz

# Verify optimal variant
./basic_dispatcher_example
# Look for "Selected variant: AVX2_..." if AVX2 supported
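When measurement overhead dominates, averaging over many iterations after a warm-up phase gives steadier numbers. The sketch below is a generic timing loop built on std::chrono, not the subsystem's benchmark suite; average_microseconds and its parameters are hypothetical.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `work` warmup_runs times untimed, then timed_runs
// times under a steady clock, and returns the mean time per call in microseconds.
template <typename Fn>
double average_microseconds(Fn&& work, std::size_t warmup_runs,
                            std::size_t timed_runs) {
    // Warm-up lets caches, branch predictors, and CPU clocks settle.
    for (std::size_t i = 0; i < warmup_runs; ++i) {
        work();
    }
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < timed_runs; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return timed_runs > 0
        ? elapsed.count() / static_cast<double>(timed_runs)
        : 0.0;
}
```

A call such as average_microseconds(process_block, 100, 1000), where process_block is whatever callable wraps the variant under test, follows the "1000+ iterations" advice above. In a Release build, make sure the work has an observable side effect, or the optimizer may remove it.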

Problem: Numerical differences vs scalar

Error:

Validation failed: max error 1.2e-5 exceeds tolerance 1e-6

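Solution: Small differences are expected for IIR filters because the SIMD code performs floating-point operations in a different order than the scalar reference, which changes rounding. Interpret the maximum error as follows:

  • < 1e-6: Within the strict tolerance (gain, mix)
  • < 1e-5: Acceptable (IIR filters such as biquads)
  • > 1e-5: Investigate (possible bug)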

For IIR filters, use relaxed tolerance:

REQUIRE(maxError < 1e-5f);  // Instead of 1e-6
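The maxError value in that assertion is the largest per-sample deviation from the scalar reference. A minimal sketch of how it could be computed follows; the function and constant names are hypothetical, while the two tolerance values are the ones this guide states.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Tolerances quoted in this guide.
constexpr float kGainMixTolerance = 1e-6f;  // gain and mix operations
constexpr float kIirTolerance = 1e-5f;      // IIR filters such as biquads

// Largest absolute per-sample difference between reference and candidate.
float max_abs_error(const std::vector<float>& reference,
                    const std::vector<float>& candidate) {
    assert(reference.size() == candidate.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        worst = std::max(worst, std::fabs(reference[i] - candidate[i]));
    }
    return worst;
}

// True if every sample of candidate is within `tolerance` of reference.
bool within_tolerance(const std::vector<float>& reference,
                      const std::vector<float>& candidate,
                      float tolerance) {
    return max_abs_error(reference, candidate) < tolerance;
}
```

A candidate that deviates by about 5e-6 on one sample fails the gain/mix check but passes with kIirTolerance, which is the situation the relaxed REQUIRE is meant to cover.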

Problem: Build succeeds but example crashes immediately

Possible Causes:

  1. Using SIMD instruction not supported by CPU
  2. Missing DLL/shared library
  3. Stack corruption

Solution:

  1. Check CPU features with cpu_detection_example
  2. On Windows: Ensure Release DLLs are in PATH
  3. On Linux: Check ldd ./example for missing libraries
  4. Run under debugger to get stack trace


Platform-Specific Notes

Windows (MSVC)

Recommended Setup:

# Open x64 Native Tools Command Prompt
cmake -S . -B build -G "Visual Studio 17 2022" -A x64
cmake --build build --config Release -j 8

Notes:

  • MSVC automatically enables SSE2 on x64
  • /arch:AVX2 enables AVX, AVX2, and FMA
  • Use Release configuration for benchmarking
  • Debug builds are ~10x slower

Common Issues:

  • vcpkg manifest warning: Safe to ignore or disable vcpkg
  • #include <intrin.h> already handles MSVC intrinsics
  • PowerShell: Use quotes for paths with spaces

Linux (GCC/Clang)

Recommended Setup:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j $(nproc)

Notes:

  • Explicit -mavx2 -mfma required (unlike MSVC)
  • GCC 8+ or Clang 9+ for full C++17 support
  • Use -march=native to enable every instruction set the build machine supports (the resulting binary is not portable!)
  • Install build-essential package

Common Issues:

  • Missing <cpuid.h>: Install gcc or build-essential
  • Thread library: CMake automatically links pthread
  • Library path: May need LD_LIBRARY_PATH for shared libs

macOS (Clang)

Recommended Setup:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j $(sysctl -n hw.ncpu)

Notes:

  • Xcode Command Line Tools required: xcode-select --install
  • Apple Clang lags behind LLVM Clang in version numbers
  • Intel Macs: Full AVX2/FMA support
  • M1/M2 Macs: NEON variants (future work)

Common Issues:

  • Missing Xcode: Install from App Store or command line tools
  • Code signing: Not required for local development builds
  • M1/M2: Currently no SIMD variants (NEON coming soon)


Advanced Build Options

Cross-Compilation

# For ARM Linux (from x86 host)
cmake .. \
  -DCMAKE_TOOLCHAIN_FILE=arm-linux-gnueabihf.cmake \
  -DENABLE_NEON=ON \
  -DENABLE_AVX2=OFF

Static Linking

cmake .. -DBUILD_SHARED_LIBS=OFF

Custom Compiler

cmake .. \
  -DCMAKE_CXX_COMPILER=clang++ \
  -DCMAKE_C_COMPILER=clang

Install to Custom Location

cmake .. -DCMAKE_INSTALL_PREFIX=/opt/audiolab
cmake --build . --target install

Sanitizers (Debug Only)

cmake .. \
  -DCMAKE_BUILD_TYPE=Debug \
  -DCMAKE_CXX_FLAGS="-fsanitize=address -fsanitize=undefined"

Performance Tuning

CPU Governor (Linux)

For accurate benchmarking:

# Set to performance mode
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Verify
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

Disable Turbo Boost (for consistent results)

Intel (Linux):

echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

AMD (Linux):

echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost

Isolate CPU Cores

# Run on specific cores (Linux)
taskset -c 0-3 ./simd_comparison_example

Next Steps

After successful build and validation:

  1. Experiment with examples - Modify buffer sizes, gain values, frequencies
  2. Integrate with your code - See INTEGRATION_GUIDE.md
  3. Profile your application - Use perf (Linux), VTune (Intel), or VS Profiler (Windows)
  4. Report issues - GitHub Issues or AudioLab support
  5. Contribute - NEON variants, AVX-512 variants, additional examples welcome!


Last Updated: 2025-10-15 Maintainer: AudioLab Performance Team

Questions? See README.md or TROUBLESHOOTING.md


Happy Building! 🚀