05_04_00 - Arithmetic Kernels¶

Overview¶

The arithmetic kernels are the most fundamental operations in the entire DSP system. Every algorithm - filters, oscillators, effects - ultimately reduces to sequences of adds, multiplies, and related operations.

Implemented Kernels¶

Addition Family¶

add_kernel - Sample-by-sample addition: out[i] = a[i] + b[i]
add_scalar_kernel - Add constant: out[i] = signal[i] + scalar
subtract_kernel - Sample-by-sample subtraction: out[i] = a[i] - b[i]
negate_kernel - Phase inversion: out[i] = -signal[i]

Multiplication Family¶

multiply_kernel - Sample-by-sample multiplication: out[i] = a[i] * b[i]
multiply_scalar_kernel - Multiply by constant: out[i] = signal[i] * scalar
divide_kernel - Sample-by-sample division: out[i] = a[i] / b[i]
reciprocal_kernel - Reciprocal: out[i] = 1.0 / signal[i]
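
The element-wise definitions above each map to a single loop over the buffer. The following is a minimal sketch of that loop shape, not the library's actual implementation: the `sketch` namespace and function names are hypothetical, and the parameter order (inputs, then output, then sample count) is an assumption inferred from the usage examples in this document.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical reference loop for out[i] = a[i] + b[i].
// Each iteration reads and writes only index i, so `out` may alias `a` or `b`.
inline void add(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Hypothetical reference loop for out[i] = signal[i] * scalar.
inline void multiply_scalar(const float* signal, float scalar, float* out,
                            std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = signal[i] * scalar;
    }
}

}  // namespace sketch
```

The other kernels in both families would follow the same pattern with a different operator in the loop body.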

Utilities¶

flush_denormals_kernel - Flush denormal values to zero for performance

Mathematical Definitions¶

All operations follow standard IEEE 754 floating-point arithmetic unless otherwise noted.

Add Kernel¶

Mathematical: z = x + y
Properties:
  - Commutative: x + y = y + x
  - Associative: (x + y) + z = x + (y + z) (exact arithmetic only; floating-point rounding can make the two sides differ)
  - Identity: x + 0 = x
Complexity: O(n)
Vectorizable: Yes (SSE/AVX/NEON)
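
The algebraic properties listed for addition hold for real numbers, but IEEE 754 rounding means grouping can change a float result. The sketch below illustrates this with two hypothetical helper functions. It assumes plain IEEE 754 binary32 evaluation (no fast-math flags, no extended-precision intermediates).

```cpp
namespace sketch {

// Evaluates (x + y) + z, rounding to float after each step.
inline float sum_left_first(float x, float y, float z) {
    const float partial = x + y;
    return partial + z;
}

// Evaluates x + (y + z), rounding to float after each step.
inline float sum_right_first(float x, float y, float z) {
    const float partial = y + z;
    return x + partial;
}

}  // namespace sketch

// With x = 1e8f, y = -1e8f, z = 1.0f:
//   sum_left_first  gives 1.0f, because x + y cancels exactly before z is added.
//   sum_right_first gives 0.0f, because y + z rounds back to -1e8f
//   (adjacent floats near 1e8 are 8 apart).
```

This matters for any change that reorders a summation: results can shift by small rounding amounts even though the mathematics is unchanged.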

Multiply Kernel¶

Mathematical: z = x · y
Properties:
  - Commutative: x · y = y · x
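  - Associative: (x · y) · z = x · (y · z) (exact arithmetic only; floating-point rounding can make the two sides differ)
  - Identity: x · 1 = x
  - Annihilator: x · 0 = 0 (finite x only; ±inf · 0 and NaN · 0 are NaN under IEEE 754)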
Complexity: O(n)
Vectorizable: Yes (SSE/AVX/NEON)
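
The annihilator property for multiplication has an IEEE 754 edge case worth seeing concretely: multiplying infinity or NaN by zero yields NaN rather than zero. The helper below is a hypothetical illustration, not part of the kernel API.

```cpp
#include <limits>

namespace sketch {

constexpr float kInfinity = std::numeric_limits<float>::infinity();

// Returns true when x * 0 compares equal to zero.
// NaN compares unequal to everything, so NaN results return false.
inline bool zero_annihilates(float x) {
    const float product = x * 0.0f;
    return product == 0.0f;
}

}  // namespace sketch
```

For finite inputs `zero_annihilates` returns true. For `sketch::kInfinity` it returns false, which is why a gain of zero does not silence a buffer that already contains inf or NaN.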

Performance Characteristics¶

Theoretical Performance (float32, AVX2)¶

Kernel	Scalar (cycles/sample)	SIMD (cycles/sample)	Speedup
add_kernel	1.0	0.125	8x
multiply_kernel	1.5	0.188	8x
divide_kernel	14.0	2.0	7x

Note: Actual performance depends on CPU architecture, memory bandwidth, and cache behavior.
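
The add and multiply rows in the table are consistent with ideal lane scaling: a 256-bit AVX2 register holds eight float32 values, so the scalar cost is divided by eight. The sketch below reproduces that arithmetic with hypothetical helper names. It models ideal scaling only and does not explain the divide row, where the table lists a 7x speedup.

```cpp
namespace sketch {

// Number of elements that fit in one SIMD register.
constexpr int simd_lanes(int register_bits, int element_bits) {
    return register_bits / element_bits;
}

// Ideal per-sample cost if one SIMD instruction costs the same as one
// scalar instruction and processes `lanes` samples at once.
constexpr double ideal_simd_cycles_per_sample(double scalar_cycles_per_sample,
                                              int lanes) {
    return scalar_cycles_per_sample / lanes;
}

}  // namespace sketch

// simd_lanes(256, 32) is 8.
// ideal_simd_cycles_per_sample(1.0, 8) is 0.125  (add_kernel row).
// ideal_simd_cycles_per_sample(1.5, 8) is 0.1875 (multiply_kernel row, shown rounded as 0.188).
```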

SIMD Optimization Status¶

✅ Auto-vectorizable - Compiler can automatically generate SIMD code
  - Ensure -O3 -march=native (GCC/Clang) or /O2 /arch:AVX2 (MSVC)
  - Loop structure designed for vectorization
  - No data dependencies between iterations

Usage Examples¶

Basic Addition¶

#include "arithmetic_kernels.h"

float input_a[1024];
float input_b[1024];
float output[1024];

// Fill inputs...

audiolab::kernels::l0::add_kernel(input_a, input_b, output, 1024);

DC Offset Removal¶

float audio[1024];
// Measured DC offset
float dc_offset = 0.05f;

audiolab::kernels::l0::add_scalar_kernel(audio, -dc_offset, audio, 1024);
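
The example above takes the DC offset as given. One simple way to measure it is the mean of the buffer, sketched below. `estimate_dc_offset` is a hypothetical helper, not a function provided by the kernel library, and the block mean is only a reasonable estimate when the buffer is long relative to the lowest frequency present.

```cpp
#include <cstddef>

namespace sketch {

// Estimates the DC offset of a buffer as its arithmetic mean.
// The sum is accumulated in double to limit rounding error on long buffers.
inline float estimate_dc_offset(const float* signal, std::size_t n) {
    if (n == 0) {
        return 0.0f;  // no samples, no measurable offset
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += signal[i];
    }
    return static_cast<float>(sum / static_cast<double>(n));
}

}  // namespace sketch
```

The result could then be negated and passed as the scalar argument of add_scalar_kernel, as in the example above.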

Variable Gain¶

float audio[1024];
float gain_envelope[1024];  // Time-varying gain
float output[1024];

audiolab::kernels::l0::multiply_kernel(audio, gain_envelope, output, 1024);
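
The gain envelope in the example above has to be filled before the multiply. As one illustration, the sketch below fills a buffer with a linear ramp, which gives a fade-in or fade-out when multiplied with the audio. `fill_linear_ramp` is a hypothetical helper and not part of the kernel library.

```cpp
#include <cstddef>

namespace sketch {

// Fills envelope[0..n-1] with values moving linearly from `start` to `end`.
inline void fill_linear_ramp(float* envelope, std::size_t n, float start,
                             float end) {
    if (n == 0) {
        return;
    }
    if (n == 1) {
        envelope[0] = start;  // a single sample cannot span a ramp
        return;
    }
    const float step = (end - start) / static_cast<float>(n - 1);
    for (std::size_t i = 0; i < n; ++i) {
        envelope[i] = start + step * static_cast<float>(i);
    }
}

}  // namespace sketch
```

Calling `sketch::fill_linear_ramp(gain_envelope, 1024, 0.0f, 1.0f)` before the multiply would produce a fade-in over the buffer.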

Fixed Attenuation¶

float audio[1024];

// -6 dB ≈ 0.5 linear gain (0.5 is about -6.02 dB)
audiolab::kernels::l0::multiply_scalar_kernel(audio, 0.5f, audio, 1024);
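
The linear gain passed to the kernel comes from the standard amplitude conversion gain = 10^(dB / 20). A small sketch of that conversion follows. `db_to_linear` is a hypothetical helper name, not something the kernel library is known to provide.

```cpp
#include <cmath>

namespace sketch {

// Converts an amplitude level in decibels to a linear gain factor.
inline float db_to_linear(float decibels) {
    return std::pow(10.0f, decibels / 20.0f);
}

}  // namespace sketch
```

`sketch::db_to_linear(-6.0f)` is roughly 0.501, so 0.5 is the usual rounded value for a 6 dB cut.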

Numerical Considerations¶

Denormals¶

Denormal (subnormal) numbers (nonzero values with |x| below about 1.18e-38 for float) can cause severe performance degradation on many CPUs. Use flush_denormals_kernel periodically:

float signal[1024];
// ... processing ...

// Flush every N samples to prevent denormal accumulation
audiolab::kernels::l0::flush_denormals_kernel(signal, 1024);
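
To make the flushing operation concrete, the sketch below shows one portable way to do it with `std::fpclassify`. It is an assumption about the behavior, not the actual flush_denormals_kernel implementation, which may use a threshold comparison or CPU flush-to-zero modes instead.

```cpp
#include <cmath>
#include <cstddef>

namespace sketch {

// Replaces every subnormal sample with zero, in place.
// Normal values, zeros, infinities, and NaNs are left untouched.
inline void flush_denormals(float* signal, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fpclassify(signal[i]) == FP_SUBNORMAL) {
            signal[i] = 0.0f;
        }
    }
}

}  // namespace sketch
```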

Overflow¶

Addition and multiplication can push results outside the intended signal range and, in extreme cases, overflow to ±inf. Ensure inputs are properly scaled:
  - Audio signals: typically normalized to [-1.0, 1.0]
  - Intermediate calculations: may require higher precision (double)
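
The following sketch illustrates the overflow concern with two hypothetical helpers: one detects a float product that overflows to infinity from finite inputs, and the other performs the same product in double, where the wider exponent range keeps it finite. It assumes IEEE 754 float and double.

```cpp
#include <cmath>

namespace sketch {

// True when a and b are both finite but their float product is infinite.
inline bool float_product_overflows(float a, float b) {
    const float product = a * b;
    return std::isfinite(a) && std::isfinite(b) && std::isinf(product);
}

// Same product computed in double for intermediate headroom.
inline double product_in_double(float a, float b) {
    return static_cast<double>(a) * static_cast<double>(b);
}

}  // namespace sketch
```

For signals normalized to [-1.0, 1.0] this kind of overflow is far away. It becomes relevant for unscaled intermediates such as large accumulated sums or products of large coefficients.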

Division by Zero¶

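divide_kernel and reciprocal_kernel do not check for zero divisors. Under IEEE 754, dividing a nonzero value by zero produces ±inf, and 0/0 produces NaN. Validate inputs or clamp divisors away from zero before calling these kernels.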
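
One way to apply the clamping advice is to push near-zero divisors out to a minimum magnitude before dividing. The sketch below does this in a single loop. The function name, the `min_magnitude` parameter, and its default value are all hypothetical choices for illustration, and a suitable threshold depends on the signal.

```cpp
#include <cmath>
#include <cstddef>

namespace sketch {

// Computes out[i] = a[i] / b[i], but replaces any divisor whose magnitude is
// below min_magnitude with +/- min_magnitude (keeping the divisor's sign).
// NaN divisors are not altered and still propagate NaN.
inline void divide_clamped(const float* a, const float* b, float* out,
                           std::size_t n, float min_magnitude = 1e-12f) {
    for (std::size_t i = 0; i < n; ++i) {
        float divisor = b[i];
        if (std::fabs(divisor) < min_magnitude) {
            divisor = std::copysign(min_magnitude, divisor);
        }
        out[i] = a[i] / divisor;
    }
}

}  // namespace sketch
```

Clamping trades an inf or NaN for a very large finite value, so downstream code may still need to limit the result.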

Testing¶

Run Tests¶

mkdir -p build   # create the build directory if it does not exist yet
cd build
cmake ..
cmake --build .
ctest -V

Test Coverage¶

✅ Correctness tests (mathematical validation)
✅ Edge cases (zero, infinity, denormals)
✅ In-place operation (aliasing)
✅ Precision tests (double vs float)
✅ Large buffer stress test (1M samples)
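
As an illustration of what the in-place (aliasing) check involves, the sketch below compares an out-of-place result with the result of writing back into the first input. It uses a local stand-in loop rather than the real add_kernel, and the names are hypothetical. The actual test suite may be structured differently.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Stand-in for an element-wise add with the assumed (a, b, out, n) signature.
inline void add(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Returns true when writing the result over `a` matches a separate output.
inline bool in_place_matches_out_of_place(std::size_t n) {
    std::vector<float> a(n), b(n), expected(n);
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = 0.001f * static_cast<float>(i);
        b[i] = 1.0f - a[i];
    }
    add(a.data(), b.data(), expected.data(), n);  // separate output buffer
    add(a.data(), b.data(), a.data(), n);         // output aliases input a
    return a == expected;
}

}  // namespace sketch
```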

Expected Output¶

==============================================
ARITHMETIC KERNELS TEST SUITE
==============================================

Running test_add_kernel_basic...
✓ test_add_kernel_basic passed

Running test_multiply_scalar_kernel...
✓ test_multiply_scalar_kernel passed

...

==============================================
ALL TESTS PASSED ✓
==============================================

Integration with Other Modules¶

Used By¶

07_ATOMS_L1 - Filters, oscillators, envelopes use arithmetic primitives
10_CELLS_L2 - Effect chains composed of arithmetic operations
13_ENGINES_L3 - High-level processors built on arithmetic foundation

Dependencies¶

None - Arithmetic kernels are L0 (no DSP dependencies)
Standard library: <cmath>, <algorithm>

Future Enhancements¶

Explicit SIMD intrinsics (AVX-512, NEON)
Fused multiply-add (FMA) support
GPU/CUDA variants
Fixed-point (int32) variants for embedded
Saturation arithmetic option

References¶

IEEE 754-2008 Floating Point Standard
Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
Agner Fog, "Optimizing software in C++"