05_04_00 - Arithmetic Kernels¶
Overview¶
The arithmetic kernels are the most fundamental operations in the entire DSP system. Every algorithm - filters, oscillators, effects - ultimately reduces to sequences of adds, multiplies, and related operations.
Implemented Kernels¶
Addition Family¶
- add_kernel - Sample-by-sample addition:
  out[i] = a[i] + b[i]
- add_scalar_kernel - Add constant:
  out[i] = signal[i] + scalar
- subtract_kernel - Sample-by-sample subtraction:
  out[i] = a[i] - b[i]
- negate_kernel - Phase inversion:
  out[i] = -signal[i]
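As an illustration of the formulas above, the addition family can be sketched as plain loops. The `sketch` namespace and the argument order (inputs, then output, then sample count) are assumptions inferred from the usage examples in this document, not the library's actual source.

```cpp
#include <cstddef>

// Hypothetical reference loops; signatures are assumed, not taken from
// arithmetic_kernels.h.
namespace sketch {

// out[i] = a[i] + b[i]
inline void add_kernel(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// out[i] = signal[i] + scalar
inline void add_scalar_kernel(const float* signal, float scalar, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = signal[i] + scalar;
}

// out[i] = a[i] - b[i]
inline void subtract_kernel(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] - b[i];
}

// out[i] = -signal[i]
inline void negate_kernel(const float* signal, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = -signal[i];
}

}  // namespace sketch
```

Each iteration reads index i before writing index i, so passing the same buffer as input and output (in-place use) gives the same result as using a separate output buffer.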
Multiplication Family¶
- multiply_kernel - Sample-by-sample multiplication:
  out[i] = a[i] * b[i]
- multiply_scalar_kernel - Multiply by constant:
  out[i] = signal[i] * scalar
- divide_kernel - Sample-by-sample division:
  out[i] = a[i] / b[i]
- reciprocal_kernel - Reciprocal:
  out[i] = 1.0 / signal[i]
Utilities¶
- flush_denormals_kernel - Flush denormal values to zero for performance
Mathematical Definitions¶
All operations follow standard IEEE 754 floating-point arithmetic unless otherwise noted.
Add Kernel¶
Mathematical: z = x + y
Properties:
- Commutative: x + y = y + x
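- Associative in exact arithmetic: (x + y) + z = x + (y + z). In floating point each addition rounds, so the two groupings can give slightly different results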
- Identity: x + 0 = x
Complexity: O(n)
Vectorizable: Yes (SSE/AVX/NEON)
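Because IEEE 754 rounds after every operation, the grouping of a float sum can change its result. The following minimal sketch makes this concrete; the function names are hypothetical and exist only for this illustration.

```cpp
// Sum three floats with the two possible groupings.
inline float sum_left(float x, float y, float z)  { return (x + y) + z; }
inline float sum_right(float x, float y, float z) { return x + (y + z); }

// With x = 1e20f, y = -1e20f, z = 1.0f and strict IEEE 754 evaluation
// (no -ffast-math):
//   sum_left  gives 1.0f, because x + y is exactly 0.
//   sum_right gives 0.0f, because y + z rounds back to -1e20f.
```

This matters when a SIMD or multi-threaded version sums in a different order than the scalar version: results may differ in the last bits and tests should compare with a tolerance.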
Multiply Kernel¶
Mathematical: z = x · y
Properties:
- Commutative: x · y = y · x
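- Associative in exact arithmetic: (x · y) · z = x · (y · z). In floating point each multiplication rounds, so the two groupings can differ slightly
- Identity: x · 1 = x
- Annihilator: x · 0 = 0 for finite x. Under IEEE 754, ±inf · 0 and NaN · 0 produce NaN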
Complexity: O(n)
Vectorizable: Yes (SSE/AVX/NEON)
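Under IEEE 754 the zero product rule holds only for finite inputs, which is easy to check directly. The helper below is a hypothetical name used only for this sketch.

```cpp
#include <limits>

// Returns true when x * 0 compares equal to zero (+0 or -0).
// NaN never compares equal to anything, so a NaN product returns false.
inline bool zero_annihilates(float x) {
    const float product = x * 0.0f;
    return product == 0.0f;
}

// zero_annihilates(3.5f)                                     -> true
// zero_annihilates(std::numeric_limits<float>::infinity())   -> false (inf * 0 is NaN)
// zero_annihilates(std::numeric_limits<float>::quiet_NaN())  -> false (NaN propagates)
```

A practical consequence: multiplying a buffer by a zero gain does not clear inf or NaN samples that are already in it.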
Performance Characteristics¶
Theoretical Performance (float32, AVX2)¶
| Kernel | Scalar (cycles/sample) | SIMD (cycles/sample) | Speedup |
|---|---|---|---|
| add_kernel | 1.0 | 0.125 | 8x |
| multiply_kernel | 1.5 | 0.188 | 8x |
| divide_kernel | 14.0 | 2.0 | 7x |
Note: Actual performance depends on CPU architecture, memory bandwidth, and cache behavior.
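The SIMD column follows from simple arithmetic: an AVX2 register is 256 bits wide and therefore holds eight float32 lanes, so the ideal per-sample cost is the scalar cost divided by eight. The sketch below reproduces that calculation; the function names are hypothetical and the model ignores memory bandwidth and instruction latency.

```cpp
// Lanes per SIMD register: register width divided by element width (in bits).
constexpr int simd_lanes(int register_bits, int element_bits) {
    return register_bits / element_bits;
}

// Ideal cycles per sample when one operation processes `lanes` samples.
constexpr double simd_cycles_per_sample(double scalar_cycles, int lanes) {
    return scalar_cycles / lanes;
}

// Ideal cycle count for a whole buffer.
constexpr double buffer_cycles(double cycles_per_sample, int samples) {
    return cycles_per_sample * samples;
}

static_assert(simd_lanes(256, 32) == 8, "AVX2 holds eight float32 lanes");
```

For example, `simd_cycles_per_sample(1.5, 8)` is 0.1875, which the table rounds to 0.188, and `buffer_cycles(0.125, 1024)` is 128 cycles for a 1024-sample add. The divide row lists a 7x speedup rather than 8x, so it does not follow this ideal model exactly.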
SIMD Optimization Status¶
✅ Auto-vectorizable - Compiler can automatically generate SIMD code
- Build with optimization enabled: -O3 -march=native (GCC/Clang) or /O2 /arch:AVX2 (MSVC)
- Loop structure designed for vectorization
- No data dependencies between iterations
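A loop of the following shape is what auto-vectorizers generally handle well: a single counted loop, unit-stride access, and no value carried from one iteration to the next. The function name is a hypothetical stand-in, not the library's implementation; the compiler flags in the comments are real GCC and Clang options for reporting vectorized loops.

```cpp
#include <cstddef>

// Hypothetical stand-in for a vectorization-friendly kernel loop.
// To see whether the compiler vectorized it:
//   GCC:   g++ -O3 -march=native -fopt-info-vec-optimized -c kernel.cpp
//   Clang: clang++ -O3 -march=native -Rpass=loop-vectorize -c kernel.cpp
inline void multiply_kernel_sketch(const float* a, const float* b,
                                   float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i];  // iteration i touches only element i
    }
}
```

Because `out` may alias `a` or `b` (in-place use), the pointers are deliberately not marked as non-aliasing here; compilers typically still vectorize such loops by adding a runtime overlap check.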
Usage Examples¶
Basic Addition¶
#include "arithmetic_kernels.h"
float input_a[1024];
float input_b[1024];
float output[1024];
// Fill inputs...
audiolab::kernels::l0::add_kernel(input_a, input_b, output, 1024);
DC Offset Removal¶
float audio[1024];
// Measured DC offset
float dc_offset = 0.05f;
audiolab::kernels::l0::add_scalar_kernel(audio, -dc_offset, audio, 1024);
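The example above takes the DC offset as given. One simple way to measure it is the mean of the buffer, sketched below under that assumption. Both function names are hypothetical, and the subtraction loop stands in for a call to add_scalar_kernel with the negated offset.

```cpp
#include <cstddef>

// Mean of the buffer, accumulated in double to limit rounding error.
inline float measure_dc_offset(const float* signal, std::size_t n) {
    if (n == 0) return 0.0f;
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += signal[i];
    return static_cast<float>(sum / static_cast<double>(n));
}

// Subtract the measured mean in place.
// Equivalent to add_scalar_kernel(signal, -dc, signal, n) as used above.
inline void remove_dc_offset(float* signal, std::size_t n) {
    const float dc = measure_dc_offset(signal, n);
    for (std::size_t i = 0; i < n; ++i) signal[i] -= dc;
}
```

For the buffer {0.25, 0.75, 0.25, 0.75} the measured offset is 0.5, and after removal the samples are {-0.25, 0.25, -0.25, 0.25}.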
Variable Gain¶
float audio[1024];
float gain_envelope[1024]; // Time-varying gain
float output[1024];
audiolab::kernels::l0::multiply_kernel(audio, gain_envelope, output, 1024);
Fixed Attenuation¶
float audio[1024];
// 0.5 linear gain is approximately -6 dB (-6.02 dB)
audiolab::kernels::l0::multiply_scalar_kernel(audio, 0.5f, audio, 1024);
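The scalar passed to multiply_scalar_kernel is a linear amplitude factor. For gains specified in decibels, the standard amplitude conversion is gain = 10^(dB / 20). The helpers below are hypothetical names for this sketch and are not part of the kernel library.

```cpp
#include <cmath>

// Amplitude gain in dB to linear factor: gain = 10^(dB / 20).
inline float db_to_linear(float db) {
    return std::pow(10.0f, db / 20.0f);
}

// Linear factor back to dB: dB = 20 * log10(gain), valid for gain > 0.
inline float linear_to_db(float gain) {
    return 20.0f * std::log10(gain);
}
```

`db_to_linear(-6.0f)` is about 0.501 and `linear_to_db(0.5f)` is about -6.02, which is why 0.5 is commonly quoted as -6 dB.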
Numerical Considerations¶
Denormals¶
Denormal (subnormal) numbers (nonzero values with |x| below about 1.18e-38 for float) can slow arithmetic severely on many CPUs. Call flush_denormals_kernel periodically:
float signal[1024];
// ... processing ...
// Flush every N samples to prevent denormal accumulation
audiolab::kernels::l0::flush_denormals_kernel(signal, 1024);
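What such a flush does can be sketched with the standard classification macro from `<cmath>`. This is an illustrative stand-in under the assumption that the kernel works in place on a float buffer; the real flush_denormals_kernel may be implemented differently.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical sketch: replace every subnormal sample with zero, in place.
// Normal values, zeros, infinities and NaNs are left untouched.
inline void flush_denormals_sketch(float* signal, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fpclassify(signal[i]) == FP_SUBNORMAL) {
            signal[i] = 0.0f;
        }
    }
}
```

The smallest normal float, `std::numeric_limits<float>::min()`, is kept, while anything smaller in magnitude and nonzero is set to zero.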
Overflow¶
Addition and multiplication can overflow to ±inf when a result exceeds the float range. Ensure inputs are properly scaled:
- Audio signals: typically normalized to [-1.0, 1.0]
- Intermediate calculations: may require extended precision (double)
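The benefit of a double intermediate can be shown with a product that overflows in float but whose final, rescaled value fits comfortably. Both function names are hypothetical and serve only this sketch.

```cpp
#include <cmath>

// (a * b) * scale evaluated in float: a * b may overflow to inf.
inline float scaled_product_float(float a, float b, float scale) {
    const float product = a * b;
    return product * scale;
}

// Same expression with a double intermediate, narrowed only at the end.
inline float scaled_product_double(float a, float b, float scale) {
    const double product = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<float>(product * static_cast<double>(scale));
}
```

With a = b = 1e30f and scale = 1e-30f, the float version overflows at the intermediate product (1e60 exceeds the float maximum of about 3.4e38) and returns inf, while the double version returns a finite value close to 1e30.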
Division by Zero¶
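divide_kernel and reciprocal_kernel do not check for zero divisors. Under IEEE 754, dividing a nonzero value by zero produces ±inf, and 0/0 produces NaN. Validate inputs, or clamp divisors away from zero before calling these kernels.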
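One way to apply the clamping suggestion is a guarded wrapper that pushes near-zero divisors out to a minimum magnitude while keeping their sign. The function name, the `min_abs` parameter and its default value are assumptions for this sketch, not part of the library.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical guarded divide: any divisor with |b[i]| < min_abs is replaced
// by min_abs carrying the divisor's sign (including the sign of -0.0f).
// NaN divisors are not altered and still produce NaN.
inline void safe_divide_kernel(const float* a, const float* b, float* out,
                               std::size_t n, float min_abs = 1e-9f) {
    for (std::size_t i = 0; i < n; ++i) {
        float divisor = b[i];
        if (std::fabs(divisor) < min_abs) {
            divisor = std::copysign(min_abs, divisor);
        }
        out[i] = a[i] / divisor;
    }
}
```

The choice of `min_abs` bounds the largest possible output magnitude, so it should be picked with the expected signal range in mind.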
Testing¶
Run Tests¶
Test Coverage¶
- ✅ Correctness tests (mathematical validation)
- ✅ Edge cases (zero, infinity, denormals)
- ✅ In-place operation (aliasing)
- ✅ Precision tests (double vs float)
- ✅ Large buffer stress test (1M samples)
Expected Output¶
==============================================
ARITHMETIC KERNELS TEST SUITE
==============================================
Running test_add_kernel_basic...
✓ test_add_kernel_basic passed
Running test_multiply_scalar_kernel...
✓ test_multiply_scalar_kernel passed
...
==============================================
ALL TESTS PASSED ✓
==============================================
Integration with Other Modules¶
Used By¶
- 07_ATOMS_L1 - Filters, oscillators, envelopes use arithmetic primitives
- 10_CELLS_L2 - Effect chains composed of arithmetic operations
- 13_ENGINES_L3 - High-level processors built on arithmetic foundation
Dependencies¶
- None - Arithmetic kernels are L0 (no DSP dependencies)
- Standard library: <cmath>, <algorithm>
Future Enhancements¶
- Explicit SIMD intrinsics (AVX-512, NEON)
- Fused multiply-add (FMA) support
- GPU/CUDA variants
- Fixed-point (int32) variants for embedded
- Saturation arithmetic option
References¶
- IEEE 754-2008 Floating Point Standard
- Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
- Agner Fog, "Optimizing software in C++"