05_04_00 - Arithmetic Kernels

Overview

The arithmetic kernels are the most fundamental operations in the entire DSP system. Every algorithm - filters, oscillators, effects - ultimately reduces to sequences of adds, multiplies, and related operations.

Implemented Kernels

Addition Family

  • add_kernel - Sample-by-sample addition: out[i] = a[i] + b[i]
  • add_scalar_kernel - Add constant: out[i] = signal[i] + scalar
  • subtract_kernel - Sample-by-sample subtraction: out[i] = a[i] - b[i]
  • negate_kernel - Phase inversion: out[i] = -signal[i]

Multiplication Family

  • multiply_kernel - Sample-by-sample multiplication: out[i] = a[i] * b[i]
  • multiply_scalar_kernel - Multiply by constant: out[i] = signal[i] * scalar
  • divide_kernel - Sample-by-sample division: out[i] = a[i] / b[i]
  • reciprocal_kernel - Reciprocal: out[i] = 1.0 / signal[i]

Utilities

  • flush_denormals_kernel - Flush denormal values to zero for performance

Mathematical Definitions

All operations follow standard IEEE 754 floating-point arithmetic unless otherwise noted.

Add Kernel

Mathematical: z = x + y
Properties:
  - Commutative: x + y = y + x
  - Associative (in exact arithmetic): (x + y) + z = x + (y + z); under IEEE 754 rounding the two groupings can give different results
  - Identity: x + 0 = x
Complexity: O(n)
Vectorizable: Yes (SSE/AVX/NEON)

Multiply Kernel

Mathematical: z = x · y
Properties:
  - Commutative: x · y = y · x
  - Associative (in exact arithmetic): (x · y) · z = x · (y · z); under IEEE 754 rounding the two groupings can give different results
  - Identity: x · 1 = x
  - Annihilator: x · 0 = 0 for finite x (inf · 0 and NaN · 0 give NaN)
Complexity: O(n)
Vectorizable: Yes (SSE/AVX/NEON)
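
The algebraic properties listed for these kernels hold exactly for real numbers. In IEEE 754 arithmetic every intermediate result is rounded, so regrouping a sum can change the answer. The sketch below illustrates this with two hypothetical helpers (sum_left and sum_right are not part of arithmetic_kernels.h); the volatile temporaries are only there to force each partial sum to be rounded to float.

```cpp
// Hypothetical helpers for illustration only.

// Left grouping: (x + y) + z
float sum_left(float x, float y, float z) {
    volatile float t = x + y;  // partial sum rounded to float
    return t + z;
}

// Right grouping: x + (y + z)
float sum_right(float x, float y, float z) {
    volatile float t = y + z;  // partial sum rounded to float
    return x + t;
}
```

With x = 1e8f, y = -1e8f, z = 1.0f, sum_left returns 1.0f while sum_right returns 0.0f: adjacent floats near 1e8 are 8 apart, so adding 1.0f to -1e8f rounds back to -1e8f. This matters when a SIMD or multi-threaded reduction changes the order of accumulation.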

Performance Characteristics

Theoretical Performance (float32, AVX2)

Kernel            Scalar (cycles/sample)   SIMD (cycles/sample)   Speedup
add_kernel        1.0                      0.125                  8x
multiply_kernel   1.5                      0.188                  8x
divide_kernel     14.0                     2.0                    7x

Note: Actual performance depends on CPU architecture, memory bandwidth, and cache behavior.

SIMD Optimization Status

Auto-vectorizable

  • Compiler can automatically generate SIMD code
  • Ensure -O3 -march=native (GCC/Clang) or /O2 /arch:AVX2 (MSVC)
  • Loop structure designed for vectorization
  • No data dependencies between iterations
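
As an illustration of a loop shape that compilers can usually auto-vectorize, here is a minimal sketch. The name add_kernel_sketch and its body are assumptions for illustration; the actual add_kernel implementation may differ. The argument order follows the usage examples in this document.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical reference loop, not the library's implementation.
void add_kernel_sketch(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    // Iteration i reads a[i] and b[i] and writes only out[i], so no
    // iteration depends on the result of another one.
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The sketch deliberately avoids a restrict qualifier so that exact in-place use (out == a) stays valid. Without it, compilers typically guard the vectorized path with a runtime overlap check, which is an assumption worth confirming in the generated assembly for your toolchain.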

Usage Examples

Basic Addition

#include "arithmetic_kernels.h"

float input_a[1024];
float input_b[1024];
float output[1024];

// Fill inputs...

audiolab::kernels::l0::add_kernel(input_a, input_b, output, 1024);

DC Offset Removal

float audio[1024];
// Measured DC offset
float dc_offset = 0.05f;

audiolab::kernels::l0::add_scalar_kernel(audio, -dc_offset, audio, 1024);

Variable Gain

float audio[1024];
float gain_envelope[1024];  // Time-varying gain
float output[1024];

audiolab::kernels::l0::multiply_kernel(audio, gain_envelope, output, 1024);

Fixed Attenuation

float audio[1024];

// 0.5 linear gain is approximately -6.02 dB
audiolab::kernels::l0::multiply_scalar_kernel(audio, 0.5f, audio, 1024);
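
The scalar passed to multiply_scalar_kernel is a linear amplitude factor. If gains are specified in decibels, they can be converted with the standard relation gain = 10^(dB / 20). The helper below is a hypothetical sketch and is not part of arithmetic_kernels.h.

```cpp
#include <cmath>

// Hypothetical helper: convert an amplitude gain in dB to a linear factor.
inline float db_to_linear(float gain_db) {
    return std::pow(10.0f, gain_db / 20.0f);
}
```

For example, db_to_linear(-6.0f) is roughly 0.501 and db_to_linear(-20.0f) is roughly 0.1, so the 0.5f factor used above corresponds to slightly more than 6 dB of attenuation.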

Numerical Considerations

Denormals

Denormal (subnormal) numbers (nonzero values with |x| below about 1.18e-38 for float) can cause severe performance degradation on many CPUs. Use flush_denormals_kernel periodically:

float signal[1024];
// ... processing ...

// Flush every N samples to prevent denormal accumulation
audiolab::kernels::l0::flush_denormals_kernel(signal, 1024);
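
To make the operation concrete, here is a portable sketch of what a denormal flush can look like. It is an assumption about the behavior, not the library's code: the real flush_denormals_kernel may rely on bit masks or on hardware flush-to-zero modes instead.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical sketch: replace every subnormal sample with zero, in place.
void flush_denormals_sketch(float* signal, std::size_t n) {
    assert(n == 0 || signal != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        // fpclassify reports FP_SUBNORMAL for nonzero values below the
        // smallest normal float; normal values, zeros, inf and NaN are kept.
        if (std::fpclassify(signal[i]) == FP_SUBNORMAL) {
            signal[i] = 0.0f;
        }
    }
}
```

Feedback paths such as decaying filter or reverb tails are the usual place where samples drift into the subnormal range, so a flush is most useful on those state buffers.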

Overflow

Addition/multiplication can overflow. Ensure inputs are properly scaled:

  • Audio signals: typically normalized to [-1.0, 1.0]
  • Intermediate calculations: may require extended precision (double)
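
The point about intermediate precision can be illustrated with a product whose intermediate value exceeds the float range even though the final result does not. Both functions below are hypothetical and exist only for this demonstration; the volatile temporary forces the intermediate to be rounded to float.

```cpp
// Hypothetical demonstration helpers.

// All arithmetic in float: the intermediate a * b may overflow to inf.
float scaled_product_float(float a, float b, float c) {
    volatile float t = a * b;  // rounded to float, inf if out of range
    return t * c;
}

// Intermediate held in double, narrowed to float once at the end.
float scaled_product_double(float a, float b, float c) {
    const double t = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<float>(t * static_cast<double>(c));
}
```

With a = 1e30f, b = 1e30f, c = 1e-30f, the float version returns inf because 1e60 is not representable in float, while the double version returns a finite value close to 1e30. Normalized audio rarely reaches such magnitudes, so this mainly matters for coefficient calculations and long accumulations.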

Division by Zero

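divide_kernel and reciprocal_kernel do not check for zero divisors. A zero divisor produces ±inf when the numerator is nonzero and NaN when the numerator is also zero. Validate inputs beforehand, or clamp the divisor away from zero.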
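
One way to clamp the divisor is sketched below. The function name, the min_divisor parameter, and its default value are all assumptions chosen for illustration; the library does not provide this variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical guarded division: divisors with magnitude below min_divisor
// are replaced by +/- min_divisor before dividing.
void divide_guarded_sketch(const float* a, const float* b, float* out,
                           std::size_t n, float min_divisor = 1e-12f) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        float d = b[i];
        if (std::fabs(d) < min_divisor) {
            // copysign keeps the divisor's sign (+0 counts as positive).
            d = std::copysign(min_divisor, d);
        }
        out[i] = a[i] / d;
    }
}
```

The guard adds a compare and select per sample, and a NaN divisor still passes through unchanged because the comparison is false for NaN. The threshold trades a bounded but very large quotient for the inf that an unguarded division would produce, so it should be chosen for the signal range at hand.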

Testing

Run Tests

cd build
cmake ..
cmake --build .
ctest -V

Test Coverage

  • ✅ Correctness tests (mathematical validation)
  • ✅ Edge cases (zero, infinity, denormals)
  • ✅ In-place operation (aliasing)
  • ✅ Precision tests (double vs float)
  • ✅ Large buffer stress test (1M samples)

Expected Output

==============================================
ARITHMETIC KERNELS TEST SUITE
==============================================

Running test_add_kernel_basic...
✓ test_add_kernel_basic passed

Running test_multiply_scalar_kernel...
✓ test_multiply_scalar_kernel passed

...

==============================================
ALL TESTS PASSED ✓
==============================================

Integration with Other Modules

Used By

  • 07_ATOMS_L1 - Filters, oscillators, envelopes use arithmetic primitives
  • 10_CELLS_L2 - Effect chains composed of arithmetic operations
  • 13_ENGINES_L3 - High-level processors built on arithmetic foundation

Dependencies

  • None - Arithmetic kernels are L0 (no DSP dependencies)
  • Standard library: <cmath>, <algorithm>

Future Enhancements

  • Explicit SIMD intrinsics (AVX-512, NEON)
  • Fused multiply-add (FMA) support
  • GPU/CUDA variants
  • Fixed-point (int32) variants for embedded
  • Saturation arithmetic option
