05_04_02 - Delay and Buffers

Overview

Delay kernels implement temporal memory - the ability to store and retrieve past samples. This is one of the most fundamental operations in DSP, required by virtually every audio algorithm.

Why Delays Are Critical

Every recursive (IIR) filter, comb filter, echo, reverb, and physical model is built on delay. Without efficient delay lines, none of them is practical in real time.

Implemented Delay Types

1. Basic Integer Delay (DelayBuffer)

Use case: Fixed-delay applications (filters, simple echo)

DelayBuffer<float, 1000> delay;  // Max 1000 samples delay

delay.write(input_sample);
float delayed = delay.read(500);  // Read the sample from 500 samples ago

Features:
- ✅ Circular buffer (no memory shifting)
- ✅ O(1) read/write operations
- ✅ Compile-time buffer size (stack allocation)
- ✅ Modular arithmetic for wrapping

How it works:

Buffer: [a, b, c, d, e, f, g, h]
         ^write_pos (currently at 'a')

write(X):  overwrite 'a' with X, advance write_pos
read(3):   go back 3 samples from write_pos → returns 'e'
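
The wrap-around logic in the diagram can be sketched in a few lines. This is a minimal illustration, not the library's DelayBuffer: the class name SketchDelay is hypothetical, and the read convention (read(0) returns the newest sample) is an assumption that may differ from the real implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a circular-buffer delay (not the library's DelayBuffer).
template <typename T, std::size_t N>
class SketchDelay {
    static_assert(N > 0, "buffer must hold at least one sample");

public:
    // Store one sample at the write position, then advance with wrap-around.
    void write(T x) {
        buf_[write_pos_] = x;
        write_pos_ = (write_pos_ + 1) % N;
    }

    // Assumed convention: read(0) is the newest sample, read(d) is d samples older.
    T read(std::size_t d) const {
        assert(d < N);
        // Add N before subtracting so the unsigned index cannot underflow.
        return buf_[(write_pos_ + N - 1 - d) % N];
    }

private:
    std::array<T, N> buf_{};     // zero-initialised storage
    std::size_t write_pos_ = 0;  // next slot to overwrite
};
```

Under this assumed convention, filling an 8-slot buffer with a..h and calling read(3) before the next write returns 'e'.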


2. Fractional Delay (FractionalDelayBuffer)

Use case: Sub-sample precision delays (chorus, flanger, pitch shifting)

FractionalDelayBuffer<float, 2048, InterpQuality::MEDIUM> delay;

delay.write(input_sample);
float delayed = delay.read(123.75f);  // 123.75 samples delay

Features:
- ✅ Sub-sample precision via interpolation
- ✅ Quality selector (linear/cubic/hermite/sinc)
- ✅ Transparent integration with interpolation kernels
- ✅ Same API as integer delay

Quality options:
- DRAFT (linear): Fastest, some aliasing
- LOW (cubic): Good quality
- MEDIUM (hermite): Default - excellent quality
- HIGH/ULTRA (sinc): Mastering quality
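
To show what "sub-sample precision via interpolation" means in practice, here is a minimal sketch of the simplest option, linear interpolation between the two neighbouring samples. The function name read_fractional_linear and its parameters are hypothetical, and the library's interpolation kernels may be organised quite differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: linear-interpolated read from a circular buffer.
// `newest` is the index of the most recently written sample (assumption).
inline float read_fractional_linear(const std::vector<float>& buf,
                                    std::size_t newest, float delay) {
    const std::size_t n = buf.size();
    assert(n >= 2 && newest < n);
    assert(delay >= 0.0f && delay <= static_cast<float>(n - 1));

    const float whole = std::floor(delay);
    const std::size_t d0 = static_cast<std::size_t>(whole);  // integer part
    const float frac = delay - whole;                        // fraction in [0, 1)

    const std::size_t i0 = (newest + n - d0) % n;  // sample at delay d0
    const std::size_t i1 = (i0 + n - 1) % n;       // one sample older
    return (1.0f - frac) * buf[i0] + frac * buf[i1];
}
```

With a buffer holding the ramp 0..7 and the newest sample at index 7, a delay of 2.5 lands halfway between the samples 5 and 4 and returns 4.5.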

Example - Chorus Effect:

FractionalDelayBuffer<float, 4800> delay;  // 100ms at 48kHz
float lfo_phase = 0.0f;

for (size_t i = 0; i < buffer_size; ++i) {
    // LFO modulates delay time
    float lfo = std::sin(2.0f * M_PI * lfo_phase);
    float delay_time = 240.0f + 20.0f * lfo;  // 240±20 samples

    delay.write(input[i]);
    output[i] = delay.read(delay_time);

    lfo_phase += 0.5f / 48000.0f;  // 0.5 Hz LFO
    if (lfo_phase >= 1.0f) lfo_phase -= 1.0f;  // Wrap so the phase keeps float precision
}


3. Variable Delay (VariableDelayBuffer)

Use case: Time-varying delays with automatic smoothing (vibrato, Doppler)

VariableDelayBuffer<float, 4800> delay;

for (size_t i = 0; i < buffer_size; ++i) {
    float delay_time = compute_delay(i);  // Can change rapidly
    output[i] = delay.process(input[i], delay_time);
}

Features:
- ✅ Built-in delay time smoothing (prevents clicks)
- ✅ Single-call process (write + read)
- ✅ Safe for rapid delay changes
- ✅ Hermite interpolation (quality)

Why smoothing matters: Abrupt changes in delay time cause discontinuities (clicks). VariableDelayBuffer applies a 1-pole lowpass filter to delay time changes, preventing artifacts.
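
A 1-pole lowpass on the delay time can be sketched as follows. The struct name DelayTimeSmoother and the coefficient value are hypothetical; the smoothing coefficient that VariableDelayBuffer actually uses is not documented here.

```cpp
#include <cassert>

// Hypothetical one-pole smoother for a delay-time control signal.
struct DelayTimeSmoother {
    float coeff;  // in [0, 1): closer to 1 means slower, smoother changes
    float state;  // current smoothed delay time, in samples

    // y[n] = coeff * y[n-1] + (1 - coeff) * target
    float next(float target) {
        assert(coeff >= 0.0f && coeff < 1.0f);
        state = coeff * state + (1.0f - coeff) * target;
        return state;
    }
};
```

With coeff = 0.5, a jump of the target from 100 to 200 samples is followed as 150, 175, 187.5, and so on, so the read position glides instead of jumping.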

Example - Vibrato:

VariableDelayBuffer<float, 1000> delay;
float vibrato_phase = 0.0f;
float vibrato_rate = 5.0f;  // Hz
float vibrato_depth = 10.0f;  // samples

for (size_t i = 0; i < N; ++i) {
    float lfo = std::sin(2.0f * M_PI * vibrato_phase);
    float delay_samples = 100.0f + vibrato_depth * lfo;

    output[i] = delay.process(input[i], delay_samples);

    vibrato_phase += vibrato_rate / sample_rate;
    if (vibrato_phase >= 1.0f) vibrato_phase -= 1.0f;
}


4. Multi-Tap Delay (MultiTapDelayBuffer)

Use case: Multiple delays from single buffer (comb filters, early reflections)

MultiTapDelayBuffer<float, 4800, 8> delay;  // Max 8 taps

// Configure taps (delays in samples, gains)
size_t tap_delays[] = {100, 200, 300, 400};
float tap_gains[] = {0.7f, 0.5f, 0.3f, 0.2f};
delay.set_taps(tap_delays, tap_gains, 4);

// Process: returns mixed output from all taps
float output = delay.process(input_sample);

Features:
- ✅ Single buffer, multiple read positions
- ✅ Individual gain per tap
- ✅ Efficient (shared write, multiple reads)
- ✅ Compile-time max taps
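
The "shared write, multiple reads" idea can be sketched like this. SketchMultiTap is a hypothetical class written for illustration; its tap convention (a delay of 0 reads the sample just written) is an assumption and not necessarily how MultiTapDelayBuffer behaves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: one circular buffer, several gained read positions.
template <std::size_t N, std::size_t MaxTaps>
class SketchMultiTap {
public:
    void set_taps(const std::size_t* delays, const float* gains, std::size_t count) {
        assert(count <= MaxTaps);
        for (std::size_t t = 0; t < count; ++t) {
            assert(delays[t] < N);
            delays_[t] = delays[t];
            gains_[t] = gains[t];
        }
        count_ = count;
    }

    // Write the input once, then sum every tap scaled by its gain.
    float process(float x) {
        buf_[pos_] = x;
        float sum = 0.0f;
        for (std::size_t t = 0; t < count_; ++t) {
            sum += gains_[t] * buf_[(pos_ + N - delays_[t]) % N];
        }
        pos_ = (pos_ + 1) % N;
        return sum;
    }

private:
    std::array<float, N> buf_{};
    std::array<std::size_t, MaxTaps> delays_{};
    std::array<float, MaxTaps> gains_{};
    std::size_t count_ = 0;
    std::size_t pos_ = 0;
};
```

Feeding a unit impulse through taps at 1 and 3 samples with gains 0.5 and 0.25 yields 0, 0.5, 0, 0.25, 0: each tap reproduces the impulse at its own delay and level.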

Example - Comb Filter:

MultiTapDelayBuffer<float, 1000, 1> comb;

// Single tap for feedback
size_t delay_time = 100;  // 100 samples
float feedback = 0.5f;
comb.set_taps(&delay_time, &feedback, 1);

for (size_t i = 0; i < N; ++i) {
    float delayed = comb.read_tap(0);
    float mixed = input[i] + delayed;  // Feedback comb; decays only if read_tap() applies the 0.5 tap gain
    comb.process(mixed);
    output[i] = mixed;
}

Example - Early Reflections (Reverb):

MultiTapDelayBuffer<float, 10000, 8> early_reflections;

// Simulate room reflections at different delays
size_t delays[] = {100, 230, 410, 580, 720, 910, 1100, 1350};
float gains[] = {0.8f, 0.6f, 0.5f, 0.4f, 0.35f, 0.3f, 0.25f, 0.2f};
early_reflections.set_taps(delays, gains, 8);

for (size_t i = 0; i < N; ++i) {
    output[i] = early_reflections.process(input[i]);
}


Stateless Kernels

For custom memory management, stateless versions are provided:

delay_kernel (integer delay)

float buffer[1000] = {};  // Zero-initialize so early reads return silence
size_t write_pos = 0;

for (size_t i = 0; i < N; ++i) {
    output[i] = delay_kernel(buffer, write_pos, 100, input[i], 1000);
}

fractional_delay_kernel

float buffer[1000] = {};  // Zero-initialize so early reads return silence
size_t write_pos = 0;

for (size_t i = 0; i < N; ++i) {
    output[i] = fractional_delay_kernel(buffer, write_pos, 123.5f,
                                       input[i], 1000, InterpQuality::MEDIUM);
}
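
For readers wondering what a stateless kernel has to do with caller-owned state, here is a minimal sketch. The function sketch_delay_kernel is hypothetical; in particular, taking write_pos by reference and reading before writing are assumptions, not the documented behaviour of delay_kernel.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stateless kernel: the caller owns `buffer` and `write_pos`.
// Returns the sample stored `delay` calls ago, then stores `input`.
inline float sketch_delay_kernel(float* buffer, std::size_t& write_pos,
                                 std::size_t delay, float input, std::size_t size) {
    assert(size > 0 && write_pos < size);
    assert(delay >= 1 && delay <= size);
    const float out = buffer[(write_pos + size - delay) % size];  // read first
    buffer[write_pos] = input;                                    // then overwrite
    write_pos = (write_pos + 1) % size;                           // advance with wrap
    return out;
}
```

Because all state lives in the caller's variables, the same function can serve any number of delay lines, at the cost of the caller keeping buffer, write_pos, and size consistent.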

Performance Considerations

Memory Layout

All delay buffers use circular buffers - no memory shifting required:

Bad (naive delay):
  [a, b, c, d, e]
  Insert X → shift all → [X, a, b, c, d]  // O(N) operation!

Good (circular buffer):
  [a, b, c, d, e]
      ^write_pos
  Insert X → [a, X, c, d, e]  // O(1) operation
                    ^write_pos advanced

Cache Efficiency

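Delay lines occupy contiguous memory, and wrapping costs one modulo per access. With unsigned indices, add buffer_size before subtracting so the intermediate value cannot underflow:

read_pos = (write_pos + buffer_size - delay) % buffer_size;  // Assumes delay <= buffer_size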

For power-of-2 sizes, use bitwise AND (even faster):

read_pos = (write_pos - delay) & (buffer_size - 1);  // if buffer_size = 2^n
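
The two wrapping strategies can be compared side by side. The helper names wrap_mod and wrap_mask are hypothetical and exist only for this sketch. The point to notice is that subtracting first is unsafe with the modulo form when indices are unsigned, but harmless with the mask form, because the unsigned range is itself a multiple of any power-of-2 size.

```cpp
#include <cassert>
#include <cstddef>

// General sizes: add `size` before subtracting so the value cannot underflow.
inline std::size_t wrap_mod(std::size_t write_pos, std::size_t delay, std::size_t size) {
    assert(size > 0 && write_pos < size && delay <= size);
    return (write_pos + size - delay) % size;
}

// Power-of-2 sizes only: a bit mask replaces the modulo.
inline std::size_t wrap_mask(std::size_t write_pos, std::size_t delay, std::size_t size) {
    assert(size > 0 && (size & (size - 1)) == 0);  // size must be 2^n
    return (write_pos - delay) & (size - 1);       // unsigned wrap-around is harmless here
}
```

For write_pos = 2 and delay = 5, wrap_mod gives 997 for a 1000-sample buffer and both helpers give 1021 for a 1024-sample buffer.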


Integration with Interpolation

FractionalDelayBuffer and VariableDelayBuffer depend on the interpolation kernels:

#include "interpolation_kernels.h"  // Required dependency
#include "delay_and_buffers.h"

// Hermite interpolation used automatically
FractionalDelayBuffer<float, 1000, InterpQuality::MEDIUM> delay;

Quality vs Performance:
- DRAFT: ~2x faster than MEDIUM
- MEDIUM: Recommended (Hermite)
- ULTRA: ~4x slower than MEDIUM


Common Use Cases

IIR Filter (1-pole lowpass)

DelayBuffer<float, 1> z1;  // 1-sample delay

float a = 0.9f;  // Coefficient
for (size_t i = 0; i < N; ++i) {
    float delayed = z1.read(0);
    float current = a * delayed + (1 - a) * input[i];
    z1.write(current);
    output[i] = current;
}
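
The loop above implements the difference equation below. The cutoff relation is a common approximation that holds when the cutoff is well below the sample rate, and the 48 kHz sample rate is an assumption borrowed from the other examples, since this snippet does not state one.

```latex
y[n] = a\,y[n-1] + (1-a)\,x[n], \qquad
H(z) = \frac{1-a}{1 - a z^{-1}}, \qquad
a \approx e^{-2\pi f_c / f_s}
\;\Rightarrow\;
f_c \approx -\frac{f_s \ln a}{2\pi} \approx 805\ \text{Hz}
\quad (a = 0.9,\ f_s = 48\ \text{kHz})
```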

Simple Echo

DelayBuffer<float, 48000> echo;  // 1 second at 48kHz

float feedback = 0.5f;
for (size_t i = 0; i < N; ++i) {
    float delayed = echo.read(24000);  // 500ms delay
    float mixed = input[i] + feedback * delayed;
    echo.write(mixed);
    output[i] = mixed;
}

Karplus-Strong String Synthesis

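DelayBuffer<float, 1000> string_delay;
const size_t fundamental_period = 100;  // Delay length in samples; sets the pitch
float damping = 0.995f;
float previous = 0.0f;  // Last delayed sample, used by the averaging filter

// Excite with a noise burst that fills one full period
for (size_t i = 0; i < fundamental_period; ++i) {
    string_delay.write(random_noise());  // random_noise(): caller-supplied, e.g. uniform in [-1, 1]
}

// Loop: average adjacent samples (the Karplus-Strong lowpass), then damp
for (size_t i = 0; i < N; ++i) {
    float delayed = string_delay.read(fundamental_period);
    float damped = damping * 0.5f * (delayed + previous);
    previous = delayed;
    string_delay.write(damped);
    output[i] = damped;
}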
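
The delay length sets the pitch. As a rough guide, assuming a 48 kHz sample rate and ignoring the small extra delay that a loop filter or the read convention may add, the loop repeats every N samples:

```latex
f_0 \approx \frac{f_s}{N}, \qquad
N = 100 \;\Rightarrow\; f_0 \approx 480\ \text{Hz}, \qquad
f_0 = 440\ \text{Hz} \;\Rightarrow\; N \approx \frac{48000}{440} \approx 109.09
```

Since 109.09 is not an integer, an integer delay can only reach about 440.4 Hz (N = 109) or 436.4 Hz (N = 110). Reading the loop through FractionalDelayBuffer is one way to tune between them.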

Testing

Run tests:

cd build
cmake --build .
ctest -V -R delay

Test coverage:
- ✅ Basic delay (impulse response, wrapping)
- ✅ Fractional delay (sub-sample accuracy)
- ✅ Variable delay (modulation, smoothing)
- ✅ Multi-tap (multiple reflections, comb filter)
- ✅ Stateless kernels
- ✅ Edge cases (zero delay, buffer overflow)


References

  • Circular buffers: Classic data structure for audio delays
  • Fractional delay: V. Välimäki, "Discrete-Time Modeling of Acoustic Tubes Using Fractional Delay Filters", doctoral thesis, Helsinki University of Technology, 1995
  • Karplus-Strong: K. Karplus & A. Strong, "Digital Synthesis of Plucked-String and Drum Timbres", Computer Music Journal 7(2), 1983

Recommendation: Use class-based delays (DelayBuffer, FractionalDelayBuffer) for clarity. Use stateless kernels only when integrating with existing buffer management systems.