05_04_02 - Delay and Buffers
Overview
Delay kernels implement temporal memory - the ability to store and retrieve past samples. This is one of the most fundamental operations in DSP, required by virtually every audio algorithm.
Why Delays Are Critical
Every recursive (IIR) filter, comb filter, echo, reverb, and physical model is built on a delay line. Without an efficient delay line, none of them is practical in real time.
Implemented Delay Types
1. Basic Integer Delay (DelayBuffer)
Use case: Fixed-delay applications (filters, simple echo)
DelayBuffer<float, 1000> delay; // Max 1000 samples delay
delay.write(input_sample);
float delayed = delay.read(500); // Read 500 samples ago
Features:
- ✅ Circular buffer (no memory shifting)
- ✅ O(1) read/write operations
- ✅ Compile-time buffer size (stack allocation)
- ✅ Modular arithmetic for wrapping
How it works:
Buffer: [a, b, c, d, e, f, g, h]
         ^write_pos (currently at 'a', so 'h' is the newest sample)
write(X): overwrite 'a' with X, advance write_pos to 'b'
read(3), in the state shown before the write: step back 3 samples from the newest sample 'h' → returns 'e'
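The diagram above can be sketched as a small class. This is a minimal illustration, not the library's source: `SketchDelay` is a hypothetical name, and the convention that `read(0)` returns the most recently written sample is an assumption inferred from the diagram and from the 1-pole filter example later in this page.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for DelayBuffer; not the library's actual source.
template <typename T, std::size_t Size>
class SketchDelay {
public:
    // Store one sample at write_pos_, then advance and wrap.
    void write(T sample) {
        buffer_[write_pos_] = sample;
        write_pos_ = (write_pos_ + 1) % Size;
    }

    // Assumed convention: read(0) is the most recently written sample.
    // Valid for delay < Size.
    T read(std::size_t delay) const {
        // Adding Size before subtracting keeps the unsigned index from underflowing.
        const std::size_t index = (write_pos_ + Size - 1 - delay) % Size;
        return buffer_[index];
    }

private:
    std::array<T, Size> buffer_{};  // zero-initialised: reads before any write return 0
    std::size_t write_pos_ = 0;
};
```

With an 8-slot buffer filled with `a` through `h`, `read(3)` returns `e`, matching the diagram. No samples are ever moved; only `write_pos_` changes.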
2. Fractional Delay (FractionalDelayBuffer)
Use case: Sub-sample precision delays (chorus, flanger, pitch shifting)
FractionalDelayBuffer<float, 2048, InterpQuality::MEDIUM> delay;
delay.write(input_sample);
float delayed = delay.read(123.75f); // 123.75 samples delay
Features:
- ✅ Sub-sample precision via interpolation
- ✅ Quality selector (linear/cubic/hermite/sinc)
- ✅ Transparent integration with interpolation kernels
- ✅ Same API as integer delay
Quality options:
- DRAFT (linear): Fastest, some aliasing
- LOW (cubic): Good quality
- MEDIUM (hermite): Default - excellent quality
- HIGH/ULTRA (sinc): Mastering quality
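To make the idea of a sub-sample read concrete, here is a minimal sketch of the simplest option, linear interpolation between the two neighbouring samples. The function `read_fractional_linear` is a hypothetical helper, not part of the documented API, and it assumes that a delay of 0 means the most recently written sample.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the documented API.
// `buffer` is used as a circular buffer and `write_pos` is the slot the next
// write will use. Assumed convention: delay 0 is the most recent sample.
// Valid for 0 <= delay <= buffer.size() - 2.
inline float read_fractional_linear(const std::vector<float>& buffer,
                                    std::size_t write_pos,
                                    float delay) {
    const std::size_t size = buffer.size();
    const std::size_t whole = static_cast<std::size_t>(std::floor(delay));
    const float frac = delay - static_cast<float>(whole);

    // Newer neighbour sits at delay `whole`, older neighbour at `whole + 1`.
    const std::size_t newer = (write_pos + size - 1 - whole) % size;
    const std::size_t older = (newer + size - 1) % size;

    // frac = 0 returns the newer sample; frac near 1 approaches the older one.
    return (1.0f - frac) * buffer[newer] + frac * buffer[older];
}
```

The higher quality settings would replace the two-point blend with a wider kernel (four points for cubic or Hermite, more for sinc), at a higher cost per read.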
Example - Chorus Effect:
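FractionalDelayBuffer<float, 4800> delay; // 100 ms at 48 kHz
const float two_pi = 6.283185307f;
float lfo_phase = 0.0f; // LFO phase in cycles, kept in [0, 1)
for (size_t i = 0; i < buffer_size; ++i) {
    // LFO modulates delay time
    float lfo = std::sin(two_pi * lfo_phase);
    float delay_time = 240.0f + 20.0f * lfo; // 240±20 samples
    delay.write(input[i]);
    // Mix dry and modulated signals; the delayed signal alone would be vibrato
    output[i] = 0.5f * (input[i] + delay.read(delay_time));
    lfo_phase += 0.5f / 48000.0f; // 0.5 Hz LFO
    if (lfo_phase >= 1.0f) lfo_phase -= 1.0f; // Wrap so float precision does not degrade
}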
3. Variable Delay (VariableDelayBuffer)
Use case: Time-varying delays with automatic smoothing (vibrato, Doppler)
VariableDelayBuffer<float, 4800> delay;
for (size_t i = 0; i < buffer_size; ++i) {
float delay_time = compute_delay(i); // Can change rapidly
output[i] = delay.process(input[i], delay_time);
}
Features:
- ✅ Built-in delay time smoothing (prevents clicks)
- ✅ Single-call process (write + read)
- ✅ Safe for rapid delay changes
- ✅ Hermite interpolation (quality)
Why smoothing matters: Abrupt changes in delay time cause discontinuities (clicks). VariableDelayBuffer applies a 1-pole lowpass filter to delay time changes, preventing artifacts.
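A 1-pole lowpass on the delay time can be sketched as follows. `DelaySmoother` and its `coeff` field are hypothetical names chosen for illustration; the coefficient that VariableDelayBuffer actually uses is not stated here.

```cpp
// Hypothetical one-pole smoother for a delay-time control signal.
struct DelaySmoother {
    float coeff;    // 0 < coeff < 1; closer to 1 gives slower, smoother changes
    float current;  // smoothed delay time in samples

    // Move a fraction (1 - coeff) of the remaining distance toward the target.
    float next(float target) {
        current = coeff * current + (1.0f - coeff) * target;
        return current;
    }
};
```

If the requested delay jumps from 100 to 200 samples, a smoother with `coeff = 0.5f` produces 150, 175, 187.5 and so on, so the read position glides instead of jumping. A realistic coefficient would be much closer to 1.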
Example - Vibrato:
VariableDelayBuffer<float, 1000> delay;
float vibrato_phase = 0.0f;
float vibrato_rate = 5.0f; // Hz
float vibrato_depth = 10.0f; // samples
for (size_t i = 0; i < N; ++i) {
float lfo = std::sin(2.0f * M_PI * vibrato_phase);
float delay_samples = 100.0f + vibrato_depth * lfo;
output[i] = delay.process(input[i], delay_samples);
vibrato_phase += vibrato_rate / sample_rate;
if (vibrato_phase >= 1.0f) vibrato_phase -= 1.0f;
}
4. Multi-Tap Delay (MultiTapDelayBuffer)
Use case: Multiple delays from single buffer (comb filters, early reflections)
MultiTapDelayBuffer<float, 4800, 8> delay; // Max 8 taps
// Configure taps (delays in samples, gains)
size_t tap_delays[] = {100, 200, 300, 400};
float tap_gains[] = {0.7f, 0.5f, 0.3f, 0.2f};
delay.set_taps(tap_delays, tap_gains, 4);
// Process: returns mixed output from all taps
float output = delay.process(input_sample);
Features:
- ✅ Single buffer, multiple read positions
- ✅ Individual gain per tap
- ✅ Efficient (shared write, multiple reads)
- ✅ Compile-time max taps
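The "shared write, multiple reads" idea can be sketched like this. `SketchMultiTap` is a hypothetical class, not the MultiTapDelayBuffer source, and it assumes that a tap delay of 0 refers to the sample just written.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch of a multi-tap read; not the MultiTapDelayBuffer source.
template <std::size_t Size>
class SketchMultiTap {
public:
    // One shared write, then one gain-weighted read per tap.
    // Assumed convention: a tap delay of 0 is the sample just written.
    // Every delays[t] must be smaller than Size.
    float process(float input, const std::size_t* delays,
                  const float* gains, std::size_t num_taps) {
        buffer_[write_pos_] = input;
        float sum = 0.0f;
        for (std::size_t t = 0; t < num_taps; ++t) {
            const std::size_t index = (write_pos_ + Size - delays[t]) % Size;
            sum += gains[t] * buffer_[index];
        }
        write_pos_ = (write_pos_ + 1) % Size;
        return sum;
    }

private:
    std::array<float, Size> buffer_{};  // zero-initialised
    std::size_t write_pos_ = 0;
};
```

Feeding a unit impulse through taps at 2 and 5 samples with gains 0.5 and 0.25 yields 0.5 at sample 2 and 0.25 at sample 5. The cost per sample is one write plus one read per tap, regardless of the buffer size.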
Example - Comb Filter:
MultiTapDelayBuffer<float, 1000, 1> comb;
// Single tap for feedback
size_t delay_time = 100; // 100 samples
float feedback = 0.5f;
comb.set_taps(&delay_time, &feedback, 1);
for (size_t i = 0; i < N; ++i) {
float delayed = comb.read_tap(0);
float mixed = input[i] + delayed; // Input plus the delayed signal fed back into the line (feedback comb)
comb.process(mixed);
output[i] = mixed;
}
Example - Early Reflections (Reverb):
MultiTapDelayBuffer<float, 10000, 8> early_reflections;
// Simulate room reflections at different delays
size_t delays[] = {100, 230, 410, 580, 720, 910, 1100, 1350};
float gains[] = {0.8f, 0.6f, 0.5f, 0.4f, 0.35f, 0.3f, 0.25f, 0.2f};
early_reflections.set_taps(delays, gains, 8);
for (size_t i = 0; i < N; ++i) {
output[i] = early_reflections.process(input[i]);
}
Stateless Kernels
For custom memory management, stateless versions are provided:
delay_kernel (integer delay)
float buffer[1000] = {}; // Zero-initialise so early reads return silence, not garbage
size_t write_pos = 0;
for (size_t i = 0; i < N; ++i) {
output[i] = delay_kernel(buffer, write_pos, 100, input[i], 1000);
}
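"Stateless" here means the caller owns both the storage and the write position, and the kernel only updates them. A minimal sketch of what such a kernel might look like follows. `sketch_delay_kernel` is a hypothetical function whose argument order mirrors the call above; the real `delay_kernel` signature and its delay convention may differ.

```cpp
#include <cstddef>

// Hypothetical stateless integer delay. The argument order mirrors the call
// shown above, but the real delay_kernel signature may differ.
// Requires 1 <= delay <= size.
inline float sketch_delay_kernel(float* buffer, std::size_t& write_pos,
                                 std::size_t delay, float input,
                                 std::size_t size) {
    // Read the slot that was written `delay` calls ago.
    const std::size_t read_pos = (write_pos + size - delay) % size;
    const float delayed = buffer[read_pos];

    // Store the new sample and advance the caller-owned write position.
    buffer[write_pos] = input;
    write_pos = (write_pos + 1) % size;
    return delayed;
}
```

Because all state lives in the arguments, the same function can serve any number of independent delay lines, and the buffer can come from whatever allocator the host system uses.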
fractional_delay_kernel
float buffer[1000] = {}; // Zero-initialise so early reads return silence, not garbage
size_t write_pos = 0;
for (size_t i = 0; i < N; ++i) {
output[i] = fractional_delay_kernel(buffer, write_pos, 123.5f,
input[i], 1000, InterpQuality::MEDIUM);
}
Performance Considerations
Memory Layout
All delay buffers use circular buffers - no memory shifting required:
Bad (naive delay):
[a, b, c, d, e]
Insert X → shift all → [X, a, b, c, d] // O(N) operation!
Good (circular buffer):
[a, b, c, d, e]
    ^write_pos
Insert X → [a, X, c, d, e] // O(1) operation
                  ^write_pos advanced
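Cache Efficiency
Delays use contiguous memory, and wrapping the index with a modulo is cheap, especially when the buffer size is a compile-time constant.
For power-of-2 sizes, the wrap reduces to a bitwise AND, which is faster still.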
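The two wrapping strategies can be sketched side by side. The helper names below are hypothetical and chosen for illustration only.

```cpp
#include <cstddef>

// General case: works for any non-zero buffer size.
constexpr std::size_t wrap_modulo(std::size_t index, std::size_t size) {
    return index % size;
}

// Power-of-2 case: size - 1 is an all-ones mask, so AND keeps only the low bits.
// The result equals index % size only when size is a power of two.
constexpr std::size_t wrap_mask(std::size_t index, std::size_t size) {
    return index & (size - 1);
}

// Guard for choosing between the two (for example in a static_assert).
constexpr bool is_power_of_two(std::size_t size) {
    return size != 0 && (size & (size - 1)) == 0;
}
```

For a 1024-sample buffer, `wrap_mask(1500, 1024)` gives 476, the same as `1500 % 1024`. For a 1000-sample buffer the mask would give wrong indices, which is why the guard matters.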
Integration with Interpolation
FractionalDelayBuffer and VariableDelayBuffer depend on the interpolation kernels:
#include "interpolation_kernels.h" // Required dependency
#include "delay_and_buffers.h"
// Hermite interpolation used automatically
FractionalDelayBuffer<float, 1000, InterpQuality::MEDIUM> delay;
Quality vs Performance:
- DRAFT: ~2x faster than MEDIUM
- MEDIUM: Recommended (Hermite)
- ULTRA: ~4x slower than MEDIUM
Common Use Cases
IIR Filter (1-pole lowpass)
DelayBuffer<float, 1> z1; // 1-sample delay
float a = 0.9f; // Coefficient
for (size_t i = 0; i < N; ++i) {
float delayed = z1.read(0);
float current = a * delayed + (1.0f - a) * input[i]; // y[n] = a*y[n-1] + (1-a)*x[n]
z1.write(current);
output[i] = current;
}
Simple Echo
DelayBuffer<float, 48000> echo; // 1 second at 48kHz
float feedback = 0.5f;
for (size_t i = 0; i < N; ++i) {
float delayed = echo.read(24000); // 500ms delay
float mixed = input[i] + feedback * delayed;
echo.write(mixed);
output[i] = mixed;
}
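Karplus-Strong String Synthesis
DelayBuffer<float, 1000> string_delay;
const size_t fundamental_period = 100; // Example loop length in samples (about 480 Hz at 48 kHz)
float damping = 0.995f;
// Excite with a noise burst that fills one full period of the loop
// (random_noise() stands for a user-supplied noise source)
for (size_t i = 0; i < fundamental_period; ++i) {
    string_delay.write(random_noise());
}
// Loop with damping: each pass around the loop attenuates the signal slightly
for (size_t i = 0; i < N; ++i) {
    float delayed = string_delay.read(fundamental_period);
    float damped = delayed * damping;
    string_delay.write(damped);
    output[i] = damped;
}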
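The loop length sets the pitch: the string repeats roughly once per period, so the fundamental is approximately the sample rate divided by the period in samples. A small sketch of the inverse mapping follows. `period_for_frequency` is a hypothetical helper, and the exact tuning also depends on the delay's read convention and on any filtering in the loop.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: nearest whole-sample loop length for a target pitch.
// Rounding to an integer detunes the note slightly, more so at high pitches.
inline std::size_t period_for_frequency(float sample_rate, float frequency) {
    return static_cast<std::size_t>(std::lround(sample_rate / frequency));
}
```

At 48 kHz, a 480 Hz target maps to exactly 100 samples, while 440 Hz maps to 109 samples (48000 / 109 is about 440.4 Hz). A fractional delay line is one way to remove that rounding error.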
Testing
Run tests:
Test coverage:
- ✅ Basic delay (impulse response, wrapping)
- ✅ Fractional delay (sub-sample accuracy)
- ✅ Variable delay (modulation, smoothing)
- ✅ Multi-tap (multiple reflections, comb filter)
- ✅ Stateless kernels
- ✅ Edge cases (zero delay, buffer overflow)
References
- Circular buffers: Classic data structure for audio delays
- Fractional delay: V. Välimäki, "Discrete-Time Modeling of Acoustic Tubes Using Fractional Delay Filters"
- Karplus-Strong: K. Karplus & A. Strong, "Digital Synthesis of Plucked-String and Drum Timbres"
Recommendation: Use class-based delays (DelayBuffer, FractionalDelayBuffer) for clarity. Use stateless kernels only when integrating with existing buffer management systems.