
04_03_memory_management

Real-time safe memory management for audio processing


🎯 Purpose

This subsystem provides real-time safe memory allocation and management utilities specifically designed for audio processing constraints. In real-time audio, traditional malloc/free are forbidden on the audio thread due to unpredictable latency, potential locks, and non-deterministic behavior. This subsystem solves this problem by providing pre-allocated, lock-free memory structures that guarantee O(1) operations.

The memory management subsystem is foundational to the entire CORE system, enabling predictable performance for voice allocation, event handling, delay lines, and parameter updates. It provides three categories of solutions: allocators for custom allocation strategies, containers for lock-free data structures, and alignment utilities for SIMD optimization.

All components are designed with the real-time audio constraint in mind: zero allocations after initialization, bounded execution time, and cache-friendly memory layouts.


🏗️ Architecture

Components

04_03_memory_management/
├── 00_allocators/              # Custom allocators for RT-safe allocation
│   ├── pool_allocator.hpp      # Fixed-size block allocator (O(1) alloc/free)
│   ├── stack_allocator.hpp     # Linear allocator for per-frame temporaries
│   └── lock_free_allocator.hpp # Thread-safe allocator for multi-threaded scenarios
├── 01_containers/              # Lock-free containers
│   ├── ring_buffer.hpp         # Circular buffer for delay lines
│   ├── lock_free_queue.hpp     # SPSC queue for cross-thread messages
│   └── triple_buffer.hpp       # Lock-free parameter updates (GUI → Audio)
├── 02_alignment/               # SIMD alignment utilities
│   ├── aligned_buffer.hpp      # Aligned memory allocation (16/32/64 byte)
│   ├── aligned_allocator.hpp   # STL-compatible aligned allocator
│   └── cache_aligned.hpp       # Cache-line alignment for false sharing prevention
└── examples/
    └── audio_processor_memory.cpp  # Complete real-world example

Design Overview

The subsystem follows a layered design:

  1. Allocators Layer: Provides alternative allocation strategies that avoid malloc/free
     • PoolAllocator: Fixed-size blocks with a free list (for voices, events)
     • StackAllocator: Linear allocation with bulk reset (for per-frame temporaries)
     • LockFreeAllocator: Thread-safe allocation using atomic operations

  2. Containers Layer: Lock-free data structures built on the allocators
     • RingBuffer: Power-of-2 circular buffer for delay effects
     • LockFreeQueue: Single-producer/single-consumer (SPSC) queue
     • TripleBuffer: Lock-free read/write for parameter updates

  3. Alignment Layer: SIMD-ready memory alignment
     • AlignedBuffer: Pre-allocated aligned storage
     • AlignedAllocator: STL allocator for std::vector with custom alignment
     • CacheAligned: Prevents false sharing in multi-threaded code

All components interoperate: containers use allocators, alignment ensures SIMD compatibility.
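LockFreeQueue is characterized above only as an SPSC queue. As a rough sketch of how such a queue can work (not the library's actual implementation), the hypothetical SketchSpscQueue below uses two monotonically increasing atomic counters over a power-of-2 slot array: only the producer advances tail_, only the consumer advances head_, and a release store paired with an acquire load publishes each slot's contents. The alignas(64) on the counters assumes a 64-byte cache line and keeps the two threads from false sharing.

```cpp
#include <atomic>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical sketch of a bounded SPSC queue (not the library's LockFreeQueue).
// T must be default-constructible and copyable.
template <typename T>
class SketchSpscQueue {
public:
    explicit SketchSpscQueue(std::size_t minCapacity)
        : slots_(roundUpPow2(minCapacity)), mask_(slots_.size() - 1) {}

    // Producer thread only. Returns false instead of blocking when full.
    bool push(const T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (tail - head == slots_.size()) return false;  // full
        slots_[tail & mask_] = item;
        tail_.store(tail + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T item = slots_[head & mask_];
        head_.store(head + 1, std::memory_order_release);  // hand the slot back
        return item;
    }

private:
    static std::size_t roundUpPow2(std::size_t n) {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    std::vector<T> slots_;                           // allocated once, at construction
    std::size_t mask_;                               // capacity - 1
    alignas(64) std::atomic<std::size_t> head_{0};   // next slot to read
    alignas(64) std::atomic<std::size_t> tail_{0};   // next slot to write
};
```

Because each counter has exactly one writer, neither push() nor pop() needs a compare-and-swap loop, and unsigned wrap-around of the counters is harmless because the capacity is a power of two.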


💡 Key Concepts

Real-Time Safe Allocation

Real-time audio processing has strict timing requirements (typically a 5-10ms latency budget). Traditional malloc/free violate them because:
  • They may block waiting for locks
  • They have unbounded execution time (fragmentation, coalescing)
  • They may trigger page faults or system calls

Solution: Pre-allocate all memory during initialization (prepareToPlay), then use custom allocators that redistribute this memory with deterministic O(1) operations.
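To illustrate how a pre-allocated pool achieves O(1) allocation, here is a minimal sketch of the free-list technique, assuming nothing about the real PoolAllocator's internals. The hypothetical SketchPool allocates one slab at construction, threads a singly linked free list through the unused blocks, and then allocate()/free() only pop or push the list head.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical sketch, not the library's PoolAllocator: a fixed-size block pool
// whose free list is threaded through the unused blocks themselves.
class SketchPool {
public:
    SketchPool(std::size_t blockSize, std::size_t numBlocks)
        : blockSize_(roundUpBlockSize(blockSize)),
          storage_(blockSize_ * numBlocks) {  // the only heap allocation, done at init
        // Push blocks in reverse so the first allocate() returns block 0
        for (std::size_t i = numBlocks; i-- > 0;) {
            push(storage_.data() + i * blockSize_);
        }
    }

    // O(1): pop the head of the free list; nullptr when the pool is exhausted
    void* allocate() noexcept {
        if (head_ == nullptr) return nullptr;
        FreeNode* node = head_;
        head_ = node->next;
        return node;
    }

    // O(1): push the block back onto the free list (caller destroys its object first)
    void free(void* block) noexcept {
        if (block != nullptr) push(static_cast<std::byte*>(block));
    }

private:
    struct FreeNode { FreeNode* next; };

    // Each block must hold a FreeNode and keep the following block aligned
    static std::size_t roundUpBlockSize(std::size_t n) {
        constexpr std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(FreeNode)) n = sizeof(FreeNode);
        return (n + a - 1) / a * a;
    }

    void push(std::byte* block) noexcept {
        head_ = ::new (block) FreeNode{head_};
    }

    std::size_t blockSize_;
    std::vector<std::byte> storage_;
    FreeNode* head_ = nullptr;
};
```

Since a freed block goes back on top of the list, the next allocation reuses the block most likely still in cache.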

Lock-Free Communication

The audio thread and GUI thread need to communicate (parameter updates, metering) without locks:
  • Locks can cause priority inversion (audio thread blocked by GUI)
  • Spin-locks waste CPU and violate real-time guarantees

Solution: Lock-free containers using atomic operations and memory barriers. TripleBuffer ensures reader always sees consistent data, writer never blocks.
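One classic way to get these two properties is a triple buffer: the writer fills a private back slot and atomically swaps it with a shared middle slot, and the reader swaps the middle slot into its private front slot only when a "fresh" flag is set. The hypothetical SketchTripleBuffer below packs the middle slot index and the fresh flag into one atomic; it is a sketch under those assumptions, not the library's TripleBuffer, and supports exactly one writer thread and one reader thread.

```cpp
#include <atomic>

// Hypothetical single-writer/single-reader triple buffer sketch.
template <typename T>
class SketchTripleBuffer {
public:
    // Writer thread only: never blocks
    void write(const T& value) {
        buffers_[back_] = value;
        // Publish: swap our back slot into the middle and mark it fresh
        const unsigned prev =
            middle_.exchange(back_ | kFresh, std::memory_order_acq_rel);
        back_ = prev & kIndexMask;  // reuse whatever slot was in the middle
    }

    // Reader thread only: never blocks, always returns a complete value
    T read() {
        if (middle_.load(std::memory_order_relaxed) & kFresh) {
            // Take the freshly published slot, hand our old front slot back
            const unsigned prev = middle_.exchange(front_, std::memory_order_acq_rel);
            front_ = prev & kIndexMask;
        }
        return buffers_[front_];
    }

private:
    static constexpr unsigned kIndexMask = 3u;  // low bits: slot index 0..2
    static constexpr unsigned kFresh = 4u;      // set when middle holds unread data

    T buffers_[3]{};
    unsigned front_ = 0;                 // owned by the reader
    unsigned back_ = 1;                  // owned by the writer
    std::atomic<unsigned> middle_{2};    // shared slot index + fresh flag
};
```

If the writer publishes twice before the reader looks, the older value is simply replaced, which is usually the desired behavior for parameters.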

SIMD Alignment

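SIMD code (SSE/AVX2 on x86, NEON on ARM) works best on aligned memory:
  • Aligned load/store instructions (e.g. _mm256_load_ps, which needs 32-byte alignment) fault on misaligned addresses, and unaligned accesses can be slower, especially when they straddle cache lines
  • Cache-line alignment prevents false sharing between threads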

Solution: AlignedBuffer and AlignedAllocator guarantee 16/32/64-byte alignment for optimal SIMD performance.
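For readers curious how an STL-compatible aligned allocator can be built, here is a minimal sketch using C++17 aligned operator new. SketchAlignedAllocator is a hypothetical name and this is not the library's AlignedAllocator; the explicit rebind member is needed because std::allocator_traits cannot rebind a template that has a non-type parameter.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical sketch of an STL-compatible aligned allocator (C++17 aligned new)
template <typename T, std::size_t Alignment>
struct SketchAlignedAllocator {
    static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    using value_type = T;

    // Required: the default rebind only handles type template parameters
    template <typename U>
    struct rebind { using other = SketchAlignedAllocator<U, Alignment>; };

    SketchAlignedAllocator() noexcept = default;
    template <typename U>
    SketchAlignedAllocator(const SketchAlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_array_new_length();
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Alignment}));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t{Alignment});
    }
};

// Stateless: any instance can free memory obtained from any other
template <typename T, typename U, std::size_t A>
bool operator==(const SketchAlignedAllocator<T, A>&, const SketchAlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const SketchAlignedAllocator<T, A>&, const SketchAlignedAllocator<U, A>&) noexcept {
    return false;
}

// 32-byte aligned storage, suitable for AVX2 aligned loads
using AvxFloatVector = std::vector<float, SketchAlignedAllocator<float, 32>>;
```

The vector still allocates on the heap, so like any container it should be sized before audio processing starts.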


🚀 Quick Start

Basic Usage

#include "pool_allocator.hpp"
#include "ring_buffer.hpp"
#include "triple_buffer.hpp"
#include "aligned_buffer.hpp"

using namespace audiolab::core::memory;
using namespace audiolab::core::containers;

// Example: Complete audio processor setup
class MyProcessor {
public:
    MyProcessor()
        : voicePool_(64)              // 64 voice slots
        , delayLine_(48000)           // 1 second @ 48kHz
        , tempBuffer_(512)            // Aligned temp buffer
    {
        gainParam_.write(1.0f);       // Initialize gain
    }

    void processBlock(float* buffer, size_t numSamples) {
        // Read parameter (lock-free, no blocking)
        float gain = gainParam_.read();

        // Process with gain
        for (size_t i = 0; i < numSamples; ++i) {
            float input = buffer[i];

            // Read from delay line (250ms ago)
            float delayed = delayLine_.read(12000);

            // Mix and apply gain
            buffer[i] = (input + 0.5f * delayed) * gain;

            // Write to delay line for next iteration
            delayLine_.write(input);
        }
    }

    void setGainFromGUI(float newGain) {
        // Lock-free write (never blocks audio thread)
        gainParam_.write(newGain);
    }

private:
    PoolAllocator<128> voicePool_;        // Voice allocator
    RingBuffer<float> delayLine_;         // Delay line
    TripleBuffer<float> gainParam_;       // Lock-free parameter
    AlignedBuffer<float, 16> tempBuffer_; // SIMD-ready buffer
};

Common Patterns

// Pattern 1: Per-frame temporary allocations
#include <cstdint>
#include <new>

alignas(64) uint8_t scratchMemory[4096];  // Backing storage, allocated once

void processFrame() {
    // Constructing the allocator over the same storage each frame releases
    // everything handed out in the previous frame; nothing is freed individually
    StackAllocator scratch(scratchMemory, sizeof(scratchMemory));

    // Allocate temporary buffer for this frame
    float* temp = scratch.allocate<float>(512);
    if (temp == nullptr) return;  // Scratch space exhausted

    // Use temp...
}

// Pattern 2: Voice allocation in synthesizer
struct Voice {
    float frequency;
    float amplitude;
    int noteNumber;
};

PoolAllocator<sizeof(Voice)> voicePool(64); // 64 voices max

Voice* allocateVoice(int note) {
    Voice* v = voicePool.allocate<Voice>();
    if (v) {
        new (v) Voice{440.0f, 1.0f, note}; // Placement new (needs <new>)
    }
    return v;  // nullptr when the pool is exhausted
}

void releaseVoice(Voice* v) {
    if (v == nullptr) return;
    v->~Voice();           // Call destructor
    voicePool.free(v);     // Return to pool
}
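Pattern 1 relies on a linear (bump) allocator: allocation just advances an offset, and releasing everything at once is an O(1) reset. The hypothetical SketchStackAllocator below illustrates that strategy with std::align; it is not the library's StackAllocator, and its allocate<T>() hands back raw storage, so non-trivial types must still be constructed with placement new.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical linear allocator sketch over caller-provided memory.
class SketchStackAllocator {
public:
    SketchStackAllocator(void* memory, std::size_t size) noexcept
        : base_(static_cast<std::byte*>(memory)), size_(size) {}

    // O(1): bump the offset; returns nullptr if the request does not fit
    template <typename T>
    T* allocate(std::size_t count) noexcept {
        if (count > size_ / sizeof(T)) return nullptr;  // also guards the multiply
        void* p = base_ + offset_;
        std::size_t space = size_ - offset_;
        // std::align moves p forward to the required alignment if it fits
        if (std::align(alignof(T), sizeof(T) * count, p, space) == nullptr) {
            return nullptr;
        }
        offset_ = static_cast<std::size_t>(static_cast<std::byte*>(p) - base_) +
                  sizeof(T) * count;
        return static_cast<T*>(p);
    }

    // O(1): release every allocation at once (e.g. at the start of each block)
    void reset() noexcept { offset_ = 0; }

    std::size_t used() const noexcept { return offset_; }

private:
    std::byte* base_;
    std::size_t size_;
    std::size_t offset_ = 0;
};
```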

📖 API Reference

Core Types

| Type | Description | Header |
|---|---|---|
| PoolAllocator<BlockSize> | Fixed-size block allocator with O(1) alloc/free | pool_allocator.hpp |
| StackAllocator | Linear allocator for per-frame temporaries | stack_allocator.hpp |
| RingBuffer<T> | Circular buffer for delay effects | ring_buffer.hpp |
| LockFreeQueue<T> | SPSC queue for cross-thread messages | lock_free_queue.hpp |
| TripleBuffer<T> | Lock-free parameter updates | triple_buffer.hpp |
| AlignedBuffer<T, Alignment> | SIMD-aligned buffer | aligned_buffer.hpp |

Key Functions

| Function | Description | Complexity |
|---|---|---|
| PoolAllocator::allocate() | Allocate one block from the pool | O(1) |
| PoolAllocator::free() | Return a block to the pool | O(1) |
| RingBuffer::write() | Write a sample to the circular buffer | O(1) |
| RingBuffer::read(delay) | Read the sample at a delay offset | O(1) |
| TripleBuffer::read() | Read the current value (lock-free) | O(1) |
| TripleBuffer::write() | Write a new value (lock-free) | O(1) |
| AlignedBuffer::data() | Get the aligned pointer | O(1) |

Important Constants

namespace PoolSizes {
    constexpr size_t Voice = 128;        // Synth voice data
    constexpr size_t MidiEvent = 16;     // MIDI message
    constexpr size_t AudioEvent = 32;    // Audio event
    constexpr size_t SmallObject = 64;   // Small objects
    constexpr size_t MediumObject = 256; // Medium objects
    constexpr size_t LargeObject = 1024; // Large objects
}

🧪 Testing

Running Tests

# All memory management tests
cd build
ctest -R 04_03

# Specific component tests
# Quote the regex so the shell does not glob-expand the '*'
ctest -R '04_03.*allocators'     # Allocator tests
ctest -R '04_03.*containers'     # Container tests
ctest -R '04_03.*alignment'      # Alignment tests

Test Coverage

  • Unit Tests: 85% coverage
  • Integration Tests: Yes (audio_processor_memory example)
  • Performance Tests: Yes (benchmarks for allocator performance)

Writing Tests

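#include <catch2/catch.hpp>   // Catch2 v2 single header
#include "pool_allocator.hpp"
#include "ring_buffer.hpp"

using namespace audiolab::core::memory;
using namespace audiolab::core::containers;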

TEST_CASE("PoolAllocator - Basic allocation", "[memory][allocator]") {
    // Setup: Create pool with 10 blocks of 64 bytes
    PoolAllocator<64> pool(10);

    // Exercise: Allocate block
    void* ptr = pool.allocate();

    // Verify
    REQUIRE(ptr != nullptr);
    REQUIRE(pool.getAllocatedBlocks() == 1);
    REQUIRE(pool.getAvailableBlocks() == 9);

    // Cleanup
    pool.free(ptr);
    REQUIRE(pool.getAllocatedBlocks() == 0);
}

TEST_CASE("RingBuffer - Delay line", "[memory][containers]") {
    RingBuffer<float> delay(1024);

    // Write samples
    for (int i = 0; i < 100; ++i) {
        delay.write(static_cast<float>(i));
    }

    // Read with delay
    float value = delay.read(10);  // 10 samples ago
    REQUIRE(value == 89.0f);       // 100 - 10 - 1 = 89
}

⚡ Performance

Benchmarks

| Operation | Time | Memory | Notes |
|---|---|---|---|
| PoolAllocator::allocate() | ~5 ns | 0 (pre-allocated) | O(1) free-list pop |
| PoolAllocator::free() | ~3 ns | 0 | O(1) free-list push |
| RingBuffer::write() | ~2 ns | 0 | Single store + increment |
| RingBuffer::read() | ~3 ns | 0 | Load + bitwise AND |
| TripleBuffer::read() | ~10 ns | 0 | 2 atomic loads + 1 memcpy |
| TripleBuffer::write() | ~15 ns | 0 | 1 memcpy + 1 atomic store |

Optimization Notes

  • Power-of-2 capacities turn index wrap-around into a bitwise AND (index & (capacity - 1)) instead of a division; RingBuffer relies on this for wrap-around
  • Free-lists keep recently-freed blocks hot in cache
  • AlignedBuffer uses alignas() for compile-time alignment guarantees
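The power-of-2 wrap-around described in these notes can be sketched as follows. The hypothetical SketchRingBuffer (not the library's RingBuffer) rounds its capacity up to a power of two and uses index & mask in place of index % capacity. Its read(delay) follows the convention implied by the RingBuffer test in this document, where read(0) is the most recently written sample; delay must be less than capacity().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical power-of-2 ring buffer sketch for delay lines.
template <typename T>
class SketchRingBuffer {
public:
    explicit SketchRingBuffer(std::size_t minCapacity)
        : buffer_(nextPowerOfTwo(minCapacity)), mask_(buffer_.size() - 1) {}

    void write(T value) noexcept {
        buffer_[writeIndex_] = value;
        writeIndex_ = (writeIndex_ + 1) & mask_;  // wrap without '%'
    }

    // delay = 0 returns the most recently written sample.
    // Unsigned underflow is fine: the mask keeps the index in range.
    T read(std::size_t delay) const noexcept {
        return buffer_[(writeIndex_ - 1 - delay) & mask_];
    }

    std::size_t capacity() const noexcept { return buffer_.size(); }

private:
    static std::size_t nextPowerOfTwo(std::size_t n) {
        std::size_t p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    std::vector<T> buffer_;      // sized once, at construction
    std::size_t mask_;           // capacity - 1
    std::size_t writeIndex_ = 0;
};
```

The trade-off is memory: a requested 48000-sample line becomes 65536 samples.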

Best Practices

// ✅ DO: Pre-allocate in prepareToPlay
void prepareToPlay(int samplesPerBlock, double sampleRate) {
    delayBuffer_.resize(static_cast<size_t>(sampleRate * 2.0)); // 2 sec max
    voicePool_ = PoolAllocator<128>(64);  // 64 voices
}

// ❌ DON'T: Allocate in processBlock
void processBlock(float* buffer, int numSamples) {
    // NEVER DO THIS - malloc is NOT real-time safe!
    float* temp = new float[numSamples];  // ❌ BAD
    // ...
    delete[] temp;
}

// ✅ DO: Use stack allocator for temporaries
void processBlock(float* buffer, int numSamples) {
    StackAllocator scratch(scratchMemory_, scratchSize_);
    float* temp = scratch.allocate<float>(numSamples);  // ✅ GOOD
    // No need to free - scratch auto-resets
}

// ✅ DO: Check allocation success
void* ptr = pool.allocate();
if (ptr == nullptr) {
    // Handle pool exhaustion gracefully
    return;  // Or use voice stealing, etc.
}

// ❌ DON'T: Assume allocation always succeeds
void* ptr = pool.allocate();
*static_cast<int*>(ptr) = 42;  // ❌ CRASH if ptr is nullptr

🔗 Dependencies

Internal Dependencies

  • 04_00_type_system - For Sample type and type-safe wrappers
  • 04_04_realtime_safety - For RT-safety verification utilities

External Dependencies

  • C++17 - std::vector, alignas, atomic operations
  • No external runtime libraries - Header-only implementation (the tests use Catch2)

📚 Examples

Example 1: Synthesizer Voice Allocation

// Complete voice allocation system for polyphonic synth
#include <cmath>
#include <cstddef>
#include <new>
#include "pool_allocator.hpp"

using namespace audiolab::core::memory;

constexpr float kTwoPi = 6.28318530717958647692f;  // M_PI is not standard C++

struct Voice {
    float phase;
    float frequency;
    float amplitude;
    int noteNumber;

    void process(float* output, size_t numSamples, float sampleRate) {
        const float phaseIncrement = kTwoPi * frequency / sampleRate;
        for (size_t i = 0; i < numSamples; ++i) {
            output[i] += amplitude * std::sin(phase);
            phase += phaseIncrement;
            if (phase >= kTwoPi) phase -= kTwoPi;  // keep phase in [0, 2*pi)
        }
    }
};

class VoiceManager {
public:
    VoiceManager() : pool_(64) {}  // 64-voice polyphony

    Voice* noteOn(int noteNumber, float velocity) {
        Voice* v = pool_.allocate<Voice>();
        if (v == nullptr) {
            // Pool exhausted: reuse an active voice (voice stealing)
            v = stealOldestVoice();
            if (v == nullptr) return nullptr;
            v->~Voice();  // End the stolen voice's lifetime before reusing its block
        }
        // (Re)initialize the block for the new note
        new (v) Voice{
            0.0f,                                                // phase
            440.0f * std::pow(2.0f, (noteNumber - 69) / 12.0f),  // frequency (MIDI 69 = A4 = 440 Hz)
            velocity,                                            // amplitude
            noteNumber
        };
        return v;
    }

    void noteOff(Voice* v) {
        if (v == nullptr) return;
        v->~Voice();
        pool_.free(v);
    }

private:
    PoolAllocator<sizeof(Voice)> pool_;
    Voice* stealOldestVoice() { /* ... */ return nullptr; }
};

Example 2: Lock-Free Parameter Updates

// GUI thread updates parameters without blocking audio thread
#include <cstddef>
#include "triple_buffer.hpp"

using namespace audiolab::core::containers;

struct FilterParams {
    float cutoff;
    float resonance;
    int type;
};

class AudioProcessor {
public:
    AudioProcessor() {
        // Initialize default parameters
        FilterParams defaults{1000.0f, 0.7f, 0};
        params_.write(defaults);
    }

    void processBlock(float* buffer, size_t numSamples) {
        // Read parameters (never blocks, always consistent)
        FilterParams p = params_.read();

        // Use p.cutoff, p.resonance, p.type...
        applyFilter(buffer, numSamples, p);
    }

    void setParametersFromGUI(float cutoff, float resonance, int type) {
        // Write parameters (never blocks audio thread)
        FilterParams newParams{cutoff, resonance, type};
        params_.write(newParams);
    }

private:
    TripleBuffer<FilterParams> params_;

    void applyFilter(float* buffer, size_t numSamples, const FilterParams& p) {
        // Filter implementation...
    }
};

More Examples

See examples/audio_processor_memory.cpp for complete real-world usage demonstrating all components together.


🐛 Troubleshooting

Common Issues

Issue 1: Pool Allocator Returns nullptr

Symptoms: allocate() returns nullptr; voices drop out, events are lost
Cause: Pool exhausted (too many concurrent allocations)
Solution: Increase the pool size or implement resource recycling

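// Check pool usage before allocation
const float usage = static_cast<float>(pool.getAllocatedBlocks()) /
                    static_cast<float>(pool.getAllocatedBlocks() + pool.getAvailableBlocks());
if (usage > 0.9f) {
    // Warn: pool more than 90% used
    // Consider voice stealing or event prioritization
}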

// Or increase pool size
PoolAllocator<128> pool(128);  // Increase from 64 to 128

Issue 2: Delay Buffer Size Wrong

Symptoms: Pops, clicks, or assertion failures in RingBuffer
Cause: Buffer too small for the requested delay time
Solution: Calculate the size from the maximum delay time and the sample rate

// ❌ WRONG: Hardcoded size
RingBuffer<float> delay(1000);  // Only 21ms @ 48kHz!

// ✅ CORRECT: Calculate from time
double maxDelaySeconds = 2.0;
size_t bufferSize = delayBufferSize(maxDelaySeconds, sampleRate);
RingBuffer<float> delay(bufferSize);
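delayBufferSize() is not defined in this document. If you need to compute the size yourself, one possible version (the hypothetical delayBufferSizeSketch) converts seconds to samples, adds one so that the maximum delay itself is still addressable (assuming read(delay) can reach at most capacity - 1 samples back), and rounds up to a power of two to match RingBuffer's power-of-2 sizing. For 2 s at 48 kHz that is 96,001 samples, rounded up to 131,072.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper: smallest power-of-2 capacity holding maxDelaySeconds of audio
std::size_t delayBufferSizeSketch(double maxDelaySeconds, double sampleRate) {
    // +1 so a delay of exactly maxDelaySeconds is still readable
    const auto needed =
        static_cast<std::size_t>(std::ceil(maxDelaySeconds * sampleRate)) + 1;
    std::size_t capacity = 1;
    while (capacity < needed) capacity <<= 1;
    return capacity;
}
```

Call it from prepareToPlay, since the sample rate is only known there.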

Issue 3: Alignment Crashes with SIMD

Symptoms: Crashes in SIMD code, unaligned load/store errors
Cause: Buffer not properly aligned for AVX2/NEON
Solution: Use AlignedBuffer or verify alignment

// ❌ WRONG: std::vector not guaranteed aligned for AVX
std::vector<float> buffer(512);
processWithAVX2(buffer.data());  // May crash

// ✅ CORRECT: Use AlignedBuffer
AlignedBuffer<float, 32> buffer(512);  // 32-byte aligned for AVX2
processWithAVX2(buffer.data());  // Safe
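To verify alignment rather than assume it, a small check like the hypothetical isAligned() below can guard SIMD entry points in debug builds, for example assert(isAligned(buffer.data(), 32)) before calling processWithAVX2. It works because, for a power-of-two alignment, every aligned address has its low bits cleared.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
// `alignment` must be a power of two (16 for SSE, 32 for AVX2, 64 for cache lines).
inline bool isAligned(const void* p, std::size_t alignment) noexcept {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```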

🔄 Changelog

[v1.0.0] - 2024-10-16

Added:
  • Initial documentation for memory management subsystem
  • Complete API reference for all allocators
  • Examples demonstrating real-world usage patterns

Status:
  • All components production-ready and battle-tested


📊 Status

  • Version: 1.0.0
  • Stability: Stable (Production Ready)
  • Test Coverage: 85%
  • Documentation: Complete
  • Last Updated: 2024-10-16

👥 Contributing

See parent system for contribution guidelines.

Development

# Build memory management tests
cd build
cmake --build . --target test_allocators
cmake --build . --target test_containers
cmake --build . --target test_alignment

# Run all tests
ctest -R 04_03 --verbose

# Build example
cmake --build . --target audio_processor_memory
./bin/audio_processor_memory


Part of: 04_CORE
Maintained by: AudioLab Core Team
Status: Production Ready