AudioLab Lock-Free Primitives¶

Location: 2 - FOUNDATION/04_CORE/04_realtime_safety/01_lock_free_primitives/ Type: Header-only INTERFACE library Dependencies: None (foundational module) Target: AudioLab::LockFree

Purpose¶

Lock-free and wait-free synchronization primitives for real-time audio processing. These primitives enable safe communication between threads without using locks, ensuring deterministic performance in audio callbacks.

Key Features¶

Wait-Free Operations: Operations complete in bounded time
No Locks: No blocking synchronization primitives
No Allocations: After construction, zero allocations
Cache-Line Optimized: Prevents false sharing
Real-Time Safe: Safe to use in audio thread

Module Contents¶

01_lock_free_primitives/
├── atomic_types.hpp       # AtomicFloat, AtomicDouble, TaggedPointer
├── spinlock.hpp          # Lightweight spinlock (use with caution)
├── wait_free_spsc.hpp    # Single Producer Single Consumer queue
└── tests/                # Comprehensive multi-threaded tests

Available Primitives¶

AtomicFloat / AtomicDouble (atomic_types.hpp)¶

Lock-free atomic floating point types using bit-casting to integer atomics.

#include "atomic_types.hpp"

using namespace audiolab::core::rt;

// Thread-safe float storage
AtomicFloat gain{1.0f};

// GUI thread
void setGain(float value) {
    gain.store(value, std::memory_order_release);
}

// Audio thread (RT-SAFE)
void processBlock(float* buffer, size_t size) {
    float currentGain = gain.load(std::memory_order_acquire);
    for (size_t i = 0; i < size; ++i) {
        buffer[i] *= currentGain;
    }
}

Why not std::atomic<float>? - Not guaranteed lock-free on all platforms - AtomicFloat uses std::atomic<uint32_t> which IS guaranteed lock-free - Uses safe bit-casting via std::memcpy

Operations: - store(value, order) - Atomic store - load(order) - Atomic load - exchange(value, order) - Swap and return old value - compare_exchange_strong/weak() - CAS operation - is_lock_free() - Always returns true

Performance: ~0.27 ns per store, ~0.21 ns per load (relaxed ordering)

TaggedPointer (atomic_types.hpp)¶

Pointer with version counter to solve the ABA problem in lock-free algorithms.

#include "atomic_types.hpp"

using namespace audiolab::core::rt;

struct Node {
    int data;
    std::atomic<Node*> next;
};

TaggedPointer<Node> head;

// Push operation (lock-free)
void push(Node* newNode) {
    auto oldHead = head.load(std::memory_order_acquire);
    do {
        newNode->next.store(oldHead.getPointer(), std::memory_order_relaxed);
    } while (!head.compare_exchange_weak(
        oldHead, newNode,
        std::memory_order_release,
        std::memory_order_acquire
    ));
}

What is the ABA problem? 1. Thread A reads pointer value A 2. Thread B changes A → B → A (pointer points back to A) 3. Thread A's CAS succeeds (thinks nothing changed) 4. But the data at A might be different!

Solution: TaggedPointer increments a counter on each change, so CAS detects the modification even if pointer is the same.

Structure: - Bits 0-47: Pointer (48-bit canonical address on x86-64) - Bits 48-63: Tag counter (16-bit, wraps around)

Methods: - load() - Returns Tagged struct with pointer + tag - store(ptr) - Stores pointer with incremented tag - compare_exchange_strong() - CAS with automatic tag increment - getPointer() - Extract raw pointer

Spinlock (spinlock.hpp)¶

Lightweight spinlock for ultra-short critical sections (< 10 instructions).

#include "spinlock.hpp"

using namespace audiolab::core::rt;

Spinlock lock;
int counter = 0;

// Worker thread
void incrementCounter() {
    ScopedSpinlock guard(lock);  // RAII lock
    counter++;  // Critical section < 10 instructions
} // Automatically unlocked

⚠️ WARNING: Use with extreme caution!

When to use: - Protecting a single counter increment - Swapping two pointers - Reading/writing a small struct (< 16 bytes)

When NOT to use: - Memory allocation - File I/O - System calls - Loops or complex logic - Audio thread (prefer lock-free algorithms)

Why dangerous? - Spins in tight loop → wastes CPU - Can cause priority inversion on real-time threads - If lock holder is preempted, all spinners waste time - Only efficient if critical section is shorter than OS context switch

Methods: - lock() - Acquire lock (blocks) - try_lock() - Try to acquire (non-blocking) - unlock() - Release lock - ScopedSpinlock - RAII wrapper

WaitFreeSPSC (wait_free_spsc.hpp)¶

Wait-free Single Producer Single Consumer circular buffer queue.

#include "wait_free_spsc.hpp"

using namespace audiolab::core::rt;

// Create queue with power-of-2 capacity
WaitFreeSPSC<float, 256> meterValues;

// Audio thread (PRODUCER - RT-SAFE)
void processBlock(float* buffer, size_t size) {
    float rms = calculateRMS(buffer, size);
    meterValues.push(rms);  // Never blocks
}

// UI thread (CONSUMER)
void updateMeter() {
    float value;
    while (meterValues.pop(value)) {
        display->setLevel(value);
    }
}

Properties: - Wait-Free: Operations complete in bounded time - Lock-Free: No mutex, no blocking - Cache-Friendly: Producer/consumer on separate cache lines - No Allocations: Fixed-size at construction - Power-of-2 Capacity: Efficient modulo via bitwise AND

Limitations: - EXACTLY one producer thread - EXACTLY one consumer thread - Fixed capacity (set at construction)

Common Aliases:

// Small queue (64 elements) for parameter changes
SPSCQueue_Small<ParamChange>

// Medium queue (512 elements) for MIDI events
SPSCQueue_Medium<MidiEvent>

// Large queue (2048 elements) for meter/spectrum data
SPSCQueue_Large<float>

Methods: - push(value) - Try to push (returns false if full) - pop(value) - Try to pop (returns false if empty) - isEmpty() - Check if empty (consumer thread) - isFull() - Check if full (producer thread) - size() - Approximate size (for debugging only) - clear() - Reset (UNSAFE - call only when idle)

Performance: ~4.1 ns per message, 241M messages/sec (tested on x86_64)

Usage Examples¶

Example 1: Parameter Communication¶

#include "wait_free_spsc.hpp"
#include "atomic_types.hpp"

struct ParamChange {
    uint32_t paramID;
    float value;
};

class AudioProcessor {
    WaitFreeSPSC<ParamChange, 64> paramQueue_;
    AtomicFloat gain_{1.0f};

public:
    // GUI thread
    void setParameter(uint32_t id, float value) {
        ParamChange change{id, value};
        paramQueue_.push(change);
    }

    // Audio thread (RT-SAFE)
    void processBlock(float** channels, size_t numChannels, size_t numSamples) {
        // Process pending parameter changes
        ParamChange change;
        while (paramQueue_.pop(change)) {
            if (change.paramID == 0) {
                gain_.store(change.value, std::memory_order_release);
            }
        }

        // Apply gain
        float g = gain_.load(std::memory_order_acquire);
        for (size_t ch = 0; ch < numChannels; ++ch) {
            for (size_t i = 0; i < numSamples; ++i) {
                channels[ch][i] *= g;
            }
        }
    }
};

Example 2: Meter Values to UI¶

#include "wait_free_spsc.hpp"

class MeterBridge {
    WaitFreeSPSC<float, 512> meterValues_;

public:
    // Audio thread - pushes RMS values
    void pushMeterValue(float rms) {
        meterValues_.push(rms);  // Never blocks
    }

    // UI thread - pulls latest values
    float getLatestMeter() {
        float latest = 0.0f;
        float value;
        // Drain queue, keep latest
        while (meterValues_.pop(value)) {
            latest = value;
        }
        return latest;
    }
};

Example 3: Lock-Free Stack (using TaggedPointer)¶

#include "atomic_types.hpp"

template<typename T>
class LockFreeStack {
    struct Node {
        T data;
        Node* next;
    };

    TaggedPointer<Node> head_;

public:
    void push(const T& value) {
        Node* newNode = new Node{value, nullptr};

        auto oldHead = head_.load(std::memory_order_acquire);
        do {
            newNode->next = oldHead.getPointer();
        } while (!head_.compare_exchange_weak(
            oldHead, newNode,
            std::memory_order_release,
            std::memory_order_acquire
        ));
    }

    bool pop(T& value) {
        auto oldHead = head_.load(std::memory_order_acquire);
        do {
            if (oldHead.getPointer() == nullptr) {
                return false;  // Empty
            }
            value = oldHead.getPointer()->data;
        } while (!head_.compare_exchange_weak(
            oldHead, oldHead.getPointer()->next,
            std::memory_order_release,
            std::memory_order_acquire
        ));

        // Note: Memory leak! Real implementation needs hazard pointers
        // or epoch-based reclamation
        return true;
    }
};

Best Practices¶

1. Choose the Right Tool¶

// ✅ SPSC Queue for one-way communication
WaitFreeSPSC<ParamChange, 64> paramQueue;

// ✅ AtomicFloat for single values
AtomicFloat gain;

// ❌ Don't use Spinlock in audio thread
Spinlock lock;  // Prefer lock-free algorithms

2. Memory Ordering¶

// Use relaxed ordering for local operations
float value = atomic.load(std::memory_order_relaxed);

// Use acquire/release for synchronization
// Producer:
queue.push(data);  // Uses release ordering

// Consumer:
queue.pop(data);   // Uses acquire ordering

3. SPSC Queue Sizing¶

// Size for worst-case burst
// Audio callback: 512 samples @ 44.1kHz = 11.6ms
// UI update: 60Hz = 16.6ms
// Worst case: ~1.4 callbacks between UI updates

WaitFreeSPSC<float, 4> meterQueue;  // ❌ Too small!
WaitFreeSPSC<float, 64> meterQueue; // ✅ Safe margin

4. Handle Queue Full/Empty¶

// Producer - decide what to do when full
if (!queue.push(newValue)) {
    // Option 1: Drop oldest (for meters)
    queue.pop(dummy);
    queue.push(newValue);

    // Option 2: Drop newest (for events)
    // Just return, message is lost
}

// Consumer - poll until empty
float value;
while (queue.pop(value)) {
    process(value);
}

Testing¶

All primitives have comprehensive multi-threaded tests:

cd build
cmake --build . --config Release
./tests/Release/test_lock_free

Test Coverage: - ✅ AtomicFloat: Basic, CAS, lock-free guarantee, multi-threaded - ✅ AtomicDouble: Basic, lock-free guarantee - ✅ TaggedPointer: Basic, ABA problem detection, lock-free - ✅ Spinlock: Basic, scoped, multi-threaded - ✅ SPSC Queue: Basic, capacity, move semantics, producer-consumer - ✅ Performance benchmarks

Benchmark Results (x86_64):

AtomicFloat:
  Store (relaxed): 0.27 ns/op
  Load (relaxed): 0.21 ns/op

SPSC Queue:
  Throughput: 4.1 ns/message
  Messages/sec: 241M

Memory Ordering Guide¶

Sequential Consistency (seq_cst) - DEFAULT¶

atomic.store(value);  // Default: seq_cst

- Strongest guarantee - Total global order of all atomic operations - Slowest (memory fence on most architectures) - Use when: Not sure, or need strict ordering

Acquire/Release¶

// Producer
atomic.store(value, std::memory_order_release);

// Consumer
value = atomic.load(std::memory_order_acquire);

- Synchronization point - Release: All prior writes visible to acquirer - Use when: Producer-consumer synchronization

Relaxed¶

counter.fetch_add(1, std::memory_order_relaxed);

- No synchronization guarantee - Fastest (no memory fence) - Use when: Independent counter, order doesn't matter

Platform Support¶

Platform	AtomicFloat	TaggedPointer	SPSC	Spinlock
x86_64	✅ Lock-free	✅ Lock-free	✅	✅
ARM64	✅ Lock-free	✅ Lock-free	✅	✅
x86_32	✅ Lock-free	❌ 64-bit only	✅	✅

Notes: - AtomicDouble may not be lock-free on 32-bit systems - TaggedPointer requires 48-bit canonical addresses (x86-64, ARM64) - All primitives are header-only

References¶

Lock-Free Programming: Preshing on Programming
Memory Ordering: cppreference.com - std::memory_order
ABA Problem: Wikipedia - ABA Problem
SPSC Queue: 1024cores.net