AudioLab Lock-Free Primitives¶
Location: 2 - FOUNDATION/04_CORE/04_realtime_safety/01_lock_free_primitives/
Type: Header-only INTERFACE library
Dependencies: None (foundational module)
Target: AudioLab::LockFree
Purpose¶
Lock-free and wait-free synchronization primitives for real-time audio processing. These primitives enable safe communication between threads without using locks, ensuring deterministic performance in audio callbacks.
Key Features¶
- Wait-Free Operations: Operations complete in bounded time
- No Locks: No blocking synchronization primitives
- No Allocations: After construction, zero allocations
- Cache-Line Optimized: Prevents false sharing
- Real-Time Safe: Safe to use in audio thread
Module Contents¶
01_lock_free_primitives/
├── atomic_types.hpp # AtomicFloat, AtomicDouble, TaggedPointer
├── spinlock.hpp # Lightweight spinlock (use with caution)
├── wait_free_spsc.hpp # Single Producer Single Consumer queue
└── tests/ # Comprehensive multi-threaded tests
Available Primitives¶
AtomicFloat / AtomicDouble (atomic_types.hpp)¶
Lock-free atomic floating point types using bit-casting to integer atomics.
#include "atomic_types.hpp"
using namespace audiolab::core::rt;
// Thread-safe float storage
AtomicFloat gain{1.0f};
// GUI thread
void setGain(float value) {
gain.store(value, std::memory_order_release);
}
// Audio thread (RT-SAFE)
void processBlock(float* buffer, size_t size) {
float currentGain = gain.load(std::memory_order_acquire);
for (size_t i = 0; i < size; ++i) {
buffer[i] *= currentGain;
}
}
Why not std::atomic<float>?
- Not guaranteed lock-free on all platforms
- AtomicFloat uses std::atomic<uint32_t> which IS guaranteed lock-free
- Uses safe bit-casting via std::memcpy
Operations:
- store(value, order) - Atomic store
- load(order) - Atomic load
- exchange(value, order) - Swap and return old value
- compare_exchange_strong/weak() - CAS operation
- is_lock_free() - Always returns true
Performance: ~0.27 ns per store, ~0.21 ns per load (relaxed ordering)
TaggedPointer (atomic_types.hpp)¶
Pointer with version counter to solve the ABA problem in lock-free algorithms.
#include "atomic_types.hpp"
using namespace audiolab::core::rt;
struct Node {
int data;
std::atomic<Node*> next;
};
TaggedPointer<Node> head;
// Push operation (lock-free)
void push(Node* newNode) {
auto oldHead = head.load(std::memory_order_acquire);
do {
newNode->next.store(oldHead.getPointer(), std::memory_order_relaxed);
} while (!head.compare_exchange_weak(
oldHead, newNode,
std::memory_order_release,
std::memory_order_acquire
));
}
What is the ABA problem? 1. Thread A reads pointer value A 2. Thread B changes A → B → A (pointer points back to A) 3. Thread A's CAS succeeds (thinks nothing changed) 4. But the data at A might be different!
Solution: TaggedPointer increments a counter on each change, so CAS detects the modification even if pointer is the same.
Structure: - Bits 0-47: Pointer (48-bit canonical address on x86-64) - Bits 48-63: Tag counter (16-bit, wraps around)
Methods:
- load() - Returns Tagged struct with pointer + tag
- store(ptr) - Stores pointer with incremented tag
- compare_exchange_strong() - CAS with automatic tag increment
- getPointer() - Extract raw pointer
Spinlock (spinlock.hpp)¶
Lightweight spinlock for ultra-short critical sections (< 10 instructions).
#include "spinlock.hpp"
using namespace audiolab::core::rt;
Spinlock lock;
int counter = 0;
// Worker thread
void incrementCounter() {
ScopedSpinlock guard(lock); // RAII lock
counter++; // Critical section < 10 instructions
} // Automatically unlocked
⚠️ WARNING: Use with extreme caution!
When to use: - Protecting a single counter increment - Swapping two pointers - Reading/writing a small struct (< 16 bytes)
When NOT to use: - Memory allocation - File I/O - System calls - Loops or complex logic - Audio thread (prefer lock-free algorithms)
Why dangerous? - Spins in tight loop → wastes CPU - Can cause priority inversion on real-time threads - If lock holder is preempted, all spinners waste time - Only efficient if critical section is shorter than OS context switch
Methods:
- lock() - Acquire lock (blocks)
- try_lock() - Try to acquire (non-blocking)
- unlock() - Release lock
- ScopedSpinlock - RAII wrapper
WaitFreeSPSC (wait_free_spsc.hpp)¶
Wait-free Single Producer Single Consumer circular buffer queue.
#include "wait_free_spsc.hpp"
using namespace audiolab::core::rt;
// Create queue with power-of-2 capacity
WaitFreeSPSC<float, 256> meterValues;
// Audio thread (PRODUCER - RT-SAFE)
void processBlock(float* buffer, size_t size) {
float rms = calculateRMS(buffer, size);
meterValues.push(rms); // Never blocks
}
// UI thread (CONSUMER)
void updateMeter() {
float value;
while (meterValues.pop(value)) {
display->setLevel(value);
}
}
Properties: - Wait-Free: Operations complete in bounded time - Lock-Free: No mutex, no blocking - Cache-Friendly: Producer/consumer on separate cache lines - No Allocations: Fixed-size at construction - Power-of-2 Capacity: Efficient modulo via bitwise AND
Limitations: - EXACTLY one producer thread - EXACTLY one consumer thread - Fixed capacity (set at construction)
Common Aliases:
// Small queue (64 elements) for parameter changes
SPSCQueue_Small<ParamChange>
// Medium queue (512 elements) for MIDI events
SPSCQueue_Medium<MidiEvent>
// Large queue (2048 elements) for meter/spectrum data
SPSCQueue_Large<float>
Methods:
- push(value) - Try to push (returns false if full)
- pop(value) - Try to pop (returns false if empty)
- isEmpty() - Check if empty (consumer thread)
- isFull() - Check if full (producer thread)
- size() - Approximate size (for debugging only)
- clear() - Reset (UNSAFE - call only when idle)
Performance: ~4.1 ns per message, 241M messages/sec (tested on x86_64)
Usage Examples¶
Example 1: Parameter Communication¶
#include "wait_free_spsc.hpp"
#include "atomic_types.hpp"
struct ParamChange {
uint32_t paramID;
float value;
};
class AudioProcessor {
WaitFreeSPSC<ParamChange, 64> paramQueue_;
AtomicFloat gain_{1.0f};
public:
// GUI thread
void setParameter(uint32_t id, float value) {
ParamChange change{id, value};
paramQueue_.push(change);
}
// Audio thread (RT-SAFE)
void processBlock(float** channels, size_t numChannels, size_t numSamples) {
// Process pending parameter changes
ParamChange change;
while (paramQueue_.pop(change)) {
if (change.paramID == 0) {
gain_.store(change.value, std::memory_order_release);
}
}
// Apply gain
float g = gain_.load(std::memory_order_acquire);
for (size_t ch = 0; ch < numChannels; ++ch) {
for (size_t i = 0; i < numSamples; ++i) {
channels[ch][i] *= g;
}
}
}
};
Example 2: Meter Values to UI¶
#include "wait_free_spsc.hpp"
class MeterBridge {
WaitFreeSPSC<float, 512> meterValues_;
public:
// Audio thread - pushes RMS values
void pushMeterValue(float rms) {
meterValues_.push(rms); // Never blocks
}
// UI thread - pulls latest values
float getLatestMeter() {
float latest = 0.0f;
float value;
// Drain queue, keep latest
while (meterValues_.pop(value)) {
latest = value;
}
return latest;
}
};
Example 3: Lock-Free Stack (using TaggedPointer)¶
#include "atomic_types.hpp"
template<typename T>
class LockFreeStack {
struct Node {
T data;
Node* next;
};
TaggedPointer<Node> head_;
public:
void push(const T& value) {
Node* newNode = new Node{value, nullptr};
auto oldHead = head_.load(std::memory_order_acquire);
do {
newNode->next = oldHead.getPointer();
} while (!head_.compare_exchange_weak(
oldHead, newNode,
std::memory_order_release,
std::memory_order_acquire
));
}
bool pop(T& value) {
auto oldHead = head_.load(std::memory_order_acquire);
do {
if (oldHead.getPointer() == nullptr) {
return false; // Empty
}
value = oldHead.getPointer()->data;
} while (!head_.compare_exchange_weak(
oldHead, oldHead.getPointer()->next,
std::memory_order_release,
std::memory_order_acquire
));
// Note: Memory leak! Real implementation needs hazard pointers
// or epoch-based reclamation
return true;
}
};
Best Practices¶
1. Choose the Right Tool¶
// ✅ SPSC Queue for one-way communication
WaitFreeSPSC<ParamChange, 64> paramQueue;
// ✅ AtomicFloat for single values
AtomicFloat gain;
// ❌ Don't use Spinlock in audio thread
Spinlock lock; // Prefer lock-free algorithms
2. Memory Ordering¶
// Use relaxed ordering for local operations
float value = atomic.load(std::memory_order_relaxed);
// Use acquire/release for synchronization
// Producer:
queue.push(data); // Uses release ordering
// Consumer:
queue.pop(data); // Uses acquire ordering
3. SPSC Queue Sizing¶
// Size for worst-case burst
// Audio callback: 512 samples @ 44.1kHz = 11.6ms
// UI update: 60Hz = 16.6ms
// Worst case: ~1.4 callbacks between UI updates
WaitFreeSPSC<float, 4> meterQueue; // ❌ Too small!
WaitFreeSPSC<float, 64> meterQueue; // ✅ Safe margin
4. Handle Queue Full/Empty¶
// Producer - decide what to do when full
if (!queue.push(newValue)) {
// Option 1: Drop oldest (for meters)
queue.pop(dummy);
queue.push(newValue);
// Option 2: Drop newest (for events)
// Just return, message is lost
}
// Consumer - poll until empty
float value;
while (queue.pop(value)) {
process(value);
}
Testing¶
All primitives have comprehensive multi-threaded tests:
Test Coverage: - ✅ AtomicFloat: Basic, CAS, lock-free guarantee, multi-threaded - ✅ AtomicDouble: Basic, lock-free guarantee - ✅ TaggedPointer: Basic, ABA problem detection, lock-free - ✅ Spinlock: Basic, scoped, multi-threaded - ✅ SPSC Queue: Basic, capacity, move semantics, producer-consumer - ✅ Performance benchmarks
Benchmark Results (x86_64):
AtomicFloat:
Store (relaxed): 0.27 ns/op
Load (relaxed): 0.21 ns/op
SPSC Queue:
Throughput: 4.1 ns/message
Messages/sec: 241M
Memory Ordering Guide¶
Sequential Consistency (seq_cst) - DEFAULT¶
- Strongest guarantee - Total global order of all atomic operations - Slowest (memory fence on most architectures) - Use when: Not sure, or need strict orderingAcquire/Release¶
// Producer
atomic.store(value, std::memory_order_release);
// Consumer
value = atomic.load(std::memory_order_acquire);
Relaxed¶
- No synchronization guarantee - Fastest (no memory fence) - Use when: Independent counter, order doesn't matterPlatform Support¶
| Platform | AtomicFloat | TaggedPointer | SPSC | Spinlock |
|---|---|---|---|---|
| x86_64 | ✅ Lock-free | ✅ Lock-free | ✅ | ✅ |
| ARM64 | ✅ Lock-free | ✅ Lock-free | ✅ | ✅ |
| x86_32 | ✅ Lock-free | ❌ 64-bit only | ✅ | ✅ |
Notes: - AtomicDouble may not be lock-free on 32-bit systems - TaggedPointer requires 48-bit canonical addresses (x86-64, ARM64) - All primitives are header-only
See Also¶
- Real-Time Constraints - RT-safety annotations
- Memory Allocators - RT-safe allocators
- Processing Interfaces - Processor interfaces
References¶
- Lock-Free Programming: Preshing on Programming
- Memory Ordering: cppreference.com - std::memory_order
- ABA Problem: Wikipedia - ABA Problem
- SPSC Queue: 1024cores.net