TAREA 04: Source Separation - Multi-Track Stem Extraction
Status: 🔴 PLANNING
🎯 Purpose
Separate mixed audio into individual sources (vocals, drums, bass, other) using deep learning models.
🏗️ Key Components
- Spleeter: Fast 2/4/5-stem separation (Deezer)
- Demucs: Hybrid spectrogram/waveform separation (Meta)
- Open-Unmix: Open-source multi-track separation
- Time-Frequency Masking: Spectral separation
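To make the time-frequency masking item concrete, here is a minimal sketch of a soft (power-ratio) mask. It assumes per-source magnitude estimates are already available for each time-frequency bin, for example from one of the models listed above. The names `softRatioMasks` and `applyMask` are hypothetical helpers for illustration and are not part of the planned API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: source_mags[s][b] is the estimated magnitude of
// source s in time-frequency bin b. Returns one mask per source, where
// mask[s][b] = |S_s|^2 / (sum over all sources of |S_j|^2).
// At every bin with nonzero energy the masks sum to roughly 1.
std::vector<std::vector<float>> softRatioMasks(
    const std::vector<std::vector<float>>& source_mags,
    float eps = 1e-8f) {
    assert(!source_mags.empty());
    const std::size_t num_bins = source_mags[0].size();
    for (const auto& mags : source_mags) {
        assert(mags.size() == num_bins);  // all sources share the same bins
    }

    std::vector<std::vector<float>> masks(
        source_mags.size(), std::vector<float>(num_bins, 0.0f));
    for (std::size_t b = 0; b < num_bins; ++b) {
        float total = 0.0f;
        for (const auto& mags : source_mags) {
            total += mags[b] * mags[b];
        }
        for (std::size_t s = 0; s < source_mags.size(); ++s) {
            masks[s][b] = source_mags[s][b] * source_mags[s][b] / (total + eps);
        }
    }
    return masks;
}

// Hypothetical helper: scale each mixture bin by the mask value.
// For a complex STFT, the same real-valued mask would multiply both the
// real and imaginary parts, so the mixture phase is reused.
std::vector<float> applyMask(const std::vector<float>& mixture_bins,
                             const std::vector<float>& mask) {
    assert(mixture_bins.size() == mask.size());
    std::vector<float> out(mixture_bins.size());
    for (std::size_t b = 0; b < mixture_bins.size(); ++b) {
        out[b] = mixture_bins[b] * mask[b];
    }
    return out;
}
```

Because the masks are computed from relative energy, a bin dominated by one source is assigned almost entirely to that source, while bins shared between sources are split proportionally.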
📋 Architecture
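#include <vector>

struct StemBundle;  // container for the separated stems; not defined in this section

class SourceSeparator {
public:
    enum StemType { Vocals, Drums, Bass, Other };

    // Separate a mix into 4 stems (vocals, drums, bass, other).
    StemBundle separate4Stems(const std::vector<float>& mixed_audio);

    // Extract a single stem from the mix.
    std::vector<float> extractStem(const std::vector<float>& mixed_audio,
                                   StemType stem);

    // Real-time separation of one block of num_samples input samples.
    void separateRealTime(const float* input, float** outputs,
                          int num_samples);
};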
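The interface above does not define `StemBundle`. As a sketch only, one possible layout is a fixed array with one mono buffer per `StemType`. Everything below is an assumption for illustration: the struct layout, the `get` accessors, and the `mixExcluding` helper, which rebuilds a mix without one stem (for example an instrumental without vocals).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors SourceSeparator::StemType from the interface above.
enum StemType { Vocals, Drums, Bass, Other };

// Hypothetical layout: one mono sample buffer per stem, indexed by StemType.
struct StemBundle {
    std::array<std::vector<float>, 4> stems;

    std::vector<float>& get(StemType t) {
        return stems[static_cast<std::size_t>(t)];
    }
    const std::vector<float>& get(StemType t) const {
        return stems[static_cast<std::size_t>(t)];
    }
};

// Hypothetical helper: sum every stem except `excluded`, sample by sample.
// All stems are assumed to have the same length.
std::vector<float> mixExcluding(const StemBundle& bundle, StemType excluded) {
    const std::size_t n = bundle.stems[0].size();
    std::vector<float> out(n, 0.0f);
    for (std::size_t s = 0; s < bundle.stems.size(); ++s) {
        assert(bundle.stems[s].size() == n);
        if (s == static_cast<std::size_t>(excluded)) {
            continue;  // skip the stem being removed
        }
        for (std::size_t i = 0; i < n; ++i) {
            out[i] += bundle.stems[s][i];
        }
    }
    return out;
}
```

With a layout like this, `extractStem` could be a thin wrapper that calls `separate4Stems` and returns `bundle.get(stem)`, at the cost of computing stems the caller does not need.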
🎯 Use Cases
- Karaoke vocal removal
- Remix/mashup preparation
- Mixing assistance
- Transcription preprocessing
- Live performance stem extraction
📊 Performance Targets
- Quality: SDR (signal-to-distortion ratio) > 6 dB (vocals), > 5 dB (drums)
- Speed: 0.5x real-time in high-quality mode (about 2 s of processing per 1 s of audio), 2x real-time in fast mode
- Latency: < 100ms for real-time mode
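The targets above can be checked with a few small calculations. The sketch below uses the plain energy-ratio definition of SDR, 10·log10(‖reference‖² / ‖reference − estimate‖²). Published benchmark figures are usually computed with more elaborate evaluation toolkits, so treat this as an approximation. The function names are hypothetical, and `maxBlockSamples` accounts only for buffering delay, not for model compute time.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Simple SDR in dB between a reference stem and an estimated stem.
double sdrDb(const std::vector<float>& reference,
             const std::vector<float>& estimate) {
    assert(reference.size() == estimate.size());
    double signal_energy = 0.0;
    double error_energy = 0.0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const double r = reference[i];
        const double e = r - estimate[i];
        signal_energy += r * r;
        error_energy += e * e;
    }
    const double eps = 1e-12;  // avoids division by zero and log of zero
    return 10.0 * std::log10((signal_energy + eps) / (error_energy + eps));
}

// Speed as a multiple of real time: 2.0 means twice as fast as playback,
// 0.5 means processing takes twice as long as the audio lasts.
double speedFactor(double audio_seconds, double processing_seconds) {
    assert(processing_seconds > 0.0);
    return audio_seconds / processing_seconds;
}

// Largest block size (in samples) whose buffering delay alone fits within
// the latency budget at the given sample rate.
std::size_t maxBlockSamples(double latency_ms, double sample_rate_hz) {
    return static_cast<std::size_t>(latency_ms * sample_rate_hz / 1000.0);
}
```

For example, a 100 ms budget at 44.1 kHz leaves room for at most 4410 samples of buffering, so a practical real-time mode would need noticeably smaller blocks to leave time for inference.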
📚 Models
- Spleeter 4-stem: Fast, production-ready
- Demucs v3: State-of-the-art quality
- X-UMX: CrossNet-Open-Unmix, a variant of Open-Unmix
Priority: 🔥 High - Key feature for production workflows