TASK 04: Source Separation - Multi-Track Stem Extraction

Status: 🔴 PLANNING

🎯 Purpose

Separate mixed audio into individual sources (vocals, drums, bass, other) using deep learning models.

🏗️ Key Components

  • Spleeter: Fast 2/4/5-stem separation (Deezer)
  • Demucs: Hybrid spectrogram/waveform separation (Meta)
  • Open-Unmix: Open-source multi-track separation
  • Time-Frequency Masking: Spectral separation

📋 Architecture

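#include <vector>

struct StemBundle;  // container for the separated stems (definition not shown here)

class SourceSeparator {
public:
    enum StemType { Vocals, Drums, Bass, Other };

    // Separate a mixed signal into all 4 stems (offline)
    StemBundle separate4Stems(const std::vector<float>& mixed_audio);

    // Extract a single stem from the mix
    std::vector<float> extractStem(const std::vector<float>& mixed_audio,
                                   StemType stem);

    // Real-time separation: process one block of num_samples input samples;
    // outputs presumably holds one buffer per stem
    void separateRealTime(const float* input, float** outputs,
                          int num_samples);
};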
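
The interface does not say what StemBundle contains. One possible layout, shown purely as an assumption, is one mono buffer per stem. The helpers stemOf and mixDown are likewise hypothetical. They illustrate how a caller might pick a stem by type, or rebuild an instrumental mix by summing everything except the vocals (the karaoke case).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Mirrors SourceSeparator::StemType from the interface above.
enum StemType { Vocals, Drums, Bass, Other };

// Hypothetical layout: one mono sample buffer per stem.
struct StemBundle {
    std::vector<float> vocals, drums, bass, other;
};

// Select one stem from the bundle by type.
const std::vector<float>& stemOf(const StemBundle& b, StemType t) {
    switch (t) {
        case Vocals: return b.vocals;
        case Drums:  return b.drums;
        case Bass:   return b.bass;
        default:     return b.other;
    }
}

// Sum the stems back into a single signal, optionally leaving out the vocals.
// Stems of different lengths are treated as zero-padded to the longest one.
std::vector<float> mixDown(const StemBundle& b, bool include_vocals = true) {
    const std::vector<float>* stems[] = {&b.vocals, &b.drums, &b.bass, &b.other};
    std::size_t n = 0;
    for (const auto* s : stems) n = std::max(n, s->size());
    std::vector<float> out(n, 0.0f);
    for (const auto* s : stems) {
        if (s == &b.vocals && !include_vocals) continue;
        for (std::size_t i = 0; i < s->size(); ++i) out[i] += (*s)[i];
    }
    return out;
}
```

Summing all four stems and comparing the result against the original mix is also a cheap sanity check on a separator's output.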

🎯 Use Cases

  • Karaoke vocal removal
  • Remix/mashup preparation
  • Mixing assistance
  • Transcription preprocessing
  • Live performance stem extraction

📊 Performance Targets

  • Quality: SDR > 6 dB (vocals), > 5 dB (drums)
  • Speed: 0.5x real-time (high quality), 2x real-time (fast mode)
  • Latency: < 100ms for real-time mode
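
To make these targets checkable, the sketch below computes the three quantities involved. It uses the basic SDR definition, 10·log10(‖s‖² / ‖s − ŝ‖²), which is simpler than the BSS Eval variant that published benchmarks often report. It also assumes "Nx real-time" means audio duration divided by processing time. All function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Basic signal-to-distortion ratio in dB between a reference stem and its
// estimate. Returns +infinity when the estimate matches the reference exactly.
double sdrDb(const std::vector<float>& reference,
             const std::vector<float>& estimate) {
    const std::size_t n = std::min(reference.size(), estimate.size());
    double signal = 0.0, distortion = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double s = reference[i];
        const double e = s - static_cast<double>(estimate[i]);
        signal += s * s;
        distortion += e * e;
    }
    if (distortion == 0.0) return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(signal / distortion);
}

// Speed relative to real time: values above 1 mean faster than playback.
double speedFactor(double audio_seconds, double processing_seconds) {
    return audio_seconds / processing_seconds;
}

// Buffering latency, in milliseconds, contributed by one block of samples.
double blockLatencyMs(int num_samples, double sample_rate_hz) {
    return 1000.0 * num_samples / sample_rate_hz;
}
```

For example, at an assumed sample rate of 44.1 kHz a block of 4410 samples already uses the full 100 ms budget, so a real-time mode would need smaller blocks to leave room for model lookahead and compute time.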

📚 Models

  • Spleeter 4-stem: Fast, production-ready
  • Demucs v3: State-of-the-art quality
  • X-UMX: CrossNet-Open-Unmix (Sony), an Open-Unmix variant with cross-connected source networks

Priority: 🔥 High - Key feature for production workflows