TASK 03: Classification - Audio Event Detection & Recognition

Status: 🔴 PLANNING

🎯 Purpose

Classify audio for instrument recognition, genre detection, event detection, and speech/music discrimination.

🏗️ Key Components

  • Feature Extraction: MFCC, mel-spectrograms, embeddings
  • CNN Classifiers: VGG- and ResNet-style networks applied to spectrograms
  • RNN/LSTM: Temporal pattern recognition
  • Transfer Learning: VGGish, PANNs, YAMNet pre-trained models
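
As a rough illustration of the feature-extraction step, the sketch below shows two building blocks that MFCC and mel-spectrogram front ends typically share: splitting a signal into overlapping frames and placing filter centers evenly on the mel scale. The function names (`hzToMel`, `melToHz`, `melCenterFrequencies`, `frameSignal`) are hypothetical and not part of this project; the mel formula is the common HTK-style variant, which is an assumption since the plan does not specify one.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hz -> mel using the HTK-style formula (assumed variant).
inline double hzToMel(double hz) {
    return 2595.0 * std::log10(1.0 + hz / 700.0);
}

// Inverse of hzToMel.
inline double melToHz(double mel) {
    return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0);
}

// Center frequencies (Hz) of num_bands filters spaced evenly on the mel
// scale between f_min and f_max. The two range limits act as outer band
// edges and are not returned as centers.
std::vector<double> melCenterFrequencies(int num_bands, double f_min, double f_max) {
    assert(num_bands > 0 && f_min >= 0.0 && f_max > f_min);
    const double mel_min = hzToMel(f_min);
    const double mel_max = hzToMel(f_max);
    std::vector<double> centers(static_cast<std::size_t>(num_bands));
    for (int i = 0; i < num_bands; ++i) {
        const double mel = mel_min + (mel_max - mel_min) * (i + 1) / (num_bands + 1);
        centers[static_cast<std::size_t>(i)] = melToHz(mel);
    }
    return centers;
}

// Split a signal into overlapping frames of frame_len samples, advancing
// by hop samples. Trailing samples that do not fill a frame are dropped.
std::vector<std::vector<float>> frameSignal(const std::vector<float>& audio,
                                            std::size_t frame_len,
                                            std::size_t hop) {
    assert(frame_len > 0 && hop > 0);
    std::vector<std::vector<float>> frames;
    for (std::size_t start = 0; start + frame_len <= audio.size(); start += hop) {
        const auto first = audio.begin() + static_cast<std::ptrdiff_t>(start);
        frames.emplace_back(first, first + static_cast<std::ptrdiff_t>(frame_len));
    }
    return frames;
}
```

A full front end would additionally window each frame, take an FFT, and apply triangular filters around these centers; those steps are omitted here.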

📋 Architecture

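#include <functional>
#include <vector>

// Result types are assumed to be defined elsewhere in the project.
struct ClassificationResult;
struct Label;

class AudioClassifier {
public:
    // Classify an audio clip into its single best-matching class
    ClassificationResult classify(const std::vector<float>& audio);

    // Multi-label classification: return every label detected in the clip
    std::vector<Label> classifyMultiLabel(const std::vector<float>& audio);

    // Real-time classification with a sliding window; the callback is
    // invoked with the result for each classified window
    void classifyRealTime(const float* audio, int num_samples,
                          std::function<void(ClassificationResult)> callback);
};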
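
One way the sliding-window behavior behind `classifyRealTime` could work is sketched below: incoming blocks are appended to a buffer, and each time a full window is available it is handed to a callback and the buffer advances by one hop. `SlidingWindowBuffer`, its `push` method, and the window/hop parameters are hypothetical names chosen for illustration; the plan does not define how windows are sized or overlapped.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Accumulates streamed samples and emits fixed-size, overlapping windows.
// Hypothetical helper, not part of the AudioClassifier interface.
class SlidingWindowBuffer {
public:
    using WindowCallback = std::function<void(const std::vector<float>&)>;

    SlidingWindowBuffer(std::size_t window_size, std::size_t hop_size)
        : window_size_(window_size), hop_size_(hop_size) {
        assert(window_size > 0 && hop_size > 0 && hop_size <= window_size);
    }

    // Append num_samples samples, then invoke on_window once for every
    // complete window currently available, advancing by hop_size each time.
    void push(const float* audio, int num_samples, const WindowCallback& on_window) {
        assert(num_samples >= 0 && (audio != nullptr || num_samples == 0));
        buffer_.insert(buffer_.end(), audio, audio + num_samples);
        while (buffer_.size() >= window_size_) {
            const auto first = buffer_.begin();
            std::vector<float> window(first, first + static_cast<std::ptrdiff_t>(window_size_));
            on_window(window);
            // Drop the oldest hop_size samples; the rest overlap the next window.
            buffer_.erase(first, first + static_cast<std::ptrdiff_t>(hop_size_));
        }
    }

private:
    std::size_t window_size_;
    std::size_t hop_size_;
    std::vector<float> buffer_;
};
```

In this sketch the callback would run the classifier on each window. The vector-based buffer allocates and shifts memory on every window, so a preallocated ring buffer is likely a better fit on a real-time audio thread.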

🎯 Use Cases

  • Instrument detection (guitar, piano, drums)
  • Genre classification (rock, jazz, classical)
  • Transient detection (kick, snare, hi-hat)
  • Speech/music discrimination
  • Keyword spotting

📚 Models

  • VGGish: General-purpose 128-dimensional audio embeddings for downstream classification
  • YAMNet: AudioSet-trained (521 classes)
  • PANNs: Pretrained audio neural networks for large-scale audio pattern recognition
  • Custom CNNs: Domain-specific classifiers
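
Models such as YAMNet produce one score vector per analysis frame, so a clip-level or multi-label decision needs an aggregation step. The sketch below averages per-frame scores and keeps every class whose mean score reaches a threshold. `ScoredLabel`, `meanScores`, `labelsAboveThreshold`, and the threshold value are illustrative assumptions, not part of any model's API or of this project.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical clip-level result: class index plus its aggregated score.
struct ScoredLabel {
    std::size_t class_index;
    float score;
};

// Average per-frame class scores into one score vector for the clip.
// frame_scores[f][c] is the score of class c in frame f.
std::vector<float> meanScores(const std::vector<std::vector<float>>& frame_scores) {
    assert(!frame_scores.empty());
    const std::size_t num_classes = frame_scores.front().size();
    std::vector<float> mean(num_classes, 0.0f);
    for (const auto& frame : frame_scores) {
        assert(frame.size() == num_classes);
        for (std::size_t c = 0; c < num_classes; ++c) {
            mean[c] += frame[c];
        }
    }
    for (float& m : mean) {
        m /= static_cast<float>(frame_scores.size());
    }
    return mean;
}

// Multi-label decision: keep every class whose score is at least
// threshold, sorted from highest to lowest score.
std::vector<ScoredLabel> labelsAboveThreshold(const std::vector<float>& scores,
                                              float threshold) {
    std::vector<ScoredLabel> out;
    for (std::size_t c = 0; c < scores.size(); ++c) {
        if (scores[c] >= threshold) {
            out.push_back({c, scores[c]});
        }
    }
    std::sort(out.begin(), out.end(),
              [](const ScoredLabel& a, const ScoredLabel& b) { return a.score > b.score; });
    return out;
}
```

Mean pooling favors classes that persist across the clip; short events such as a single snare hit may be better served by max pooling or per-frame decisions.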

Priority: 🔥 High - Foundation for intelligent routing