TAREA 03: Classification - Audio Event Detection & Recognition
Status: 🔴 PLANNING
🎯 Purpose
This module classifies audio for instrument recognition, genre classification, event detection, and speech/music discrimination.
🏗️ Key Components
- Feature Extraction: MFCC, mel-spectrograms, embeddings
- CNN Classifiers: VGG, ResNet for audio
- RNN/LSTM: Temporal pattern recognition
- Transfer Learning: VGGish, PANN, YAMNet pre-trained models
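As an illustration of the feature-extraction component, the sketch below shows the frequency warping behind mel-spectrograms and MFCCs. It assumes the common HTK-style mel formula; the function names (`hzToMel`, `melToHz`, `melCenterFrequencies`) are hypothetical and not part of any planned API. A full front end would also need an STFT and triangular filter weights, which are omitted here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Convert a frequency in Hz to the mel scale (HTK-style formula, an assumption).
inline double hzToMel(double hz) {
    return 2595.0 * std::log10(1.0 + hz / 700.0);
}

// Inverse of hzToMel: convert a mel value back to Hz.
inline double melToHz(double mel) {
    return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0);
}

// Center frequencies (Hz) of num_bands mel filters spanning [f_min, f_max].
// Centers are evenly spaced on the mel scale; the two band edges are excluded.
std::vector<double> melCenterFrequencies(std::size_t num_bands,
                                         double f_min, double f_max) {
    std::vector<double> centers;
    centers.reserve(num_bands);
    const double mel_min = hzToMel(f_min);
    const double mel_max = hzToMel(f_max);
    const double step = (mel_max - mel_min) / static_cast<double>(num_bands + 1);
    for (std::size_t i = 1; i <= num_bands; ++i) {
        centers.push_back(melToHz(mel_min + step * static_cast<double>(i)));
    }
    return centers;
}
```

For example, `melCenterFrequencies(40, 0.0, 8000.0)` yields 40 strictly increasing center frequencies that are dense at low frequencies and sparse at high ones.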
📋 Architecture
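#include <functional>
#include <vector>

// ClassificationResult and Label are project types not yet defined in this plan.
class AudioClassifier {
public:
    // Classify a single audio clip (one result per clip)
    ClassificationResult classify(const std::vector<float>& audio);
    // Multi-label classification (several labels may apply to one clip)
    std::vector<Label> classifyMultiLabel(const std::vector<float>& audio);
    // Real-time classification with a sliding window;
    // the callback is invoked once per completed window
    void classifyRealTime(const float* audio, int num_samples,
                          std::function<void(ClassificationResult)> callback);
};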
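One way the sliding-window behaviour of `classifyRealTime` could work is sketched below. Everything beyond the method signature is an assumption: the `ClassificationResult` fields, the `SlidingWindowClassifier` class, and the RMS-threshold `classifyWindow` function (a stand-in for a real model) are hypothetical. Incoming samples are buffered, a result is emitted for each full window, and the buffer then advances by the hop size.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical result type; the plan does not define its fields.
struct ClassificationResult {
    std::string label;
    float score;  // here: the RMS level of the window
};

// Stand-in classifier (assumption): labels a window by RMS energy only.
ClassificationResult classifyWindow(const std::vector<float>& window) {
    double sum_sq = 0.0;
    for (float s : window) sum_sq += static_cast<double>(s) * s;
    const double rms =
        window.empty() ? 0.0 : std::sqrt(sum_sq / static_cast<double>(window.size()));
    return ClassificationResult{rms > 0.1 ? "active" : "silence",
                                static_cast<float>(rms)};
}

class SlidingWindowClassifier {
public:
    // window_size is clamped to at least 1; hop_size to the range [1, window_size].
    SlidingWindowClassifier(std::size_t window_size, std::size_t hop_size)
        : window_size_(std::max<std::size_t>(1, window_size)),
          hop_size_(std::min(std::max<std::size_t>(1, hop_size), window_size_)) {}

    // Buffer new samples and invoke the callback once per completed window.
    void classifyRealTime(const float* audio, int num_samples,
                          const std::function<void(ClassificationResult)>& callback) {
        if (audio == nullptr || num_samples <= 0) return;
        buffer_.insert(buffer_.end(), audio, audio + num_samples);
        while (buffer_.size() >= window_size_) {
            const auto window_end =
                buffer_.begin() + static_cast<std::ptrdiff_t>(window_size_);
            std::vector<float> window(buffer_.begin(), window_end);
            callback(classifyWindow(window));
            // Advance by the hop size; overlapping samples stay in the buffer.
            buffer_.erase(buffer_.begin(),
                          buffer_.begin() + static_cast<std::ptrdiff_t>(hop_size_));
        }
    }

private:
    std::size_t window_size_;
    std::size_t hop_size_;
    std::vector<float> buffer_;
};
```

This sketch allocates and erases from a `std::vector` on every call, which is simple but not suitable for a real-time audio thread; a preallocated ring buffer would likely be the better choice there.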
🎯 Use Cases
- Instrument detection (guitar, piano, drums)
- Genre classification (rock, jazz, classical)
- Transient detection (kick, snare, hi-hat)
- Speech/music discrimination
- Keyword spotting
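Some of these use cases plausibly need exactly one label per clip (for example genre), while others may need several at once (for example multiple instruments playing together). The sketch below contrasts the two usual output treatments: softmax with argmax for single-label, and an independent sigmoid per class with a threshold for multi-label. The function names and the 0.5 default threshold are assumptions for illustration, not part of the planned API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax over raw model outputs (logits); subtracting the max avoids overflow.
std::vector<float> softmax(const std::vector<float>& logits) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) return probs;
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Single-label decision: index of the most probable class.
// Precondition: logits is non-empty.
std::size_t singleLabel(const std::vector<float>& logits) {
    const std::vector<float> probs = softmax(logits);
    return static_cast<std::size_t>(
        std::max_element(probs.begin(), probs.end()) - probs.begin());
}

// Multi-label decision: every class whose sigmoid probability reaches the threshold.
std::vector<std::size_t> multiLabel(const std::vector<float>& logits,
                                    float threshold = 0.5f) {
    std::vector<std::size_t> active;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const float p = 1.0f / (1.0f + std::exp(-logits[i]));
        if (p >= threshold) active.push_back(i);
    }
    return active;
}
```

With logits `{2.0, -1.0, 0.5}`, `singleLabel` picks class 0, while `multiLabel` keeps classes 0 and 2 because both have positive logits and therefore sigmoid probabilities above 0.5.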
📚 Models
- VGGish: AudioSet-trained model that produces 128-dimensional embeddings for general audio classification
- YAMNet: AudioSet-trained (521 classes)
- PANN: Large-scale pre-trained networks for audio pattern recognition (AudioSet-trained)
- Custom CNNs: Domain-specific classifiers
Priority: 🔥 High - Foundation for intelligent routing