05_05_08 - Optimization Hints
📋 Description
Annotation and hint system that guides the topology compiler's automatic optimizations. It detects, validates, and applies optimizations based on DSP patterns, platform characteristics, and performance budgets.
🎯 Objectives
- Hint annotation: attach optimization metadata to nodes
- Automatic detection: identify optimization opportunities through pattern matching
- Hint validation: verify feasibility and compatibility
- Hint application: transform the topology according to its hints
- Optimization strategies: orchestrate the full pipeline (O0/O1/O2/O3)
🏗️ Architecture
Main Components
optimization_hints.hpp/cpp # Optimization system
├── OptimizationHint # Hint structure with metadata
├── HintAnnotator # Add/remove hints on a topology
├── HintDetector # Detect opportunities automatically
├── HintValidator # Validate hints
├── HintApplicator # Apply transformations
└── OptimizationOrchestrator # Orchestrate the full pipeline
Optimization Flow
Topology
↓
[HintDetector] → Auto-detect patterns
↓
[HintValidator] → Validate hints
↓
[HintApplicator] → Apply transformations
↓
Optimized Topology
🔖 Hint Types
Performance Hints
- Inline: Inline the function to eliminate call overhead
- Vectorize: Apply SIMD (SSE/AVX/NEON)
- Unroll: Loop unrolling
- Parallelize: Parallel execution
- CacheOptimize: Improve cache locality
Memory Hints
- InPlace: In-place processing (reduces memory)
- Preallocate: Pre-allocate buffers
- StackAlloc: Use the stack instead of the heap
- Reuse: Reuse buffer memory
Computational Hints
- Approximate: Use approximations (fast math)
- LookupTable: Replace a function with a lookup table (LUT)
- FastMath: Enable fast math
- ConstantFold: Fold constants at compile time
Structural Hints
- Fuse: Merge adjacent nodes
- Split: Split a node into several nodes
- Reorder: Reorder execution
Platform Hints
- SIMD_SSE: Use SSE instructions
- SIMD_AVX: Use AVX instructions
- SIMD_NEON: Use NEON instructions (ARM)
- GPU: Offload to the GPU
📊 Hint Structure
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>
struct OptimizationHint {
    HintType type;        // Inline, Vectorize, etc.
    Priority priority;    // Critical → Suggestion
    Scope scope;          // Node/SubGraph/Global
    std::vector<NodeID> target_nodes;
    std::string description;
    // Hint-specific parameters
    std::unordered_map<std::string, std::string> parameters;
    // Estimated impact
    float estimated_speedup;     // 2.0 = 2x faster
    std::int64_t memory_impact;  // Bytes (+/-); signed, since an unsigned size_t cannot represent savings
    // Constraints
    std::vector<std::string> requirements;  // "SSE4.1", "C++17"
    std::vector<std::string> conflicts;     // Incompatible hints
};
🔍 Automatic Detection
Built-in Patterns
HintDetector detector;
auto hints = detector.analyze(topology);
for (const auto& hint : hints) {
    std::cout << to_string(hint.type);
    if (!hint.target_nodes.empty()) {  // avoid indexing an empty target list
        std::cout << " @ " << hint.target_nodes[0];
    }
    std::cout << "\n";
    std::cout << " Priority: " << to_string(hint.priority) << "\n";
    std::cout << " Speedup: " << hint.estimated_speedup << "x\n";
    std::cout << " Description: " << hint.description << "\n";
}
Detected Patterns
| Pattern | Condition | Generated Hint |
|---|---|---|
| Inline Small Nodes | Simple nodes (gain, add, multiply) | Inline |
| Vectorize Arrays | Array operations | Vectorize (SSE/AVX) |
| InPlace Processing | 1 input, 1 output, no feedback | InPlace |
| Node Fusion | Linear chain A→B→C | Fuse |
| Lookup Table | sin/exp/log/tanh | LookupTable |
| Cache Optimize | Delays, large buffers | CacheOptimize |
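The Node Fusion row can be made concrete with a small self-contained sketch. It does not use AudioLab's `Topology` API; it assumes a hypothetical simplified graph (a `std::map` from node name to downstream node names) and finds maximal linear chains in which every link has exactly one producer and one consumer, i.e. the A→B→C shape from the table. External I/O nodes are kept out of fusion through a caller-supplied exclusion set.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical simplified graph: node name -> names of downstream nodes.
using Graph = std::map<std::string, std::vector<std::string>>;

// Finds maximal linear chains (>= 2 nodes) in which every link u -> v has
// out_degree(u) == 1 and in_degree(v) == 1: candidates for a Fuse hint.
// Nodes listed in `excluded` (e.g. external I/O) never take part in a fusion.
std::vector<std::vector<std::string>> find_fusable_chains(
        const Graph& g, const std::set<std::string>& excluded) {
    std::map<std::string, int> in_degree;
    std::map<std::string, std::string> pred;  // a predecessor of each node
    for (const auto& entry : g) {
        in_degree.emplace(entry.first, 0);
        for (const auto& dst : entry.second) {
            ++in_degree[dst];
            pred[dst] = entry.first;
        }
    }
    // True if the single link leaving u can be fused with its consumer.
    auto fusable = [&](const std::string& u) {
        auto it = g.find(u);
        if (it == g.end() || it->second.size() != 1) return false;
        const std::string& v = it->second[0];
        return in_degree[v] == 1 && !excluded.count(u) && !excluded.count(v);
    };
    std::vector<std::vector<std::string>> chains;
    for (const auto& entry : g) {
        const std::string& node = entry.first;
        // Start only at nodes that are not the tail of a fusable link.
        if (in_degree[node] == 1 && fusable(pred[node])) continue;
        std::vector<std::string> chain{node};
        for (std::string cur = node; fusable(cur);) {
            cur = g.at(cur)[0];
            chain.push_back(cur);
        }
        if (chain.size() >= 2) chains.push_back(chain);
    }
    return chains;
}
```

On the filter_chain topology from the complete example, with `input` and `output` excluded, this returns the single chain gain1 → filter → gain2. Whether the built-in detector fuses whole chains or only pairs is not specified in this document.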
Custom Pattern
HintPattern custom_pattern{
    "custom_simd_pattern",
    "Detect custom SIMD opportunity",
    // Matcher function
    [](const Topology& topology, const NodeID& node_id) {
        const auto& node = topology.nodes().at(node_id);
        return node.type == "my_custom_operation" &&
               !node.inputs.empty() &&
               node.inputs[0].buffer_size >= 4;
    },
    // Hint generator
    [](const Topology& /*topology*/, const NodeID& node_id) {
        OptimizationHint hint;
        hint.type = HintType::Vectorize;
        hint.priority = Priority::High;
        hint.scope = Scope::Node;
        hint.target_nodes = {node_id};
        hint.description = "Custom SIMD vectorization";
        hint.estimated_speedup = 4.0f;
        hint.requirements = {"AVX2"};
        return hint;
    }
};
detector.add_pattern(custom_pattern);
✅ Hint Validation
Validating a Single Hint
HintValidationResult result = HintValidator::validate_hint(topology, hint);
if (result.is_valid) {
    std::cout << "✅ Hint is valid\n";
} else {
    std::cout << "❌ Hint validation failed:\n";
    for (const auto& error : result.errors) {
        std::cout << " - " << error << "\n";
    }
}
if (!result.warnings.empty()) {
    std::cout << "⚠️ Warnings:\n";
    for (const auto& warning : result.warnings) {
        std::cout << " - " << warning << "\n";
    }
}
Detecting Conflicts
auto conflicts = HintValidator::find_conflicts(topology);
if (!conflicts.empty()) {
    std::cout << "⚠️ Found " << conflicts.size() << " conflicting hints:\n";
    for (const auto& [h1, h2] : conflicts) {
        std::cout << " " << to_string(h1.type) << " vs " << to_string(h2.type) << "\n";
    }
}
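`find_conflicts` is described here only by its output. One plausible rule, assuming conflicts are declared through the `conflicts` field of `OptimizationHint` and only matter when two hints touch a common node, is sketched below. It uses a trimmed-down, hypothetical `Hint` struct (hint types as strings) and returns index pairs instead of hint pairs; it is not the documented `HintValidator` implementation.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Trimmed-down stand-in for OptimizationHint (hypothetical).
struct Hint {
    std::string type;                    // e.g. "Fuse", "Split"
    std::vector<std::string> targets;    // target node IDs
    std::vector<std::string> conflicts;  // hint types this hint cannot coexist with
};

static bool contains(const std::vector<std::string>& v, const std::string& s) {
    return std::find(v.begin(), v.end(), s) != v.end();
}

// Two hints conflict if they share at least one target node and either one
// lists the other's type in its `conflicts`. Returns index pairs (i < j).
std::vector<std::pair<std::size_t, std::size_t>> find_conflicting_pairs(
        const std::vector<Hint>& hints) {
    std::vector<std::pair<std::size_t, std::size_t>> result;
    for (std::size_t i = 0; i < hints.size(); ++i) {
        for (std::size_t j = i + 1; j < hints.size(); ++j) {
            const Hint& a = hints[i];
            const Hint& b = hints[j];
            const bool overlap = std::any_of(a.targets.begin(), a.targets.end(),
                [&](const std::string& n) { return contains(b.targets, n); });
            if (overlap && (contains(a.conflicts, b.type) || contains(b.conflicts, a.type))) {
                result.emplace_back(i, j);
            }
        }
    }
    return result;
}
```

The pairwise scan is quadratic in the number of hints, which is usually acceptable because a topology carries far fewer hints than samples per block.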
Checking Requirements
#include <algorithm>  // std::find
std::vector<std::string> available_features = {"SSE4.1", "AVX", "AVX2"};
if (HintValidator::check_requirements(hint, available_features)) {
    std::cout << "✅ All requirements met\n";
} else {
    std::cout << "❌ Missing requirements:\n";
    for (const auto& req : hint.requirements) {
        if (std::find(available_features.begin(), available_features.end(), req) ==
            available_features.end()) {
            std::cout << " - " << req << "\n";
        }
    }
}
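The loop above repeats the requirement lookup by hand. A small helper can return the missing features directly; this is a sketch of a hypothetical `missing_requirements` function (not part of the documented `HintValidator` API) that uses a `std::unordered_set` so each lookup is constant-time on average.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the requirements not present in `available`,
// preserving the order in which they were listed on the hint.
std::vector<std::string> missing_requirements(
        const std::vector<std::string>& requirements,
        const std::vector<std::string>& available) {
    const std::unordered_set<std::string> have(available.begin(), available.end());
    std::vector<std::string> missing;
    for (const auto& req : requirements) {
        if (have.count(req) == 0) missing.push_back(req);
    }
    return missing;
}
```

An empty result corresponds to the "All requirements met" branch above.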
🔧 Applying Hints
Configuration
ApplicationConfig config;
config.min_priority = Priority::Medium;
config.available_features = {"SSE4.1", "AVX2", "C++17"};
config.auto_resolve_conflicts = true;
config.apply_in_priority_order = true;
HintApplicator applicator(config);
Applying a Single Hint
ApplicationResult result = applicator.apply_hint(topology, hint);
if (result.applied) {
    std::cout << "✅ " << result.message << "\n";
    std::cout << " Actual speedup: " << result.actual_speedup << "x\n";
    std::cout << " Memory impact: " << result.actual_memory_impact << " bytes\n";
    topology = result.modified_topology;
} else {
    std::cout << "❌ Could not apply: " << result.message << "\n";
}
Applying All Hints
Topology optimized = applicator.apply_all_hints(topology);
std::cout << "Topology optimized with all applicable hints\n";
Applying by Priority
// Only Critical and High priority hints
Topology optimized = applicator.apply_by_priority(topology, Priority::High);
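`min_priority` and `apply_in_priority_order` suggest a filter-then-sort step before hints are applied. A minimal sketch, using a hypothetical `PrioritizedHint` stand-in and assuming the priority enum is ordered from `Critical` (most important) to `Suggestion`; only the four levels named in this document are modeled, and the real enum may contain others.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumed ordering, most to least important (lower value = more important).
enum class Priority { Critical = 0, High = 1, Medium = 2, Suggestion = 3 };

struct PrioritizedHint {  // hypothetical stand-in for OptimizationHint
    std::string type;
    Priority priority;
};

// Keeps hints at least as important as `min_priority` and orders them so the
// most important ones come first. stable_sort keeps detection order for ties.
std::vector<PrioritizedHint> select_by_priority(std::vector<PrioritizedHint> hints,
                                                Priority min_priority) {
    hints.erase(std::remove_if(hints.begin(), hints.end(),
                    [&](const PrioritizedHint& h) { return h.priority > min_priority; }),
                hints.end());
    std::stable_sort(hints.begin(), hints.end(),
        [](const PrioritizedHint& a, const PrioritizedHint& b) { return a.priority < b.priority; });
    return hints;
}
```

Applying hints in this order means that, when two hints conflict, the more important one has already transformed the topology by the time the other is considered.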
🎚️ Optimization Levels
Predefined Strategies
enum class OptimizationLevel {
    O0,    // No optimization
    O1,    // Basic (inline, constant folding)
    O2,    // Moderate (+ vectorization, in-place)
    O3,    // Aggressive (+ fusion, unrolling, LUT)
    Os,    // Optimize for size (reduce code size)
    Ofast  // Maximum performance (may break strict standards compliance)
};
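Read as cumulative (each level adds its "+" items to the previous one), the comments above imply the following hint families per level. This sketch encodes that reading with hint names taken from the Hint Types list; `Os` and `Ofast` are omitted because their hint sets are not described here, and the function name is hypothetical.

```cpp
#include <set>
#include <string>

// Subset of the levels above; Os and Ofast are left out of this sketch.
enum class OptimizationLevel { O0, O1, O2, O3 };

// Hint families enabled at each level, cumulative as in the enum comments.
std::set<std::string> hints_for_level(OptimizationLevel level) {
    std::set<std::string> hints;
    if (level >= OptimizationLevel::O1) {
        hints.insert("Inline");
        hints.insert("ConstantFold");
    }
    if (level >= OptimizationLevel::O2) {
        hints.insert("Vectorize");
        hints.insert("InPlace");
    }
    if (level >= OptimizationLevel::O3) {
        hints.insert("Fuse");
        hints.insert("Unroll");
        hints.insert("LookupTable");
    }
    return hints;
}
```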
Configuring a Strategy
OptimizationStrategy strategy;
strategy.level = OptimizationLevel::O3;
strategy.platform = TargetPlatform::Desktop_x86;
strategy.enable_auto_detection = true;
strategy.enable_auto_application = true;
strategy.min_priority = Priority::Medium;
// Feature flags
strategy.allow_approximations = false;
strategy.allow_fast_math = true;
strategy.allow_simd = true;
strategy.allow_gpu = false;
// Resource constraints
strategy.max_memory_increase = 1024 * 1024; // 1 MB
strategy.max_code_size_increase = 2.0f; // 2x
Target Platforms
enum class TargetPlatform {
    Generic,         // Portable (no SIMD)
    Desktop_x86,     // x86/x64 with SSE/AVX
    Desktop_ARM,     // ARM64 with NEON
    Mobile_iOS,      // iOS (NEON, battery constraints)
    Mobile_Android,  // Android (NEON, battery constraints)
    WebAssembly,     // WASM (no native SSE/AVX/NEON; at most 128-bit WASM SIMD)
    Embedded,        // Embedded (limited resources)
    GPU              // GPU offload (CUDA/OpenCL)
};
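`HintValidator::check_requirements` and `ApplicationConfig::available_features` compare hints against a list of feature strings, but the document does not say how a `TargetPlatform` turns into that list. A hypothetical mapping consistent with the comments above might look like this; the exact feature strings (e.g. `"SIMD128"`) and the x86 baseline are assumptions, and a real build would probe the CPU at runtime rather than hard-code AVX2.

```cpp
#include <string>
#include <vector>

enum class TargetPlatform {
    Generic, Desktop_x86, Desktop_ARM, Mobile_iOS, Mobile_Android, WebAssembly, Embedded, GPU
};

// Hypothetical mapping from a target platform to CPU SIMD feature strings.
std::vector<std::string> features_for(TargetPlatform p) {
    switch (p) {
        case TargetPlatform::Desktop_x86:    return {"SSE4.1", "AVX", "AVX2"};
        case TargetPlatform::Desktop_ARM:
        case TargetPlatform::Mobile_iOS:
        case TargetPlatform::Mobile_Android: return {"NEON"};
        case TargetPlatform::WebAssembly:    return {"SIMD128"};
        case TargetPlatform::Generic:
        case TargetPlatform::Embedded:
        case TargetPlatform::GPU:            return {};  // GPU capabilities are out of scope here
    }
    return {};  // only reached with an out-of-range enum value
}
```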
🎼 Optimization Orchestrator
Full Pipeline
OptimizationStrategy strategy;
strategy.level = OptimizationLevel::O2;
strategy.platform = TargetPlatform::Desktop_x86;
OptimizationOrchestrator orchestrator(strategy);
// Optimize automatically
Topology optimized = orchestrator.optimize(topology);
// Generate a report
std::string report = orchestrator.generate_report(topology, optimized);
std::cout << report;
Optimization Report
Optimization Report
===================
Strategy: O2
Platform: Desktop x86
Applied Optimizations: 5
- Inline (High)
Inline simple operation to reduce function call overhead
Speedup: 1.1x
- Vectorize (High)
Apply SIMD vectorization to array operations
Speedup: 4.0x
- InPlace (Medium)
Process buffer in-place to save memory
Speedup: 1.05x
- Fuse (Medium)
Fuse adjacent nodes to reduce memory traffic
Speedup: 1.2x
- LookupTable (Medium)
Replace expensive function with lookup table
Speedup: 3.0x
Total estimated speedup: 5.04x
Total memory impact: -2048 bytes
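The report does not say how per-hint estimates are combined into the total. Multiplying them is only valid when every optimization speeds up the entire run; when each one only affects the part of the graph it targets, Amdahl's law gives a more conservative figure: if a fraction p of the runtime is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below generalizes this to several optimizations, assuming the fractions (hypothetical inputs, not something the orchestrator is documented to produce) refer to disjoint parts of the original runtime.

```cpp
#include <stdexcept>
#include <vector>

struct Contribution {
    double fraction;  // share of the original runtime affected (0..1)
    double speedup;   // local speedup of that share (> 0)
};

// Amdahl's law for optimizations acting on disjoint parts of the runtime:
// new_time = (1 - sum p_i) + sum (p_i / s_i); overall speedup = 1 / new_time.
double combined_speedup(const std::vector<Contribution>& parts) {
    double untouched = 1.0;
    double new_time = 0.0;
    for (const auto& c : parts) {
        if (c.fraction < 0.0 || c.speedup <= 0.0)
            throw std::invalid_argument("invalid contribution");
        untouched -= c.fraction;
        new_time += c.fraction / c.speedup;
    }
    if (untouched < -1e-9) throw std::invalid_argument("fractions exceed 1");
    return 1.0 / (new_time + (untouched > 0.0 ? untouched : 0.0));
}
```

For example, a 4.0x vectorization that covers half of the runtime yields only 1 / (0.5 + 0.5 / 4) = 1.6x overall.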
📝 Complete Example
#include <iostream>
#include "optimization_hints.hpp"
#include "../05_05_00_graph_representation/topology_builder.hpp"
using namespace audiolab::topology;
using namespace audiolab::topology::optimization;
int main() {
    // 1. Build the topology
    auto topology = TopologyBuilder()
        .setName("filter_chain")
        .addNode("input", "external_input", NodeType::Source)
        .addNode("gain1", "multiply_scalar", NodeType::Processing)
        .addNode("filter", "biquad_lowpass", NodeType::Processing)
        .addNode("gain2", "multiply_scalar", NodeType::Processing)
        .addNode("output", "external_output", NodeType::Sink)
        .connect("input", "out", "gain1", "in")
        .connect("gain1", "out", "filter", "in")
        .connect("filter", "out", "gain2", "in")
        .connect("gain2", "out", "output", "in")
        .build();
    std::cout << "Original topology: " << topology.nodes().size() << " nodes\n\n";
    // 2. Automatic detection
    HintDetector detector;
    auto hints = detector.analyze(topology);
    std::cout << "Detected " << hints.size() << " optimization opportunities:\n";
    for (const auto& hint : hints) {
        std::cout << " - " << to_string(hint.type) << " @ ";
        for (const auto& node : hint.target_nodes) {
            std::cout << node << " ";
        }
        std::cout << "\n " << hint.description << "\n";
        std::cout << " Speedup: " << hint.estimated_speedup << "x\n";
    }
    std::cout << "\n";
    // 3. Validation
    std::cout << "Validating hints...\n";
    for (const auto& hint : hints) {
        auto validation = HintValidator::validate_hint(topology, hint);
        if (!validation.is_valid) {
            std::cout << " ❌ " << to_string(hint.type) << ": " << validation.errors[0] << "\n";
        } else if (!validation.warnings.empty()) {
            std::cout << " ⚠️ " << to_string(hint.type) << ": " << validation.warnings[0] << "\n";
        } else {
            std::cout << " ✅ " << to_string(hint.type) << "\n";
        }
    }
    std::cout << "\n";
    // 4. Optimization strategy
    OptimizationStrategy strategy;
    strategy.level = OptimizationLevel::O3;
    strategy.platform = TargetPlatform::Desktop_x86;
    strategy.enable_auto_detection = true;
    strategy.enable_auto_application = true;
    strategy.allow_fast_math = true;
    strategy.allow_simd = true;
    // 5. Optimize
    OptimizationOrchestrator orchestrator(strategy);
    Topology optimized = orchestrator.optimize(topology);
    // 6. Report
    std::string report = orchestrator.generate_report(topology, optimized);
    std::cout << report << "\n";
    // 7. Applied hints
    auto applied_hints = HintAnnotator::get_hints_by_priority(optimized, Priority::Suggestion);
    std::cout << "\nApplied " << applied_hints.size() << " hints\n";
    // 8. Example: selective manual application
    std::cout << "\n=== Manual Application Example ===\n";
    ApplicationConfig app_config;
    app_config.min_priority = Priority::High;
    app_config.available_features = {"SSE4.1", "AVX2"};
    HintApplicator applicator(app_config);
    for (const auto& hint : hints) {
        if (hint.type == HintType::Vectorize || hint.type == HintType::Inline) {
            auto result = applicator.apply_hint(topology, hint);
            if (result.applied) {
                std::cout << "✅ Applied: " << to_string(hint.type) << "\n";
                std::cout << " " << result.message << "\n";
            }
        }
    }
    return 0;
}
Expected Output
Original topology: 5 nodes
Detected 4 optimization opportunities:
- Inline @ gain1
Inline simple operation to reduce function call overhead
Speedup: 1.1x
- Vectorize @ filter
Apply SIMD vectorization to array operations
Speedup: 4.0x
- InPlace @ filter
Process buffer in-place to save memory
Speedup: 1.05x
- Fuse @ gain1 filter
Fuse adjacent nodes to reduce memory traffic
Speedup: 1.2x
Validating hints...
✅ Inline
✅ Vectorize
✅ InPlace
✅ Fuse
Optimization Report
===================
Strategy: O3
Platform: Desktop x86
Applied Optimizations: 4
- Inline (High)
Speedup: 1.1x
- Vectorize (High)
Speedup: 4.0x
- InPlace (Medium)
Speedup: 1.05x
- Fuse (Medium)
Speedup: 1.2x
Applied 4 hints
=== Manual Application Example ===
✅ Applied: Inline
Inline hint applied (metadata updated)
✅ Applied: Vectorize
Vectorization hint applied (SIMD enabled)
📈 Performance Metrics
Speedup Estimates
| Optimization | Typical Speedup | Conditions |
|---|---|---|
| Inline | 1.05x - 1.2x | Small nodes, frequent calls |
| Vectorize (SSE) | 2x - 4x | Arrays of ≥4 elements, float/int types |
| Vectorize (AVX) | 4x - 8x | Arrays of ≥8 elements, float/int types |
| InPlace | 1.02x - 1.1x | Reduces memory traffic |
| Fusion | 1.1x - 1.5x | Eliminates the intermediate buffer |
| LookupTable | 2x - 10x | Transcendental functions |
| Fast Math | 1.1x - 1.3x | FP operations |
| Loop Unroll | 1.05x - 1.2x | Short loops |
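To make the Vectorize (SSE) row concrete: an SSE register holds 128 bits, i.e. four 32-bit floats, which is where the "≥4 elements" condition and the roughly 4x ceiling come from (AVX doubles this to 256 bits, eight floats). The sketch below hand-vectorizes an in-place scalar gain like the `multiply_scalar` nodes in the complete example. It illustrates the transformation only; it is not the code AudioLab's generator emits, and the `SKETCH_HAVE_SSE` macro is a hypothetical name.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Multiplies a buffer by a scalar gain in place. With SSE, four floats are
// processed per instruction; the remaining 0-3 samples (or the whole buffer
// on non-SSE targets) go through the scalar loop.
void apply_gain(float* buffer, std::size_t n, float gain) {
    std::size_t i = 0;
#ifdef SKETCH_HAVE_SSE
    const __m128 g = _mm_set1_ps(gain);              // broadcast gain to 4 lanes
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(buffer + i);         // unaligned load of 4 floats
        _mm_storeu_ps(buffer + i, _mm_mul_ps(x, g));  // multiply and store back
    }
#endif
    for (; i < n; ++i) buffer[i] *= gain;            // scalar tail / fallback
}
```

Unaligned loads keep the sketch safe for arbitrary buffers; a buffer planner that guarantees 16-byte alignment would allow `_mm_load_ps`/`_mm_store_ps` instead.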
Memory Impact
| Optimization | Memory Impact | Notes |
|---|---|---|
| InPlace | -buffer_size | Eliminates the output buffer |
| Fusion | -intermediate_buffers | Reduces intermediate buffers |
| LookupTable | +1KB - 64KB | Table size |
| Preallocate | +total_buffers | Pre-allocates everything |
| Vectorize | 0 | No change |
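The LookupTable hint trades memory for speed. A minimal sketch, assuming tanh (one of the functions listed in the Detected Patterns table), 1024 float entries (4 KB, inside the +1KB - 64KB range above), linear interpolation, and clamping outside [-4, 4]; the class name, table size, and range are illustrative choices, not AudioLab's.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Lookup table for tanh over [-range, range] with linear interpolation.
// Inputs outside the range are clamped; since tanh(4) ≈ 0.99933, the clamping
// error for range = 4 stays below about 7e-4. `size` must be at least 2.
class TanhLUT {
public:
    explicit TanhLUT(std::size_t size = 1024, float range = 4.0f)
        : table_(size), range_(range),
          scale_(static_cast<float>(size - 1) / (2.0f * range)) {
        for (std::size_t i = 0; i < size; ++i) {
            const float x = -range + static_cast<float>(i) / scale_;
            table_[i] = std::tanh(x);
        }
    }

    float operator()(float x) const {
        if (x <= -range_) return table_.front();
        if (x >= range_) return table_.back();
        const float pos = (x + range_) * scale_;  // fractional table index
        const std::size_t i = static_cast<std::size_t>(pos);
        if (i >= table_.size() - 1) return table_.back();
        const float frac = pos - static_cast<float>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }

    std::size_t bytes() const { return table_.size() * sizeof(float); }

private:
    std::vector<float> table_;
    float range_;
    float scale_;  // table entries per unit of x
};
```

With 1024 entries the step is about 0.0078, so the interpolation error inside the range is on the order of 1e-5; halving the table size roughly quadruples that error.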
Platform Support
✅ supported · ⚠️ limited or conditional · ❌ not supported
| Platform | SIMD | Fast Math | LUT | GPU |
|---|---|---|---|---|
| Desktop x86 | ✅ SSE/AVX | ✅ | ✅ | ✅ |
| Desktop ARM | ✅ NEON | ✅ | ✅ | ❌ |
| Mobile | ⚠️ NEON | ⚠️ | ✅ | ❌ |
| WebAssembly | ⚠️ SIMD128 | ✅ | ✅ | ❌ |
| Embedded | ❌ | ✅ | ⚠️ | ❌ |
🔗 Integration with Other Subsystems
With Code Generation (05_05_06)
// Apply hints before generating code
auto optimized = orchestrator.optimize(topology);
CodeGenerationOptions gen_options;
gen_options.enable_simd = true;      // Honor SIMD hints
gen_options.enable_inlining = true;  // Honor inline hints
auto code = CodeGenerator::generate(optimized, analysis, buffer_plan, gen_options);
With Buffer Management (05_05_03)
// The buffer manager uses InPlace/Reuse hints
auto hints = HintAnnotator::get_hints_by_type(topology, HintType::InPlace);
buffer_manager.set_inplace_hints(hints);
auto buffer_plan = buffer_manager.createPlan(topology, execution_order);
With Composition Rules (05_05_07)
// Validate before optimizing
ValidationConfig val_config;
val_config.check_performance = true;
TopologyValidator validator(val_config);
auto report = validator.validate(topology);
if (report.passed) {
    auto optimized = orchestrator.optimize(topology);
}
🎯 Use Cases
1. Incremental Optimization (Editor)
#include <algorithm>  // std::find
// While editing, suggest hints in real time
void on_node_added(const Topology& topology, const NodeID& new_node) {
    HintDetector detector;
    auto hints = detector.analyze(topology);
    // Keep only hints that target the newly added node
    for (const auto& hint : hints) {
        if (std::find(hint.target_nodes.begin(), hint.target_nodes.end(), new_node) !=
            hint.target_nodes.end()) {
            ui->show_optimization_suggestion(hint);
        }
    }
}
2. Multi-Platform Builds
// Generate variants for several target platforms
std::vector<TargetPlatform> platforms = {
    TargetPlatform::Desktop_x86,
    TargetPlatform::Mobile_iOS,
    TargetPlatform::WebAssembly
};
for (auto platform : platforms) {
    OptimizationStrategy strategy;
    strategy.level = OptimizationLevel::O2;
    strategy.platform = platform;
    OptimizationOrchestrator orchestrator(strategy);
    auto optimized = orchestrator.optimize(topology);
    // Generate code for this platform
    generate_code_for_platform(optimized, platform);
}
3. A/B Testing of Optimizations
// Compare performance with and without optimizations
Topology baseline = topology;
Topology optimized_o2 = optimize_at_level(topology, OptimizationLevel::O2);
Topology optimized_o3 = optimize_at_level(topology, OptimizationLevel::O3);
// Illustrative timings (speedup relative to baseline in parentheses)
benchmark(baseline);      // 100ms
benchmark(optimized_o2);  // 65ms (1.54x)
benchmark(optimized_o3);  // 42ms (2.38x)
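The `benchmark` helper used above is not defined in this document. A minimal stand-in, assuming each variant can be wrapped in a callable that processes one block, measures the mean wall-clock time with `std::chrono::steady_clock` after one untimed warm-up run; `mean_ms` and `speedup` are hypothetical names.

```cpp
#include <chrono>
#include <functional>

// Runs `work` once untimed (warm-up: caches, lazy allocation), then
// `iterations` times timed, and returns the mean duration in milliseconds.
double mean_ms(const std::function<void()>& work, int iterations = 10) {
    using Clock = std::chrono::steady_clock;
    work();
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) work();
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}

// Speedup of a candidate relative to a baseline, e.g. 100 ms vs 65 ms -> ~1.54x.
double speedup(double baseline_ms, double candidate_ms) {
    return baseline_ms / candidate_ms;
}
```

`steady_clock` is used because it is monotonic; `system_clock` can jump when the wall clock is adjusted, which would corrupt short measurements.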
🚀 System Status
- ✅ Hint system: 21 hint types (performance, memory, computational, structural, platform)
- ✅ Automatic detector: 6 built-in patterns + an API for custom patterns
- ✅ Validator: hint validation, conflict detection, requirement checking
- ✅ Applicator: single/batch application, by priority, configurable
- ✅ Orchestrator: full pipeline with O0-Ofast strategies and 8 target platforms
- ✅ Reports: optimization report generation with metrics
Subsystem: 05_MODULES → 05_05_TOPOLOGY_DESIGN → 05_05_08_optimization_hints
Author: AudioLab Development Team
Version: 1.0.0
Last updated: 2025-10-10