05_05_08 - Optimization Hints

📋 Description

An annotation and hint system that guides the topology compiler's automatic optimizations. It detects, validates, and applies optimizations based on DSP patterns, platform capabilities, and performance budgets.

🎯 Goals

Hint annotation: attach optimization metadata to nodes
Automatic detection: identify optimization opportunities by pattern matching
Hint validation: verify feasibility and compatibility
Hint application: transform the topology according to its hints
Optimization strategies: orchestrate the full pipeline (O0/O1/O2/O3, plus Os and Ofast)

🏗️ Architecture

Main Components

optimization_hints.hpp/cpp       # Optimization system
├── OptimizationHint            # Hint structure with metadata
├── HintAnnotator               # Adds/removes hints on a topology
├── HintDetector               # Detects opportunities automatically
├── HintValidator              # Validates hints
├── HintApplicator             # Applies transformations
└── OptimizationOrchestrator   # Orchestrates the full pipeline

Optimization Flow

Topology
    ↓
[HintDetector] → Auto-detect patterns
    ↓
[HintValidator] → Validate hints
    ↓
[HintApplicator] → Apply transformations
    ↓
Optimized Topology

🔖 Hint Types

Performance Hints

Inline: inline the function to eliminate call overhead
Vectorize: apply SIMD (SSE/AVX/NEON)
Unroll: loop unrolling
Parallelize: parallel execution
CacheOptimize: improve cache locality

Memory Hints

InPlace: process in place (reduces memory use)
Preallocate: pre-allocate buffers
StackAlloc: use the stack instead of the heap
Reuse: reuse buffer memory
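
To make the InPlace hint concrete, here is a minimal sketch that contrasts an out-of-place gain stage with an in-place one. All function names are hypothetical and are not part of optimization_hints.hpp; the point is only that a kernel which reads each sample before overwriting it can safely use the same buffer for input and output.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical gain kernel. `in` and `out` may point to the same buffer,
// which is what makes such a node a candidate for in-place processing.
void apply_gain(const float* in, float* out, std::size_t n, float gain) {
    assert(n == 0 || (in != nullptr && out != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * gain;  // each sample is read before it is overwritten
    }
}

// Out-of-place: allocates a second buffer of the same size.
std::vector<float> gain_out_of_place(const std::vector<float>& input, float gain) {
    std::vector<float> output(input.size());
    apply_gain(input.data(), output.data(), input.size(), gain);
    return output;
}

// In-place: reuses the input buffer as the output, so nothing is allocated.
void gain_in_place(std::vector<float>& buffer, float gain) {
    apply_gain(buffer.data(), buffer.data(), buffer.size(), gain);
}
```

Both variants produce the same samples; the in-place one saves one buffer of `input.size()` floats.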

Computational Hints

Approximate: use approximations (fast math)
LookupTable: replace a function with a lookup table (LUT)
FastMath: enable fast math
ConstantFold: fold constants at compile time
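
As an illustration of what a LookupTable hint might turn into, the sketch below replaces `std::tanh` with a precomputed table and linear interpolation. The class name, the range [-4, 4], and the 1024-entry size are assumptions chosen for the example, not values from optimization_hints.hpp.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical tanh lookup table on [-4, 4] with linear interpolation.
class TanhTable {
private:
    static constexpr std::size_t kSize = 1024;
    static constexpr float kMin = -4.0f;
    static constexpr float kMax = 4.0f;
    static constexpr float kStep = (kMax - kMin) / static_cast<float>(kSize - 1);
    std::array<float, kSize> table_{};

public:
    TanhTable() {
        // Fill the table once; lookups afterwards avoid calling std::tanh.
        for (std::size_t i = 0; i < kSize; ++i) {
            table_[i] = std::tanh(kMin + static_cast<float>(i) * kStep);
        }
    }

    float operator()(float x) const {
        assert(!std::isnan(x));  // this sketch assumes a non-NaN input
        // Outside the table range, clamp to the nearest table endpoint.
        if (x <= kMin) return table_.front();
        if (x >= kMax) return table_.back();
        const float pos = (x - kMin) / kStep;  // fractional table index
        // Clamp so that i + 1 stays in bounds even if rounding pushes pos up.
        const std::size_t i = std::min(static_cast<std::size_t>(pos), kSize - 2);
        const float frac = pos - static_cast<float>(i);
        return table_[i] + frac * (table_[i + 1] - table_[i]);
    }
};
```

The table costs 4 KB here. A larger table lowers the interpolation error at the cost of memory and cache pressure, which is the trade-off a LookupTable hint has to weigh.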

Structural Hints

Fuse: merge adjacent nodes
Split: split a node into several
Reorder: reorder execution
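
A small sketch of the idea behind Fuse, assuming the simplest possible case: two adjacent scalar-gain nodes collapsed into one. `GainNode` and the helper functions are hypothetical stand-ins, not types from the library.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a "multiply_scalar" node.
struct GainNode {
    float gain;
};

// Two adjacent gain nodes are equivalent to a single node whose gain is the
// product of both, up to floating-point rounding.
GainNode fuse(const GainNode& first, const GainNode& second) {
    return GainNode{first.gain * second.gain};
}

// Unfused: two passes over the buffer.
std::vector<float> run_unfused(std::vector<float> buffer,
                               const GainNode& a, const GainNode& b) {
    for (float& s : buffer) s *= a.gain;
    for (float& s : buffer) s *= b.gain;
    return buffer;
}

// Fused: a single pass over the buffer.
std::vector<float> run_fused(std::vector<float> buffer, const GainNode& fused) {
    for (float& s : buffer) s *= fused.gain;
    return buffer;
}
```

Fusing nodes of different kinds (for example a gain followed by a filter) needs a combined kernel rather than a combined parameter, but the benefit is the same: fewer passes over the data.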

Platform Hints

SIMD_SSE: use SSE instructions
SIMD_AVX: use AVX instructions
SIMD_NEON: use NEON instructions (ARM)
GPU: offload to the GPU

📊 Hint Structure

struct OptimizationHint {
    HintType type;                  // Inline, Vectorize, etc.
    Priority priority;              // Critical → Suggestion
    Scope scope;                    // Node/SubGraph/Global

    std::vector<NodeID> target_nodes;
    std::string description;

    // Hint-specific parameters
    std::unordered_map<std::string, std::string> parameters;

    // Estimated impact
    float estimated_speedup;        // 2.0 = 2x faster
    size_t memory_impact;           // Bytes (+/-)

    // Constraints
    std::vector<std::string> requirements;  // "SSE4.1", "C++17"
    std::vector<std::string> conflicts;     // Incompatible hints
};
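
One detail worth checking against the real header: `memory_impact` is annotated as a signed quantity (+/-) and the sample report in this document prints a negative total, yet `size_t` is unsigned. The sketch below shows the wrap-around that results and one possible alternative using a signed type. `MemoryDelta` and both helpers are hypothetical and only illustrate the pitfall.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical signed type for memory deltas, in bytes.
using MemoryDelta = std::int64_t;

// Storing a negative delta in an unsigned type wraps around modulo 2^N.
constexpr std::size_t as_unsigned(std::int64_t bytes) {
    return static_cast<std::size_t>(bytes);
}

// Signed deltas add up as expected, e.g. a saving from InPlace plus the
// cost of a lookup table.
constexpr MemoryDelta net_impact(MemoryDelta a, MemoryDelta b) {
    return a + b;
}

static_assert(as_unsigned(-2048) == std::numeric_limits<std::size_t>::max() - 2047,
              "a negative delta wraps when stored in size_t");
static_assert(net_impact(-4096, 2048) == -2048,
              "signed deltas keep their sign");
```

If the field has to stay `size_t`, the sign would need to be carried separately, for instance in the `parameters` map.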

🔍 Automatic Detection

Built-in Patterns

HintDetector detector;
auto hints = detector.analyze(topology);

for (const auto& hint : hints) {
    if (hint.target_nodes.empty()) continue;  // target_nodes[0] below needs at least one target
    std::cout << to_string(hint.type) << " @ " << hint.target_nodes[0] << "\n";
    std::cout << "  Priority: " << to_string(hint.priority) << "\n";
    std::cout << "  Speedup: " << hint.estimated_speedup << "x\n";
    std::cout << "  Description: " << hint.description << "\n";
}

Detected Patterns

Pattern	Condition	Generated Hint
Inline Small Nodes	Simple nodes (gain, add, multiply)	`Inline`
Vectorize Arrays	Array operations	`Vectorize` (SSE/AVX)
InPlace Processing	1 input, 1 output, no feedback	`InPlace`
Node Fusion	Linear chain A→B→C	`Fuse`
Lookup Table	sin/exp/log/tanh	`LookupTable`
Cache Optimize	Delays, large buffers	`CacheOptimize`
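
The Node Fusion condition can be sketched on a bare adjacency map. This is an assumption about what "linear chain" means here: an edge A→B qualifies when A has exactly one output and B has exactly one input. `Graph` and both functions are hypothetical; the real detector works on `Topology` and would also have to check node types, since sources and sinks are presumably not fusable.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal graph: node name -> names of its downstream nodes.
using Graph = std::map<std::string, std::vector<std::string>>;

// Counts incoming edges per node.
std::map<std::string, int> in_degrees(const Graph& g) {
    std::map<std::string, int> deg;
    for (const auto& [node, outs] : g) {
        deg.emplace(node, 0);  // no-op if the node was already counted
        for (const auto& out : outs) {
            ++deg[out];
        }
    }
    return deg;
}

// Returns edges A->B where A has exactly one output and B exactly one input.
std::vector<std::pair<std::string, std::string>> fusable_edges(const Graph& g) {
    const auto deg = in_degrees(g);
    std::vector<std::pair<std::string, std::string>> result;
    for (const auto& [node, outs] : g) {
        if (outs.size() == 1 && deg.at(outs[0]) == 1) {
            result.emplace_back(node, outs[0]);
        }
    }
    return result;
}
```

A node that fans out to two consumers, or one that mixes two inputs, breaks the chain and yields no candidate edge.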

Custom Pattern

HintPattern custom_pattern{
    "custom_simd_pattern",
    "Detect custom SIMD opportunity",

    // Matcher function
    [](const Topology& topology, const NodeID& node_id) {
        const auto& node = topology.nodes().at(node_id);
        return node.type == "my_custom_operation" &&
               node.inputs.size() > 0 &&
               node.inputs[0].buffer_size >= 4;
    },

    // Hint generator
    [](const Topology& topology, const NodeID& node_id) {
        OptimizationHint hint;
        hint.type = HintType::Vectorize;
        hint.priority = Priority::High;
        hint.scope = Scope::Node;
        hint.target_nodes = {node_id};
        hint.description = "Custom SIMD vectorization";
        hint.estimated_speedup = 4.0f;
        hint.requirements = {"AVX2"};
        return hint;
    }
};

detector.add_pattern(custom_pattern);

✅ Hint Validation

Validating a Single Hint

HintValidationResult result = HintValidator::validate_hint(topology, hint);

if (result.is_valid) {
    std::cout << "✅ Hint is valid\n";
} else {
    std::cout << "❌ Hint validation failed:\n";
    for (const auto& error : result.errors) {
        std::cout << "  - " << error << "\n";
    }
}

if (!result.warnings.empty()) {
    std::cout << "⚠️  Warnings:\n";
    for (const auto& warning : result.warnings) {
        std::cout << "  - " << warning << "\n";
    }
}

Detecting Conflicts

auto conflicts = HintValidator::find_conflicts(topology);

if (!conflicts.empty()) {
    std::cout << "⚠️  Found " << conflicts.size() << " conflicting hints:\n";
    for (const auto& [h1, h2] : conflicts) {
        std::cout << "  " << to_string(h1.type) << " vs " << to_string(h2.type) << "\n";
    }
}
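
How conflicts are determined internally is not documented here. One plausible reading, given the `conflicts` field of `OptimizationHint`, is a pairwise check of hints that target the same node. The sketch below implements that reading on a simplified hint type; `SimpleHint`, both functions, and the InPlace/Split conflict in the usage note are illustrative assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified hint: a type name, one target node, and the
// names of hint types it declares itself incompatible with.
struct SimpleHint {
    std::string type;
    std::string node;
    std::vector<std::string> conflicts;
};

// True when `a` lists the type of `b` among its conflicts.
bool declares_conflict(const SimpleHint& a, const SimpleHint& b) {
    return std::find(a.conflicts.begin(), a.conflicts.end(), b.type) !=
           a.conflicts.end();
}

// Returns index pairs (i, j) with i < j for hints on the same node where
// either hint declares a conflict with the other.
std::vector<std::pair<std::size_t, std::size_t>> conflicting_pairs(
    const std::vector<SimpleHint>& hints) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    for (std::size_t i = 0; i < hints.size(); ++i) {
        for (std::size_t j = i + 1; j < hints.size(); ++j) {
            const bool same_node = hints[i].node == hints[j].node;
            const bool declared = declares_conflict(hints[i], hints[j]) ||
                                  declares_conflict(hints[j], hints[i]);
            if (same_node && declared) {
                pairs.emplace_back(i, j);
            }
        }
    }
    return pairs;
}
```

With an InPlace hint on `filter` that declares a conflict with Split, and a Split hint on the same node, the function reports exactly that pair. A Split hint on a different node is ignored.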

Checking Requirements

// std::find below requires <algorithm>
std::vector<std::string> available_features = {"SSE4.1", "AVX", "AVX2"};

if (HintValidator::check_requirements(hint, available_features)) {
    std::cout << "✅ All requirements met\n";
} else {
    std::cout << "❌ Missing requirements:\n";
    for (const auto& req : hint.requirements) {
        if (std::find(available_features.begin(), available_features.end(), req) ==
            available_features.end()) {
            std::cout << "  - " << req << "\n";
        }
    }
}

🔧 Applying Hints

Configuration

ApplicationConfig config;
config.min_priority = Priority::Medium;
config.available_features = {"SSE4.1", "AVX2", "C++17"};
config.auto_resolve_conflicts = true;
config.apply_in_priority_order = true;

HintApplicator applicator(config);

Applying a Single Hint

ApplicationResult result = applicator.apply_hint(topology, hint);

if (result.applied) {
    std::cout << "✅ " << result.message << "\n";
    std::cout << "   Actual speedup: " << result.actual_speedup << "x\n";
    std::cout << "   Memory impact: " << result.actual_memory_impact << " bytes\n";

    topology = result.modified_topology;
} else {
    std::cout << "❌ Could not apply: " << result.message << "\n";
}

Applying All Hints

Topology optimized = applicator.apply_all_hints(topology);

std::cout << "Topology optimized with all applicable hints\n";

Applying by Priority

// Only Critical and High priority hints
Topology optimized = applicator.apply_by_priority(topology, Priority::High);
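
The combination of `min_priority` and `apply_in_priority_order` suggests a filter-then-sort step before application. A minimal sketch of that step follows, assuming a priority scale where a higher value means more urgent. `Level`, `RankedHint`, and `select_hints` are hypothetical; the real `Priority` enum may have more levels and a different numeric order.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical priority scale; higher value = more urgent.
enum class Level { Suggestion = 0, Medium = 1, High = 2, Critical = 3 };

struct RankedHint {
    std::string type;
    Level priority;
};

// Keeps hints at or above `min_priority`, most urgent first.
// stable_sort preserves detection order among hints of equal priority.
std::vector<RankedHint> select_hints(std::vector<RankedHint> hints,
                                     Level min_priority) {
    hints.erase(std::remove_if(hints.begin(), hints.end(),
                               [min_priority](const RankedHint& h) {
                                   return h.priority < min_priority;
                               }),
                hints.end());
    std::stable_sort(hints.begin(), hints.end(),
                     [](const RankedHint& a, const RankedHint& b) {
                         return a.priority > b.priority;
                     });
    return hints;
}
```

Using a stable sort keeps the result deterministic, which matters when two hints of equal priority would transform the same node.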

🎚️ Optimization Levels

Predefined Strategies

enum class OptimizationLevel {
    O0,     // No optimization
    O1,     // Basic (inline, constant folding)
    O2,     // Moderate (+ vectorization, in-place)
    O3,     // Aggressive (+ fusion, unrolling, LUT)
    Os,     // Optimize for size (reduces code size)
    Ofast   // Maximum performance (may break standards compliance)
};
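
Read literally, the comments on O1 through O3 describe cumulative sets of hint types. The sketch below encodes that reading; `OptLevel` and `hints_enabled_at` are hypothetical, and Os and Ofast are left out because the comments do not say which hints they enable.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical mirror of the first four optimization levels.
enum class OptLevel { O0, O1, O2, O3 };

// Each level adds hint types on top of the previous one.
std::set<std::string> hints_enabled_at(OptLevel level) {
    std::set<std::string> enabled;
    if (level >= OptLevel::O1) {
        enabled.insert("Inline");
        enabled.insert("ConstantFold");
    }
    if (level >= OptLevel::O2) {
        enabled.insert("Vectorize");
        enabled.insert("InPlace");
    }
    if (level >= OptLevel::O3) {
        enabled.insert("Fuse");
        enabled.insert("Unroll");
        enabled.insert("LookupTable");
    }
    return enabled;
}
```

Under this reading O0 enables nothing, O2 enables four hint types, and O3 enables seven.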

Configuring a Strategy

OptimizationStrategy strategy;
strategy.level = OptimizationLevel::O3;
strategy.platform = TargetPlatform::Desktop_x86;
strategy.enable_auto_detection = true;
strategy.enable_auto_application = true;
strategy.min_priority = Priority::Medium;

// Feature flags
strategy.allow_approximations = false;
strategy.allow_fast_math = true;
strategy.allow_simd = true;
strategy.allow_gpu = false;

// Resource constraints
strategy.max_memory_increase = 1024 * 1024;  // 1 MB
strategy.max_code_size_increase = 2.0f;      // 2x
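
A sketch of how the two resource constraints might gate a candidate optimization. It assumes `max_code_size_increase` is a ratio of new size to original size and that memory deltas are signed. `Budget` and `within_budget` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the two resource limits in OptimizationStrategy.
struct Budget {
    std::int64_t max_memory_increase;  // bytes
    float max_code_size_increase;      // ratio, 2.0 = twice the original size
};

// True when a candidate stays within both limits. Negative memory deltas
// (savings) always satisfy the memory limit.
bool within_budget(const Budget& budget, std::int64_t memory_delta_bytes,
                   float code_size_ratio) {
    assert(code_size_ratio > 0.0f);
    return memory_delta_bytes <= budget.max_memory_increase &&
           code_size_ratio <= budget.max_code_size_increase;
}
```

With the limits configured above (1 MB and 2x), a 64 KB lookup table that grows the code by 1.5x passes, while a 2 MB preallocation does not.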

Target Platforms

enum class TargetPlatform {
    Generic,           // Portable (no SIMD)
    Desktop_x86,       // x86/x64 with SSE/AVX
    Desktop_ARM,       // ARM64 with NEON
    Mobile_iOS,        // iOS (NEON, battery constraints)
    Mobile_Android,    // Android (NEON, battery constraints)
    WebAssembly,       // WASM (no native SIMD; SIMD128 where available)
    Embedded,          // Embedded (limited resources)
    GPU                // GPU offload (CUDA/OpenCL)
};
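
The enum comments imply a mapping from platform to the SIMD platform hints listed earlier (SIMD_SSE, SIMD_AVX, SIMD_NEON). A hedged sketch of that mapping follows; `Platform` and `simd_hints_for` are hypothetical, and platforms without a matching hint type simply return an empty list.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of TargetPlatform.
enum class Platform {
    Generic, Desktop_x86, Desktop_ARM, Mobile_iOS,
    Mobile_Android, WebAssembly, Embedded, GPU
};

// Candidate SIMD hint names per platform, following the enum comments.
std::vector<std::string> simd_hints_for(Platform p) {
    switch (p) {
        case Platform::Desktop_x86:
            return {"SIMD_SSE", "SIMD_AVX"};
        case Platform::Desktop_ARM:
        case Platform::Mobile_iOS:
        case Platform::Mobile_Android:
            return {"SIMD_NEON"};
        default:
            // Generic, WebAssembly, Embedded, GPU: none of the three SIMD
            // hint types applies in this sketch.
            return {};
    }
}
```

A validator could intersect this list with a hint's `requirements` to reject, for example, an AVX hint when targeting iOS.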

🎼 Optimization Orchestrator

Full Pipeline

OptimizationStrategy strategy;
strategy.level = OptimizationLevel::O2;
strategy.platform = TargetPlatform::Desktop_x86;

OptimizationOrchestrator orchestrator(strategy);

// Optimize automatically
Topology optimized = orchestrator.optimize(topology);

// Generate the report
std::string report = orchestrator.generate_report(topology, optimized);
std::cout << report;

Optimization Report

Optimization Report
===================

Strategy: O2
Platform: Desktop x86

Applied Optimizations: 5
  - Inline (High)
    Inline simple operation to reduce function call overhead
    Speedup: 1.1x

  - Vectorize (High)
    Apply SIMD vectorization to array operations
    Speedup: 4.0x

  - InPlace (Medium)
    Process buffer in-place to save memory
    Speedup: 1.05x

  - Fuse (Medium)
    Fuse adjacent nodes to reduce memory traffic
    Speedup: 1.2x

  - LookupTable (Medium)
    Replace expensive function with lookup table
    Speedup: 3.0x

Total estimated speedup: 5.04x
Total memory impact: -2048 bytes

📝 Complete Example

#include <iostream>
#include <string>

#include "optimization_hints.hpp"
#include "../05_05_00_graph_representation/topology_builder.hpp"

using namespace audiolab::topology;
using namespace audiolab::topology::optimization;

int main() {
    // 1. Build the topology
    auto topology = TopologyBuilder()
        .setName("filter_chain")
        .addNode("input", "external_input", NodeType::Source)
        .addNode("gain1", "multiply_scalar", NodeType::Processing)
        .addNode("filter", "biquad_lowpass", NodeType::Processing)
        .addNode("gain2", "multiply_scalar", NodeType::Processing)
        .addNode("output", "external_output", NodeType::Sink)
        .connect("input", "out", "gain1", "in")
        .connect("gain1", "out", "filter", "in")
        .connect("filter", "out", "gain2", "in")
        .connect("gain2", "out", "output", "in")
        .build();

    std::cout << "Original topology: " << topology.nodes().size() << " nodes\n\n";

    // 2. Automatic detection
    HintDetector detector;
    auto hints = detector.analyze(topology);

    std::cout << "Detected " << hints.size() << " optimization opportunities:\n";
    for (const auto& hint : hints) {
        std::cout << "  - " << to_string(hint.type) << " @ ";
        for (const auto& node : hint.target_nodes) {
            std::cout << node << " ";
        }
        std::cout << "\n    " << hint.description << "\n";
        std::cout << "    Speedup: " << hint.estimated_speedup << "x\n";
    }
    std::cout << "\n";

    // 3. Validation
    std::cout << "Validating hints...\n";
    for (const auto& hint : hints) {
        auto validation = HintValidator::validate_hint(topology, hint);
        if (!validation.is_valid) {
            std::cout << "  ❌ " << to_string(hint.type) << ": " << validation.errors[0] << "\n";
        } else if (!validation.warnings.empty()) {
            std::cout << "  ⚠️  " << to_string(hint.type) << ": " << validation.warnings[0] << "\n";
        } else {
            std::cout << "  ✅ " << to_string(hint.type) << "\n";
        }
    }
    std::cout << "\n";

    // 4. Optimization strategy
    OptimizationStrategy strategy;
    strategy.level = OptimizationLevel::O3;
    strategy.platform = TargetPlatform::Desktop_x86;
    strategy.enable_auto_detection = true;
    strategy.enable_auto_application = true;
    strategy.allow_fast_math = true;
    strategy.allow_simd = true;

    // 5. Optimize
    OptimizationOrchestrator orchestrator(strategy);
    Topology optimized = orchestrator.optimize(topology);

    // 6. Report
    std::string report = orchestrator.generate_report(topology, optimized);
    std::cout << report << "\n";

    // 7. Applied hints
    auto applied_hints = HintAnnotator::get_hints_by_priority(optimized, Priority::Suggestion);
    std::cout << "\nApplied " << applied_hints.size() << " hints\n";

    // 8. Example: selective manual application
    std::cout << "\n=== Manual Application Example ===\n";

    ApplicationConfig app_config;
    app_config.min_priority = Priority::High;
    app_config.available_features = {"SSE4.1", "AVX2"};

    HintApplicator applicator(app_config);

    for (const auto& hint : hints) {
        if (hint.type == HintType::Vectorize || hint.type == HintType::Inline) {
            auto result = applicator.apply_hint(topology, hint);
            if (result.applied) {
                std::cout << "✅ Applied: " << to_string(hint.type) << "\n";
                std::cout << "   " << result.message << "\n";
            }
        }
    }

    return 0;
}

Expected Output

Original topology: 5 nodes

Detected 4 optimization opportunities:
  - Inline @ gain1
    Inline simple operation to reduce function call overhead
    Speedup: 1.1x
  - Vectorize @ filter
    Apply SIMD vectorization to array operations
    Speedup: 4.0x
  - InPlace @ filter
    Process buffer in-place to save memory
    Speedup: 1.05x
  - Fuse @ gain1 filter
    Fuse adjacent nodes to reduce memory traffic
    Speedup: 1.2x

Validating hints...
  ✅ Inline
  ✅ Vectorize
  ✅ InPlace
  ✅ Fuse

Optimization Report
===================

Strategy: O3
Platform: Desktop x86

Applied Optimizations: 4
  - Inline (High)
    Speedup: 1.1x
  - Vectorize (High)
    Speedup: 4.0x
  - InPlace (Medium)
    Speedup: 1.05x
  - Fuse (Medium)
    Speedup: 1.2x

Applied 4 hints

=== Manual Application Example ===
✅ Applied: Vectorize
   Vectorization hint applied (SIMD enabled)
✅ Applied: Inline
   Inline hint applied (metadata updated)

📈 Performance Metrics

Speedup Estimates

Optimization	Typical Speedup	Conditions
Inline	1.05x - 1.2x	Small nodes, frequent calls
Vectorize (SSE)	2x - 4x	Arrays ≥4 elements, float/int types
Vectorize (AVX)	4x - 8x	Arrays ≥8 elements, float/int types
InPlace	1.02x - 1.1x	Reduces memory traffic
Fusion	1.1x - 1.5x	Eliminates the intermediate buffer
LookupTable	2x - 10x	Transcendental functions
Fast Math	1.1x - 1.3x	Floating-point operations
Loop Unroll	1.05x - 1.2x	Short loops
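
These ranges describe the affected code, not the whole topology. How much of a local speedup shows up end to end depends on the share of run time spent in the optimized node, which Amdahl's law captures. The helper below is a generic sketch of that formula and makes no claim about how the orchestrator combines its estimates.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: overall speedup when a fraction `p` of the run time is
// accelerated by a factor `s` and the remainder is unchanged.
double overall_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, vectorizing a node 4x when it accounts for half of the processing time gives `overall_speedup(0.5, 4.0)`, which is 1.6x overall rather than 4x.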

Memory Impact

Optimization	Memory Impact	Notes
InPlace	-buffer_size	Eliminates the output buffer
Fusion	-intermediate_buffers	Reduces intermediate buffers
LookupTable	+1KB - 64KB	Table size
Preallocate	+total_buffers	Pre-allocates everything
Vectorize	0	No change

Platform Compatibility

Platform	SIMD	Fast Math	LUT	GPU
Desktop x86	✅ SSE/AVX	✅	✅	✅
Desktop ARM	✅ NEON	✅	✅	❌
Mobile	⚠️ NEON	⚠️	✅	❌
WebAssembly	⚠️ SIMD128	✅	✅	❌
Embedded	❌	✅	⚠️	❌

🔗 Integration with Other Subsystems

With Code Generation (05_05_06)

// Apply hints before generating code
auto optimized = orchestrator.optimize(topology);

CodeGenerationOptions gen_options;
gen_options.enable_simd = true;  // Use SIMD hints
gen_options.enable_inlining = true;  // Use inline hints

auto code = CodeGenerator::generate(optimized, analysis, buffer_plan, gen_options);

With Buffer Management (05_05_03)

// The buffer manager uses InPlace/Reuse hints
auto hints = HintAnnotator::get_hints_by_type(topology, HintType::InPlace);
buffer_manager.set_inplace_hints(hints);

auto buffer_plan = buffer_manager.createPlan(topology, execution_order);

With Composition Rules (05_05_07)

// Validate before optimizing
ValidationConfig val_config;
val_config.check_performance = true;
TopologyValidator validator(val_config);

auto report = validator.validate(topology);
if (report.passed) {
    auto optimized = orchestrator.optimize(topology);
}

🎯 Use Cases

1. Incremental Optimization (Editor)

// While editing, suggest hints in real time
void on_node_added(const Topology& topology, const NodeID& new_node) {
    HintDetector detector;
    auto hints = detector.analyze(topology);

    // Keep only the hints that target the new node
    for (const auto& hint : hints) {
        if (std::find(hint.target_nodes.begin(), hint.target_nodes.end(), new_node) !=
            hint.target_nodes.end()) {
            ui->show_optimization_suggestion(hint);
        }
    }
}

2. Per-Platform Optimization

// Generate variants for multiple platforms
std::vector<TargetPlatform> platforms = {
    TargetPlatform::Desktop_x86,
    TargetPlatform::Mobile_iOS,
    TargetPlatform::WebAssembly
};

for (auto platform : platforms) {
    OptimizationStrategy strategy;
    strategy.level = OptimizationLevel::O2;
    strategy.platform = platform;

    OptimizationOrchestrator orchestrator(strategy);
    auto optimized = orchestrator.optimize(topology);

    // Generate code for the platform
    generate_code_for_platform(optimized, platform);
}

3. A/B Testing Optimizations

// Compare performance with and without optimizations
Topology baseline = topology;
Topology optimized_o2 = optimize_at_level(topology, OptimizationLevel::O2);
Topology optimized_o3 = optimize_at_level(topology, OptimizationLevel::O3);

benchmark(baseline);      // 100ms
benchmark(optimized_o2);  // 65ms (1.54x)
benchmark(optimized_o3);  // 42ms (2.38x)
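
The snippet above leaves `benchmark` undefined. A minimal timing helper could look like the sketch below. `mean_time_ms` and `speedup` are hypothetical names, and a serious benchmark would also need warm-up runs and some protection against the compiler optimizing the work away.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timing helper: runs `work` `iterations` times and returns
// the mean wall-clock time per run in milliseconds.
template <typename Work>
double mean_time_ms(Work&& work, int iterations) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Speedup of an optimized variant relative to a baseline.
double speedup(double baseline_ms, double optimized_ms) {
    assert(optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}
```

Applied to the figures in the comments, `speedup(100.0, 65.0)` is about 1.54 and `speedup(100.0, 42.0)` is about 2.38. `steady_clock` is used because it is monotonic, so clock adjustments cannot distort the measurement.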

🚀 System Status

✅ Hint system: 21 hint types (performance, memory, computational, structural, platform)
✅ Automatic detector: 6 built-in patterns plus an API for custom patterns
✅ Validator: hint validation, conflict detection, requirement checks
✅ Applicator: individual or batch application, by priority, configurable
✅ Orchestrator: full pipeline with strategies O0 through Ofast and 8 target platforms
✅ Reports: optimization report generation with metrics

Subsystem: 05_MODULES → 05_05_TOPOLOGY_DESIGN → 05_05_08_optimization_hints
Author: AudioLab Development Team
Version: 1.0.0
Last updated: 2025-10-10