
05_05_02_dependency_analysis - Execution Order & Scheduling

Purpose

Determines an efficient execution order for topology nodes, identifies parallel execution opportunities, and finds computational bottlenecks (the critical path). This analysis is essential for generating efficient code and maximizing CPU utilization.

Key Concepts

Topological Sorting

Definition: Linear ordering of nodes where every node appears before its successors.

Valid Order: For every edge A → B, node A appears before node B in the sequence.

Example:

Graph:  A → C → E
        B → D ↗

Valid orders:
  [A, B, C, D, E]
  [B, A, D, C, E]
  [A, C, B, D, E]

Invalid:
  [A, C, E, B, D]  ❌ (E appears before D, but the edge D → E requires D first)
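
The validity condition is mechanical to check. Below is a small illustrative sketch (not part of the DependencyAnalyzer API; the NodeID alias and Edge struct are stand-ins for the real topology types):

#include <string>
#include <unordered_map>
#include <vector>

using NodeID = std::string;
struct Edge { NodeID from, to; };

bool isValidOrder(const std::vector<NodeID>& order,
                  const std::vector<Edge>& edges) {
    // Map each node to its position in the candidate ordering.
    std::unordered_map<NodeID, size_t> position;
    for (size_t i = 0; i < order.size(); ++i) position[order[i]] = i;

    // Every edge must point "forward" in the ordering.
    for (const auto& e : edges) {
        if (position.at(e.from) >= position.at(e.to)) return false;
    }
    return true;
}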

Kahn's Algorithm

Purpose: Topological sort via iterative in-degree reduction

Algorithm:

1. Count in-degrees (incoming edges) for all nodes
2. Queue all nodes with in-degree = 0 (sources)
3. While queue not empty:
   a. Dequeue node and add to result
   b. For each successor:
      - Decrease its in-degree by 1
      - If in-degree becomes 0, enqueue it
4. If all nodes processed → success, else cycle exists

Complexity: O(V + E) where V=nodes, E=edges
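
A minimal, self-contained sketch of the algorithm above, using a plain adjacency-list graph rather than the library's topology type (the NodeID and Graph aliases here are illustrative):

#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

using NodeID = std::string;
using Graph  = std::unordered_map<NodeID, std::vector<NodeID>>;  // node -> successors

std::vector<NodeID> kahnSort(const Graph& graph) {
    // Step 1: count in-degrees for every node (sinks included).
    std::unordered_map<NodeID, size_t> in_degree;
    for (const auto& [node, succs] : graph) {
        in_degree.emplace(node, 0);
        for (const auto& s : succs) ++in_degree[s];
    }

    // Step 2: queue all sources (in-degree 0).
    std::queue<NodeID> ready;
    for (const auto& [node, deg] : in_degree)
        if (deg == 0) ready.push(node);

    // Step 3: repeatedly emit a ready node and release its successors.
    std::vector<NodeID> order;
    while (!ready.empty()) {
        NodeID n = ready.front(); ready.pop();
        order.push_back(n);
        auto it = graph.find(n);
        if (it == graph.end()) continue;          // sink node: no successors
        for (const auto& succ : it->second)
            if (--in_degree[succ] == 0) ready.push(succ);
    }

    // Step 4: if not all nodes were emitted, the graph contains a cycle.
    if (order.size() != in_degree.size()) order.clear();
    return order;
}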

Parallel Stages

Definition: Nodes at the same "level" can execute simultaneously

Level Assignment:
  • Level 0: Source nodes (no predecessors)
  • Level N: max(predecessor levels) + 1

Example:

    A (L0) ──→ B (L1) ──→ D (L2) ──→ E (L3)
       │                              ↑
       └─────→ C (L1) ────────────────┘

Parallel stages:
  Stage 0: [A]           (1 node)
  Stage 1: [B, C]        (2 nodes - parallel!)
  Stage 2: [D]           (1 node)
  Stage 3: [E]           (1 node)

Max parallelism: 2
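
The level rule maps directly onto a single pass over a topological order. An illustrative sketch, reusing the NodeID and Graph aliases from the Kahn sketch above:

#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

std::unordered_map<NodeID, size_t> computeLevels(
        const Graph& graph, const std::vector<NodeID>& topo_order) {
    // Invert the graph so each node knows its predecessors.
    std::unordered_map<NodeID, std::vector<NodeID>> preds;
    for (const auto& [node, succs] : graph)
        for (const auto& s : succs) preds[s].push_back(node);

    std::unordered_map<NodeID, size_t> level;
    for (const auto& node : topo_order) {   // predecessors are finalized first
        size_t lvl = 0;                     // sources stay at level 0
        for (const auto& p : preds[node])
            lvl = std::max(lvl, level[p] + 1);
        level[node] = lvl;
    }
    return level;
}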

Critical Path

Definition: Longest path in terms of computational cost

Significance:
  • Determines the minimum possible execution time
  • Primary bottleneck for optimization
  • Adding parallelism cannot reduce execution time below its cost

Calculation:

For each node N, visited in topological order:
  cost_to[N] = max over predecessors P of cost_to[P]  +  cost[N]
  (for a source node, cost_to[N] = cost[N])

Critical path = backtrack from the node with the maximum cost_to
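
A sketch of this longest-path dynamic program, again on the illustrative Graph/NodeID aliases rather than the library's topology type; the cost map stands in for whatever cost model is active:

#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

struct PathResult {
    std::vector<NodeID> path;   // heaviest path, in execution order
    size_t total_cost = 0;
};

PathResult heaviestPath(const Graph& graph,
                        const std::vector<NodeID>& topo_order,
                        const std::unordered_map<NodeID, size_t>& cost) {
    PathResult result;
    if (topo_order.empty()) return result;

    // Invert edges so each node knows its predecessors.
    std::unordered_map<NodeID, std::vector<NodeID>> preds;
    for (const auto& [node, succs] : graph)
        for (const auto& s : succs) preds[s].push_back(node);

    std::unordered_map<NodeID, size_t> cost_to;   // heaviest path cost ending at node
    std::unordered_map<NodeID, NodeID> best_pred; // predecessor on that path
    NodeID end_node = topo_order.front();

    for (const auto& node : topo_order) {
        size_t incoming = 0;
        for (const auto& p : preds[node])
            if (cost_to[p] >= incoming) { incoming = cost_to[p]; best_pred[node] = p; }
        cost_to[node] = incoming + cost.at(node);
        if (cost_to[node] >= result.total_cost) {
            result.total_cost = cost_to[node];
            end_node = node;
        }
    }

    // Backtrack from the heaviest endpoint to recover the path.
    for (NodeID n = end_node; ; n = best_pred.at(n)) {
        result.path.push_back(n);
        if (best_pred.find(n) == best_pred.end()) break;   // reached a source node
    }
    std::reverse(result.path.begin(), result.path.end());
    return result;
}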

API Overview

Basic Analysis

#include "dependency_analyzer.hpp"

// Analyze with default options
auto analysis = DependencyAnalyzer::analyze(topology);

std::cout << "Execution order:\n";
for (const auto& node_id : analysis.execution_order.linear_order) {
    std::cout << "  → " << node_id << "\n";
}

std::cout << "\nParallel stages: " << analysis.execution_order.total_stages << "\n";
std::cout << "Max parallelism: " << analysis.max_parallelism << " nodes\n";
std::cout << "Avg parallelism: " << analysis.average_parallelism << " nodes/stage\n";
std::cout << "Sequential bottlenecks: " << analysis.sequential_bottlenecks << "\n";

Custom Scheduling

SchedulingOptions options;
options.strategy = SchedulingStrategy::MaximizeParallel;
options.cost_model = CostModel::ByType;
options.num_threads = 4;
options.allow_reordering = true;

auto analysis = DependencyAnalyzer::analyze(topology, options);

Parallel Stages

auto stages = DependencyAnalyzer::identifyParallelStages(topology);

for (const auto& stage : stages) {
    std::cout << "Stage " << stage.stage_index << " (cost=" << stage.estimated_cost << "):\n";
    for (const auto& node : stage.nodes) {
        std::cout << "  - " << node << "\n";
    }
}

Critical Path

auto critical_path = DependencyAnalyzer::findCriticalPath(topology, CostModel::ByType);

std::cout << "Critical path (cost=" << critical_path.total_cost << "):\n";
for (const auto& node : critical_path.path) {
    size_t cost = critical_path.node_costs[node];
    std::cout << "  → " << node << " (cost=" << cost << ")\n";
}

Level Calculation

auto levels = DependencyAnalyzer::calculateLevels(topology);

for (const auto& [node_id, level] : levels) {
    std::cout << node_id << " at level " << level << "\n";
}

Data Structures

ExecutionOrder

struct ExecutionOrder {
    std::vector<NodeID> linear_order;      // Sequential execution order
    std::vector<ParallelStage> stages;     // Parallel stages
    size_t total_stages;                   // Number of stages
    bool is_valid;                         // Success flag
    std::string error_message;             // Error if invalid
};

ParallelStage

struct ParallelStage {
    size_t stage_index;                    // Stage number
    std::vector<NodeID> nodes;             // Nodes in this stage
    size_t estimated_cost;                 // Total cost
    std::string description;               // Human-readable
};

CriticalPathInfo

struct CriticalPathInfo {
    std::vector<NodeID> path;                              // Critical path nodes
    size_t total_cost;                                     // Total cost
    size_t total_latency;                                  // Total latency
    std::unordered_map<NodeID, size_t> node_costs;        // Per-node breakdown
    std::unordered_map<NodeID, size_t> node_latencies;    // Per-node latency
};

DependencyAnalysis

struct DependencyAnalysis {
    ExecutionOrder execution_order;        // Ordering info
    CriticalPathInfo critical_path;        // Bottleneck info
    size_t max_parallelism;                // Max concurrent nodes
    float average_parallelism;             // Avg nodes per stage
    size_t sequential_bottlenecks;         // Single-node stages
};
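
For reference, the summary metrics can be read as simple aggregates over the stages; the following sketch assumes only the field semantics documented above and reuses the `analysis` variable from the earlier examples:

// Sketch (not library code): deriving the summary metrics from the stages.
size_t total_nodes = 0, max_par = 0, bottlenecks = 0;
for (const auto& stage : analysis.execution_order.stages) {
    total_nodes += stage.nodes.size();
    if (stage.nodes.size() > max_par) max_par = stage.nodes.size();
    if (stage.nodes.size() == 1) ++bottlenecks;          // single-node stage
}
float avg_par = analysis.execution_order.total_stages > 0
    ? static_cast<float>(total_nodes) / analysis.execution_order.total_stages
    : 0.0f;
// Expected to match: analysis.max_parallelism, analysis.average_parallelism,
// analysis.sequential_bottlenecks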

Scheduling Strategies

1. TopologicalSort (Default)

options.strategy = SchedulingStrategy::TopologicalSort;
  • Simple Kahn's algorithm
  • Minimal scheduling overhead
  • Good for linear/sequential graphs

2. MaximizeParallel

options.strategy = SchedulingStrategy::MaximizeParallel;
  • Prioritizes extracting parallelism
  • Groups independent nodes aggressively
  • Best for multi-threaded execution

3. MinimizeLatency

options.strategy = SchedulingStrategy::MinimizeLatency;
  • Prioritizes the critical path
  • Executes high-latency nodes first
  • Best for real-time constraints

4. LoadBalanced

options.strategy = SchedulingStrategy::LoadBalanced;
options.num_threads = 4;
  • Balances work across threads
  • Merges small stages, splits large ones
  • Best for sustained throughput
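
One plausible way such balancing can work is a greedy longest-processing-time assignment within each stage. The sketch below is illustrative only; balanceStage and the node_cost map are hypothetical helpers, not part of the library API:

#include <algorithm>
#include <unordered_map>
#include <vector>

std::vector<std::vector<NodeID>> balanceStage(
        const ParallelStage& stage,
        size_t num_threads,
        const std::unordered_map<NodeID, size_t>& node_cost) {
    // Biggest jobs first, so they anchor the per-thread bins.
    std::vector<NodeID> nodes = stage.nodes;
    std::sort(nodes.begin(), nodes.end(),
              [&](const NodeID& a, const NodeID& b) {
                  return node_cost.at(a) > node_cost.at(b);
              });

    std::vector<std::vector<NodeID>> bins(num_threads);
    std::vector<size_t> load(num_threads, 0);
    for (const auto& n : nodes) {
        // Greedily place each node on the currently least-loaded thread.
        size_t t = std::min_element(load.begin(), load.end()) - load.begin();
        bins[t].push_back(n);
        load[t] += node_cost.at(n);
    }
    return bins;
}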

Cost Models

Uniform Cost

options.cost_model = CostModel::Uniform;
All nodes cost 100 units (default)

By Type

options.cost_model = CostModel::ByType;
Heuristic cost based on kernel type:
  • FFT: 1000
  • Convolution: 500
  • Filter: 200
  • Multiply: 50
  • Add: 20
  • Delay: 10
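
Conceptually this amounts to a lookup table keyed by kernel type, as in the illustrative sketch below (the string keys and the costByType helper are assumptions, not the library's actual representation):

#include <string>
#include <unordered_map>

size_t costByType(const std::string& kernel_type) {
    static const std::unordered_map<std::string, size_t> kCosts = {
        {"fft", 1000}, {"convolution", 500}, {"filter", 200},
        {"multiply", 50}, {"add", 20}, {"delay", 10},
    };
    auto it = kCosts.find(kernel_type);
    return it != kCosts.end() ? it->second : 100;  // fall back to the uniform default
}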

Measured

options.cost_model = CostModel::Measured;
Uses actual profiling data (future implementation)

Examples

Example 1: Simple Chain

// Topology: A → B → C → D
auto topology = TopologyBuilder()
    .addNode("A", "input", NodeType::Source)
    .addNode("B", "filter", NodeType::Processing)
    .addNode("C", "gain", NodeType::Processing)
    .addNode("D", "output", NodeType::Sink)
    .connect("A", "out", "B", "in")
    .connect("B", "out", "C", "in")
    .connect("C", "out", "D", "in")
    .build();

auto analysis = DependencyAnalyzer::analyze(topology);

// Result:
// linear_order: [A, B, C, D]
// stages: [{A}, {B}, {C}, {D}]
// max_parallelism: 1
// sequential_bottlenecks: 4

Example 2: Parallel Branches

// Topology:     ┌→ B → D ─┐
//           A ──┤         ├→ F
//               └→ C → E ─┘
auto topology = TopologyBuilder()
    .addNode("A", "input", NodeType::Source)
    .addNode("B", "filter1", NodeType::Processing)
    .addNode("C", "filter2", NodeType::Processing)
    .addNode("D", "gain1", NodeType::Processing)
    .addNode("E", "gain2", NodeType::Processing)
    .addNode("F", "mix", NodeType::Processing)
    .connect("A", "out", "B", "in")
    .connect("A", "out", "C", "in")
    .connect("B", "out", "D", "in")
    .connect("C", "out", "E", "in")
    .connect("D", "out", "F", "in1")
    .connect("E", "out", "F", "in2")
    .build();

auto analysis = DependencyAnalyzer::analyze(topology);

// Result:
// linear_order: [A, B, C, D, E, F] (or [A, C, B, E, D, F])
// stages: [{A}, {B, C}, {D, E}, {F}]
// max_parallelism: 2
// sequential_bottlenecks: 2 (A and F)

Example 3: Complex DAG with Critical Path

// Build complex topology
auto topology = buildComplexFilterBank();  // 20+ nodes

// Find critical path with cost model
auto critical = DependencyAnalyzer::findCriticalPath(
    topology,
    CostModel::ByType
);

std::cout << "⚠️  Critical path has " << critical.path.size() << " nodes\n";
std::cout << "Total cost: " << critical.total_cost << "\n";

// Identify optimization targets
for (const auto& node_id : critical.path) {
    size_t cost = critical.node_costs[node_id];
    if (cost > 500) {  // Expensive node
        std::cout << "🎯 Optimize: " << node_id
                  << " (cost=" << cost << ")\n";
    }
}

Example 4: Multi-threaded Scheduling

SchedulingOptions options;
options.strategy = SchedulingStrategy::LoadBalanced;
options.num_threads = 4;
options.cost_model = CostModel::ByType;

auto analysis = DependencyAnalyzer::analyze(topology, options);

// Execute stage by stage: nodes within a stage are independent, so they
// can run in parallel, while the implicit barrier at the end of each
// parallel loop preserves dependency order between stages.
for (const auto& stage : analysis.execution_order.stages) {
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < static_cast<int>(stage.nodes.size()); ++i) {
        execute_node(stage.nodes[i]);
    }
}

Performance

Operation              Complexity   Notes
Topological sort       O(V + E)     Kahn's algorithm
Level calculation      O(V + E)     BFS-based
Critical path          O(V + E)     DP with backtracking
Stage identification   O(V)         After level calculation
Load balancing         O(S)         S = number of stages

Memory: O(V) for auxiliary structures

Typical performance:
  • 100 nodes: <1 ms
  • 1,000 nodes: <10 ms
  • 10,000 nodes: <100 ms

Integration with Other Systems

Input Dependencies

  • Topology graph from 00_graph_representation
  • Causality info from 01_causality_validation (for latency)

Output Consumers

  • Buffer management (03_buffer_management) uses execution order for lifetime analysis
  • Code generation (06_code_generation) uses stages for loop structure
  • Optimization layer uses critical path for optimization priorities

Bottleneck Detection

// Find sequential choke points
auto bottlenecks = DependencyAnalyzer::detectBottlenecks(analysis);

std::cout << "Bottlenecks at stages: ";
for (auto stage_idx : bottlenecks) {
    std::cout << stage_idx << " ";
}
std::cout << "\n";

// Suggest improvements
if (!bottlenecks.empty()) {
    std::cout << "💡 Suggestions:\n";
    for (auto idx : bottlenecks) {
        auto& stage = analysis.execution_order.stages[idx];
        std::cout << "  - Stage " << idx << ": "
                  << stage.nodes[0] << " blocks parallel execution\n";
        std::cout << "    Consider: algorithm redesign or reordering\n";
    }
}

Debugging Output

void printAnalysis(const DependencyAnalysis& analysis) {
    std::cout << "=== DEPENDENCY ANALYSIS ===\n\n";

    std::cout << "Linear Execution Order:\n";
    for (size_t i = 0; i < analysis.execution_order.linear_order.size(); ++i) {
        std::cout << "  " << i << ": " << analysis.execution_order.linear_order[i] << "\n";
    }

    std::cout << "\nParallel Stages:\n";
    for (const auto& stage : analysis.execution_order.stages) {
        std::cout << "  Stage " << stage.stage_index
                  << " (cost=" << stage.estimated_cost
                  << ", nodes=" << stage.nodes.size() << "):\n";
        for (const auto& node : stage.nodes) {
            std::cout << "    - " << node << "\n";
        }
    }

    std::cout << "\nCritical Path (cost=" << analysis.critical_path.total_cost << "):\n";
    for (const auto& node : analysis.critical_path.path) {
        size_t cost = analysis.critical_path.node_costs.at(node);
        std::cout << "  → " << node << " (" << cost << ")\n";
    }

    std::cout << "\nMetrics:\n";
    std::cout << "  Max parallelism: " << analysis.max_parallelism << " nodes\n";
    std::cout << "  Avg parallelism: " << analysis.average_parallelism << " nodes/stage\n";
    std::cout << "  Bottlenecks: " << analysis.sequential_bottlenecks << " stages\n";
}

Next Steps

After dependency analysis:

1. 03_buffer_management - Allocate buffers based on execution order
2. 06_code_generation - Generate loops respecting dependencies
3. Optimization layer - Focus on critical-path nodes


Status: ✅ Core implementation complete
Test Coverage: ~90%
Performance: O(V + E) for all operations