05_05_02_dependency_analysis - Execution Order & Scheduling¶
Purpose¶
Determines the optimal execution order for topology nodes, identifies parallel execution opportunities, and finds computational bottlenecks (critical path). Essential for generating efficient code and maximizing CPU utilization.
Key Concepts¶
Topological Sorting¶
Definition: Linear ordering of nodes where every node appears before its successors.
Valid Order: For every edge A → B, node A appears before node B in the sequence.
Example:
Graph: A → C → E
B → D ↗
Valid orders:
[A, B, C, D, E]
[B, A, D, C, E]
[A, C, B, D, E]
Invalid:
[A, C, E, B, D] ❌ (E before D, but D→E edge exists)
Kahn's Algorithm¶
Purpose: Topological sort via iterative in-degree reduction
Algorithm:
1. Count in-degrees (incoming edges) for all nodes
2. Queue all nodes with in-degree = 0 (sources)
3. While queue not empty:
a. Dequeue node and add to result
b. For each successor:
- Decrease its in-degree by 1
- If in-degree becomes 0, enqueue it
4. If all nodes processed → success, else cycle exists
Complexity: O(V + E) where V=nodes, E=edges
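The steps above can be sketched as a standalone helper. This is a minimal illustration, not the library's implementation; the `Graph` alias and node names are assumptions for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <unordered_map>
#include <vector>

// Adjacency list: node -> list of successors (illustrative type).
using Graph = std::unordered_map<std::string, std::vector<std::string>>;

// Kahn's algorithm: returns a valid topological order,
// or an empty vector if the graph contains a cycle.
std::vector<std::string> kahnSort(const Graph& g) {
    // Step 1: count in-degrees for all nodes.
    std::unordered_map<std::string, size_t> in_degree;
    for (const auto& [node, succs] : g) {
        in_degree.emplace(node, 0);            // ensure sources are present
        for (const auto& s : succs) ++in_degree[s];
    }
    // Step 2: queue all sources (in-degree == 0).
    std::queue<std::string> ready;
    for (const auto& [node, deg] : in_degree)
        if (deg == 0) ready.push(node);
    // Step 3: repeatedly emit a ready node and relax its successors.
    std::vector<std::string> order;
    while (!ready.empty()) {
        auto node = ready.front();
        ready.pop();
        order.push_back(node);
        auto it = g.find(node);
        if (it == g.end()) continue;           // sink: no successors listed
        for (const auto& succ : it->second)
            if (--in_degree[succ] == 0) ready.push(succ);
    }
    // Step 4: if some nodes were never emitted, they form a cycle.
    if (order.size() != in_degree.size()) order.clear();
    return order;
}
```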
Parallel Stages¶
Definition: Nodes at the same "level" can execute simultaneously
Level Assignment: - Level 0: Source nodes (no predecessors) - Level N: Max(predecessor levels) + 1
Example:
A (L0) ──→ B (L1) ──→ D (L2) ──→ E (L3)
  │                    ↑
  └──→ C (L1) ─────────┘
Parallel stages:
Stage 0: [A] (1 node)
Stage 1: [B, C] (2 nodes - parallel!)
Stage 2: [D] (1 node)
Stage 3: [E] (1 node)
Max parallelism: 2
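The level rule and stage grouping above can be sketched as follows. This is a hypothetical standalone helper (not the library's `calculateLevels`/`identifyParallelStages`), assuming edges arrive as (from, to) pairs and nodes are already in topological order:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// level(source) = 0; level(N) = max(predecessor levels) + 1.
// Visiting nodes in topological order guarantees each predecessor's
// level is final before its successors are examined.
std::unordered_map<std::string, size_t>
computeLevels(const std::vector<std::string>& topo_order,
              const std::vector<Edge>& edges) {
    std::unordered_map<std::string, size_t> level;
    for (const auto& node : topo_order) level[node] = 0;  // sources stay at 0
    for (const auto& node : topo_order)
        for (const auto& [from, to] : edges)
            if (from == node)
                level[to] = std::max(level[to], level[from] + 1);
    return level;
}

// Nodes that share a level have no dependencies on one another,
// so each level becomes one parallel stage.
std::vector<std::vector<std::string>>
groupIntoStages(const std::vector<std::string>& topo_order,
                const std::unordered_map<std::string, size_t>& level) {
    size_t max_level = 0;
    for (const auto& [node, lvl] : level) max_level = std::max(max_level, lvl);
    std::vector<std::vector<std::string>> stages(max_level + 1);
    for (const auto& node : topo_order)
        stages[level.at(node)].push_back(node);
    return stages;
}
```

Max parallelism is then simply the size of the largest stage.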
Critical Path¶
Definition: Longest path in terms of computational cost
Significance: - Determines minimum execution time - Bottleneck for optimization - Cannot parallelize below this
Calculation:
For each node N in topological order:
    cost_to[N] = max over predecessors P of cost_to[P], plus cost[N]
    (for source nodes: cost_to[N] = cost[N])
Critical path = backtrack from the node with max cost_to
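The DP above can be sketched in C++. This is an illustrative helper under assumed types (edge pairs, a per-node cost map), not the library's `findCriticalPath`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Edge = std::pair<std::string, std::string>;

// cost_to[N] = max over predecessors P of cost_to[P], plus cost[N].
// The path is recovered by backtracking through the predecessor
// that produced the maximum at each node.
std::vector<std::string> criticalPathSketch(
        const std::vector<std::string>& topo_order,
        const std::vector<Edge>& edges,
        const std::unordered_map<std::string, size_t>& cost,
        size_t& total_cost) {
    std::unordered_map<std::string, size_t> cost_to;
    std::unordered_map<std::string, std::string> best_pred;
    for (const auto& node : topo_order) {
        size_t best = 0;
        bool has_pred = false;
        for (const auto& [from, to] : edges) {
            if (to == node && (!has_pred || cost_to.at(from) > best)) {
                best = cost_to.at(from);
                best_pred[node] = from;
                has_pred = true;
            }
        }
        cost_to[node] = best + cost.at(node);
    }
    // The critical path ends at the node with the largest accumulated cost.
    std::string end;
    total_cost = 0;
    for (const auto& [node, c] : cost_to)
        if (c >= total_cost) { total_cost = c; end = node; }
    // Backtrack from the end node to a source, then reverse.
    std::vector<std::string> path{end};
    while (best_pred.count(path.back()))
        path.push_back(best_pred.at(path.back()));
    std::reverse(path.begin(), path.end());
    return path;
}
```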
API Overview¶
Basic Analysis¶
#include "dependency_analyzer.hpp"
// Analyze with default options
auto analysis = DependencyAnalyzer::analyze(topology);
std::cout << "Execution order:\n";
for (const auto& node_id : analysis.execution_order.linear_order) {
std::cout << " → " << node_id << "\n";
}
std::cout << "\nParallel stages: " << analysis.execution_order.total_stages << "\n";
std::cout << "Max parallelism: " << analysis.max_parallelism << " nodes\n";
std::cout << "Avg parallelism: " << analysis.average_parallelism << " nodes/stage\n";
std::cout << "Sequential bottlenecks: " << analysis.sequential_bottlenecks << "\n";
Custom Scheduling¶
SchedulingOptions options;
options.strategy = SchedulingStrategy::MaximizeParallel;
options.cost_model = CostModel::ByType;
options.num_threads = 4;
options.allow_reordering = true;
auto analysis = DependencyAnalyzer::analyze(topology, options);
Parallel Stages¶
auto stages = DependencyAnalyzer::identifyParallelStages(topology);
for (const auto& stage : stages) {
std::cout << "Stage " << stage.stage_index << " (cost=" << stage.estimated_cost << "):\n";
for (const auto& node : stage.nodes) {
std::cout << " - " << node << "\n";
}
}
Critical Path¶
auto critical_path = DependencyAnalyzer::findCriticalPath(topology, CostModel::ByType);
std::cout << "Critical path (cost=" << critical_path.total_cost << "):\n";
for (const auto& node : critical_path.path) {
size_t cost = critical_path.node_costs[node];
std::cout << " → " << node << " (cost=" << cost << ")\n";
}
Level Calculation¶
auto levels = DependencyAnalyzer::calculateLevels(topology);
for (const auto& [node_id, level] : levels) {
std::cout << node_id << " at level " << level << "\n";
}
Data Structures¶
ExecutionOrder¶
struct ExecutionOrder {
std::vector<NodeID> linear_order; // Sequential execution order
std::vector<ParallelStage> stages; // Parallel stages
size_t total_stages; // Number of stages
bool is_valid; // Success flag
std::string error_message; // Error if invalid
};
ParallelStage¶
struct ParallelStage {
size_t stage_index; // Stage number
std::vector<NodeID> nodes; // Nodes in this stage
size_t estimated_cost; // Total cost
std::string description; // Human-readable
};
CriticalPathInfo¶
struct CriticalPathInfo {
std::vector<NodeID> path; // Critical path nodes
size_t total_cost; // Total cost
size_t total_latency; // Total latency
std::unordered_map<NodeID, size_t> node_costs; // Per-node breakdown
std::unordered_map<NodeID, size_t> node_latencies; // Per-node latency
};
DependencyAnalysis¶
struct DependencyAnalysis {
ExecutionOrder execution_order; // Ordering info
CriticalPathInfo critical_path; // Bottleneck info
size_t max_parallelism; // Max concurrent nodes
float average_parallelism; // Avg nodes per stage
size_t sequential_bottlenecks; // Single-node stages
};
Scheduling Strategies¶
1. TopologicalSort (Default)¶
- Simple Kahn's algorithm
- Minimal scheduling overhead
- Good for linear/sequential graphs
2. MaximizeParallel¶
- Prioritizes extracting parallelism
- Groups independent nodes aggressively
- Best for multi-threaded execution
3. MinimizeLatency¶
- Prioritizes critical path
- Executes high-latency nodes first
- Best for real-time constraints
4. LoadBalanced¶
- Balances work across threads
- Merges small stages, splits large ones
- Best for sustained throughput
Cost Models¶
Uniform Cost¶
All nodes cost 100 units (default)
By Type¶
Heuristic based on kernel type:
- FFT: 1000
- Convolution: 500
- Filter: 200
- Multiply: 50
- Add: 20
- Delay: 10
Measured¶
Uses actual profiling data (future implementation)
Examples¶
Example 1: Simple Chain¶
// Topology: A → B → C → D
auto topology = TopologyBuilder()
.addNode("A", "input", NodeType::Source)
.addNode("B", "filter", NodeType::Processing)
.addNode("C", "gain", NodeType::Processing)
.addNode("D", "output", NodeType::Sink)
.connect("A", "out", "B", "in")
.connect("B", "out", "C", "in")
.connect("C", "out", "D", "in")
.build();
auto analysis = DependencyAnalyzer::analyze(topology);
// Result:
// linear_order: [A, B, C, D]
// stages: [{A}, {B}, {C}, {D}]
// max_parallelism: 1
// sequential_bottlenecks: 4
Example 2: Parallel Branches¶
// Topology: ┌→ B → D ─┐
// A ──┤ ├→ F
// └→ C → E ─┘
auto topology = TopologyBuilder()
.addNode("A", "input", NodeType::Source)
.addNode("B", "filter1", NodeType::Processing)
.addNode("C", "filter2", NodeType::Processing)
.addNode("D", "gain1", NodeType::Processing)
.addNode("E", "gain2", NodeType::Processing)
.addNode("F", "mix", NodeType::Processing)
.connect("A", "out", "B", "in")
.connect("A", "out", "C", "in")
.connect("B", "out", "D", "in")
.connect("C", "out", "E", "in")
.connect("D", "out", "F", "in1")
.connect("E", "out", "F", "in2")
.build();
auto analysis = DependencyAnalyzer::analyze(topology);
// Result:
// linear_order: [A, B, C, D, E, F] (or [A, C, B, E, D, F])
// stages: [{A}, {B, C}, {D, E}, {F}]
// max_parallelism: 2
// sequential_bottlenecks: 2 (A and F)
Example 3: Complex DAG with Critical Path¶
// Build complex topology
auto topology = buildComplexFilterBank(); // 20+ nodes
// Find critical path with cost model
auto critical = DependencyAnalyzer::findCriticalPath(
topology,
CostModel::ByType
);
std::cout << "⚠️ Critical path has " << critical.path.size() << " nodes\n";
std::cout << "Total cost: " << critical.total_cost << "\n";
// Identify optimization targets
for (const auto& node_id : critical.path) {
size_t cost = critical.node_costs[node_id];
if (cost > 500) { // Expensive node
std::cout << "🎯 Optimize: " << node_id
<< " (cost=" << cost << ")\n";
}
}
Example 4: Multi-threaded Scheduling¶
SchedulingOptions options;
options.strategy = SchedulingStrategy::LoadBalanced;
options.num_threads = 4;
options.cost_model = CostModel::ByType;
auto analysis = DependencyAnalyzer::analyze(topology, options);
// Execute stage by stage: nodes within a stage are independent and
// can run concurrently, but stages must complete in dependency order.
// (Round-robining whole stages onto threads and running the per-thread
// lists concurrently would violate inter-stage dependencies.)
for (const auto& stage : analysis.execution_order.stages) {
    #pragma omp parallel for num_threads(4)
    for (int i = 0; i < static_cast<int>(stage.nodes.size()); ++i) {
        execute_node(stage.nodes[i]);
    }
}
Performance¶
| Operation | Complexity | Notes |
|---|---|---|
| Topological sort | O(V + E) | Kahn's algorithm |
| Level calculation | O(V + E) | BFS-based |
| Critical path | O(V + E) | DP with backtrack |
| Stage identification | O(V) | After level calc |
| Load balancing | O(S) | S = stages |
Memory: O(V) for auxiliary structures
Typical performance: - 100 nodes: <1ms - 1000 nodes: <10ms - 10000 nodes: <100ms
Integration with Other Systems¶
Input Dependencies¶
- Topology graph from 00_graph_representation
- Causality info from 01_causality_validation (for latency)
Output Consumers¶
- Buffer management (03_buffer_management) uses execution order for lifetime analysis
- Code generation (06_code_generation) uses stages for loop structure
- Optimization layer uses critical path for optimization priorities
Bottleneck Detection¶
// Find sequential choke points
auto bottlenecks = DependencyAnalyzer::detectBottlenecks(analysis);
std::cout << "Bottlenecks at stages: ";
for (auto stage_idx : bottlenecks) {
std::cout << stage_idx << " ";
}
std::cout << "\n";
// Suggest improvements
if (!bottlenecks.empty()) {
std::cout << "💡 Suggestions:\n";
for (auto idx : bottlenecks) {
auto& stage = analysis.execution_order.stages[idx];
std::cout << " - Stage " << idx << ": "
<< stage.nodes[0] << " blocks parallel execution\n";
std::cout << " Consider: algorithm redesign or reordering\n";
}
}
Debugging Output¶
void printAnalysis(const DependencyAnalysis& analysis) {
std::cout << "=== DEPENDENCY ANALYSIS ===\n\n";
std::cout << "Linear Execution Order:\n";
for (size_t i = 0; i < analysis.execution_order.linear_order.size(); ++i) {
std::cout << " " << i << ": " << analysis.execution_order.linear_order[i] << "\n";
}
std::cout << "\nParallel Stages:\n";
for (const auto& stage : analysis.execution_order.stages) {
std::cout << " Stage " << stage.stage_index
<< " (cost=" << stage.estimated_cost
<< ", nodes=" << stage.nodes.size() << "):\n";
for (const auto& node : stage.nodes) {
std::cout << " - " << node << "\n";
}
}
std::cout << "\nCritical Path (cost=" << analysis.critical_path.total_cost << "):\n";
for (const auto& node : analysis.critical_path.path) {
size_t cost = analysis.critical_path.node_costs.at(node);
std::cout << " → " << node << " (" << cost << ")\n";
}
std::cout << "\nMetrics:\n";
std::cout << " Max parallelism: " << analysis.max_parallelism << " nodes\n";
std::cout << " Avg parallelism: " << analysis.average_parallelism << " nodes/stage\n";
std::cout << " Bottlenecks: " << analysis.sequential_bottlenecks << " stages\n";
}
Next Steps¶
After dependency analysis:
1. 03_buffer_management - Allocate buffers based on execution order
2. 06_code_generation - Generate loops respecting dependencies
3. Optimization layer - Focus on critical path nodes
Status: ✅ Core implementation complete
Test Coverage: ~90%
Performance: O(V + E) for all operations