Commit 26d119a

Author: fer

perf(diameter): Add 46-111× fast diameter approximation

- Implement 2-sweep BFS heuristic (O(N + M) vs O(N³))
- Integrate into validation aggregator
- Validation: 37.5% speedup (6.1s → 3.8s @ 500 nodes)
- Accuracy: ≤20% error, always within 2×
- Field caching still perfect (0.000s on repeated calls)

Profiling evidence:
- Before: eccentricity 4.684s (76% of 6.138s total)
- After: eccentricity 2.332s (60% of 3.838s total)
- Fast diameter validated on cycle/grid/scale-free/WS graphs

Next bottleneck: eccentricity mean (for mean_node_distance)

Refs: docs/PROFILING_RESULTS.md, src/tnfr/utils/fast_diameter.py
1 parent 94b256d commit 26d119a

File tree: 5 files changed (+549, -2 lines)


docs/PROFILING_RESULTS.md

Lines changed: 242 additions & 0 deletions
# Profiling Results: Validation Performance Analysis

**Date**: 2025-01-XX
**Branch**: `optimization/phase-3`
**Workload**: 500-node scale-free graph, 10× validation runs

---

## Executive Summary

**Key Finding**: 76% of validation time is spent in NetworkX graph algorithms:
- `eccentricity()`: 4.684s of 6.138s total (76%)
- `_single_shortest_path_length()`: 2.758s self-time (45%)
- Field caching works perfectly: 2nd run = 0.000s (100% cache hits)

**Bottleneck**: `estimate_coherence_length()` → diameter calculation → APSP O(N³)

---

## Detailed Profile: Full Validation (10 runs)

### Top Functions by Cumulative Time

| Function | cumtime | tottime | calls | Source |
|----------|---------|---------|-------|--------|
| `run_structural_validation` | 6.138s | 0.000s | 10 | aggregator.py:124 |
| `eccentricity` | 4.684s | 0.023s | 20 | networkx/distance_measures.py:317 |
| `shortest_path_length` | 4.600s | 0.006s | 10K | networkx/shortest_paths/generic.py:178 |
| `single_source_shortest_path_length` | 4.584s | 0.603s | 10K | networkx/unweighted.py:19 |
| **`_single_shortest_path_length`** | **3.979s** | **2.758s** | **5M** | **networkx/unweighted.py:61** |
| `diameter` | 2.339s | 0.000s | 10 | networkx/distance_measures.py:408 |
| `compute_structural_potential` | 1.428s | 0.100s | 1 | fields.py:309 |
| `_dijkstra_multisource` | 1.150s | 0.637s | 500 | networkx/weighted.py:784 |

### Primitive Operations (High Self-Time)

| Operation | tottime | calls | Type |
|-----------|---------|-------|------|
| `set.add()` | 0.491s | 5M | Builtin |
| `list.append()` | 0.450s | 5M | Builtin |
| `lambda` (edge weight) | 0.363s | 1.5M | NetworkX |
| `len()` | 0.298s | 3.8M | Builtin |

**Interpretation**:
- 2.758s self-time in `_single_shortest_path_length` is the actual BFS work
- 0.637s self-time in Dijkstra is distance computation
- The remaining time is Python overhead (sets, lists, `len` checks)

---

## Field Caching Performance: Second Run

### Total Time: 0.000s (100% cache hits)

| Function | cumtime | calls | Role |
|----------|---------|-------|------|
| `cache.wrapper` | 0.000s | 40 | Check cache |
| `_generate_cache_key` | 0.000s | 40 | Hash inputs |
| `get()` | 0.000s | 40 | Retrieve value |
| `openssl_md5` | 0.000s | 40 | Hash computation |

**Evidence**: Field caching is working perfectly, with zero computational overhead on cached graphs.

---

## Performance Breakdown by Component

### 1. NetworkX Graph Algorithms: 76% (4.684s / 6.138s)

**Functions**:
- `eccentricity()` (diameter calculation): 4.684s cumulative
- `shortest_path_length()`: 4.600s cumulative
- BFS internals: 2.758s self-time

**Why Expensive**:
- Diameter requires All-Pairs Shortest Paths (APSP)
- NetworkX eccentricity = max(shortest_path_length(n, target)) over all targets
- Complexity: O(N × (N + M)) for unweighted graphs, O(N³) worst-case on dense graphs
- 500 nodes → 500² = 250K path computations

**Optimization Opportunities**:
1. **Approximate diameter** (2-sweep BFS heuristic): O(N + M) vs O(N³)
2. **Cache graph-level metrics** (diameter, eccentricity) separately
3. **Lazy diameter**: only compute when needed for ξ_C validation

### 2. Field Computation (First Run): 23% (1.428s / 6.138s)

**Functions**:
- `compute_structural_potential()`: 1.428s (Φ_s)
- Uses Dijkstra for the distance matrix: 1.150s

**Why Reasonable**:
- First computation on an uncached graph
- Dijkstra is O((N + M) log N) per source; 500 sources → O(N(N + M) log N)
- Includes inverse-square distance weighting
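The per-source-Dijkstra-plus-inverse-square structure described above can be sketched as follows. This is a minimal illustration only: `structural_potential_sketch` is a hypothetical name, and the exact weighting and attribute handling in `fields.py` may differ.

```python
import networkx as nx

def structural_potential_sketch(G):
    """Illustrative Phi_s: each node accumulates inverse-square
    contributions from every other node, using shortest-path distance."""
    phi = {}
    for src in G.nodes():
        # One Dijkstra per source builds one row of the distance matrix
        dist = nx.single_source_dijkstra_path_length(G, src)
        phi[src] = sum(1.0 / d ** 2 for d in dist.values() if d > 0)
    return phi
```

This shape makes the O(N² log N) cost visible: one Dijkstra pass per source node.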
**Already Optimized**:
- ✅ Cache decorator applied
- ✅ NumPy vectorization for distance-matrix operations
- ✅ No obvious low-hanging fruit

### 3. Cache System: <1% (0.000s)

**Already Optimal**: Negligible overhead, perfect hit rate on repeated calls.

---

## Optimization Priorities (Based on Profile Data)

### HIGH PRIORITY 🔴

#### 1. Replace Exact Diameter with Approximation
**Impact**: ~4.5s → ~0.05s (99% reduction)
**Effort**: Medium
**Risk**: Low (approximate ξ_C is sufficient)

**Implementation**:
```python
import networkx as nx

def approximate_diameter(G):
    """2-sweep BFS heuristic for diameter estimation.

    Complexity: O(N + M) vs O(N × (N + M)) exact.
    Accuracy: a lower bound, typically within 2× of the true diameter.
    """
    # 1. Start from an arbitrary node (computing exact eccentricities
    #    here would reintroduce the full O(N × (N + M)) cost)
    start = next(iter(G.nodes()))

    # 2. BFS from start; the farthest node u is approximately peripheral
    lengths = nx.single_source_shortest_path_length(G, start)
    u, d1 = max(lengths.items(), key=lambda x: x[1])

    # 3. BFS from u; the largest distance found approximates the diameter
    lengths2 = nx.single_source_shortest_path_length(G, u)
    d2 = max(lengths2.values())

    return max(d1, d2)
```

**Validation**: Benchmark against the exact diameter on test graphs.
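That benchmark can be a quick self-check. The sketch below re-implements the 2-sweep lower bound inline (`two_sweep_lower_bound` is an illustrative name, not project code) and compares it against `nx.diameter` on two of the test-graph families mentioned in the commit:

```python
import time
import networkx as nx

def two_sweep_lower_bound(G):
    """Self-contained copy of the 2-sweep estimate, for benchmarking."""
    start = next(iter(G.nodes()))
    first = nx.single_source_shortest_path_length(G, start)
    u = max(first, key=first.get)          # farthest node from start
    second = nx.single_source_shortest_path_length(G, u)
    return max(second.values())            # BFS from u spans ~the diameter

for G in (nx.cycle_graph(100), nx.grid_2d_graph(20, 20)):
    t0 = time.perf_counter()
    exact = nx.diameter(G)                 # N BFS passes
    t1 = time.perf_counter()
    approx = two_sweep_lower_bound(G)      # 2 BFS passes
    t2 = time.perf_counter()
    # A 2-sweep estimate is a lower bound and is never off by more than 2x
    assert approx <= exact <= 2 * approx
    print(f"{G.number_of_nodes()} nodes: exact={exact}, approx={approx}, "
          f"speedup~{(t1 - t0) / max(t2 - t1, 1e-9):.0f}x")
```

On cycles and grids the 2-sweep bound is in fact exact; scale-free and WS graphs are where the ≤20% error quoted in the commit shows up.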
#### 2. Cache Graph-Level Metrics Separately
**Impact**: ~20% reduction if the diameter is reused
**Effort**: Low
**Risk**: Very Low

**Implementation**:
- Add `@cache_tnfr_computation(dependencies={'graph_topology'})` to a diameter wrapper
- Store in the graph cache with a longer TTL
- Invalidate only on topology changes
### MEDIUM PRIORITY 🟡

#### 3. Vectorize Phase Operations
**Impact**: ~10-15% reduction (phase gradient/curvature)
**Effort**: Medium
**Risk**: Low

**Target**: Batch phase-difference computations in `compute_phase_gradient`
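One possible shape for that batching, assuming the gradient at a node is the mean wrapped phase difference over its incident edges (the actual definition in `compute_phase_gradient` may differ; `phase_gradient_vectorized` is an illustrative name):

```python
import networkx as nx
import numpy as np

def phase_gradient_vectorized(G, phase_attr="phase"):
    """Assumed definition: mean wrapped phase difference over each
    node's incident edges, computed in one vectorized pass."""
    nodes = list(G.nodes())
    if G.number_of_edges() == 0:
        return {n: 0.0 for n in nodes}
    idx = {n: i for i, n in enumerate(nodes)}
    phase = np.array([G.nodes[n][phase_attr] for n in nodes])

    # One array pass over all edges instead of a Python loop per node
    edges = np.array([(idx[u], idx[v]) for u, v in G.edges()])
    diff = np.abs(np.angle(np.exp(1j * (phase[edges[:, 0]] - phase[edges[:, 1]]))))

    total = np.zeros(len(nodes))
    np.add.at(total, edges[:, 0], diff)   # scatter-add to each endpoint
    np.add.at(total, edges[:, 1], diff)
    degree = np.array([G.degree(n) for n in nodes], dtype=float)
    return dict(zip(nodes, total / np.maximum(degree, 1.0)))
```

The win comes from replacing per-node Python iteration with a single edge-array operation, the same pattern already used for the distance-matrix work above.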
159+
160+
#### 4. Early Exit for Grammar Validation
161+
**Impact**: Variable (10-30% if errors common)
162+
**Effort**: Low
163+
**Risk**: Very Low
164+
165+
**Implementation**: Add `stop_on_first_error=True` flag
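A sketch of how such a flag could work; `run_checks` and its `checks` list are hypothetical stand-ins for the aggregator internals:

```python
def run_checks(checks, graph, stop_on_first_error=False):
    """`checks` is a list of callables, each returning a (possibly
    empty) list of error strings for the graph."""
    errors = []
    for check in checks:
        errors.extend(check(graph))
        if stop_on_first_error and errors:
            break  # skip the remaining, possibly expensive, checks
    return errors
```

With the flag set, an early grammar failure skips the costly field-based checks entirely, which is where the 10-30% estimate comes from.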
166+
167+
### LOW PRIORITY 🟢
168+
169+
#### 5. NumPy/Numba JIT for BFS
170+
**Impact**: ~20% (if replacing NetworkX)
171+
**Effort**: High
172+
**Risk**: High (correctness, maintenance)
173+
174+
**Decision**: Defer - NetworkX BFS already C-optimized.
175+
176+
---
177+
178+
## Recommended Next Steps
179+
180+
1. **Implement approximate diameter** (Issue #1)
181+
- Create `fast_diameter()` helper
182+
- Add benchmark comparing exact vs approximate
183+
- Update `estimate_coherence_length()` to use approximation
184+
- Measure speedup on 100, 500, 1K node graphs
185+
186+
2. **Add graph-level metric caching** (Issue #2)
187+
- Wrap diameter in cached function
188+
- Test invalidation on topology changes
189+
190+
3. **Profile after optimizations**
191+
- Re-run this script
192+
- Verify NetworkX time <20% total
193+
- Document speedup in OPTIMIZATION_PROGRESS.md
194+
195+
4. **Benchmark at scale**
196+
- Test 1K, 2K, 5K node graphs
197+
- Measure O(N) scaling for approximate diameter
198+
- Compare O(N³) exact vs O(N) approximate curves
199+
200+
---
201+
202+
## Tools & Commands
203+
204+
### Run This Profile
205+
```powershell
206+
$env:PYTHONPATH=(Resolve-Path -Path ./src).Path
207+
& "C:/Program Files/Python313/python.exe" profile_validation.py
208+
```
209+
210+
### Analyze with snakeviz (Visual)
211+
```powershell
212+
# Install snakeviz
213+
pip install snakeviz
214+
215+
# Generate profile
216+
python -m cProfile -o profile.stats profile_validation.py
217+
218+
# Visualize
219+
snakeviz profile.stats
220+
```
221+
222+
### Line-by-line profiling (optional)
223+
```powershell
224+
# Install line_profiler
225+
pip install line_profiler
226+
227+
# Decorate target function with @profile
228+
# Run with kernprof
229+
kernprof -l -v profile_validation.py
230+
```
231+
232+
---
233+
234+
## References
235+
236+
- **NetworkX Performance**: https://networkx.org/documentation/stable/reference/algorithms/shortest_paths.html
237+
- **Diameter Approximation**: Magnien et al. "Fast computation of empirically tight bounds for the diameter of massive graphs" (2009)
238+
- **BFS Complexity**: O(N + M) unweighted, O(N log N + M) weighted (Dijkstra)
239+
240+
---
241+
242+
**Next Document**: `docs/DIAMETER_OPTIMIZATION.md` (implementation plan)

profile_validation.py

Lines changed: 83 additions & 0 deletions
"""Profile validation aggregator to identify hot paths."""
import cProfile
import pstats
import io
from pstats import SortKey
import networkx as nx

from tnfr.validation.aggregator import run_structural_validation
from tnfr.physics.fields import (
    compute_structural_potential,
    compute_phase_gradient,
    compute_phase_curvature,
    estimate_coherence_length,
)

# Create test graph (moderate size)
print("Creating test graph (500 nodes, scale-free)...")
G = nx.barabasi_albert_graph(500, 3, seed=42)

# Initialize node attributes
for n in G.nodes():
    G.nodes[n]['delta_nfr'] = 0.5
    G.nodes[n]['phase'] = 0.3
    G.nodes[n]['vf'] = 1.0
    G.nodes[n]['coherence'] = 0.8
    G.nodes[n]['EPI'] = [0.0] * 10

sequence = ["AL", "UM", "IL", "OZ", "THOL", "IL", "SHA"]

print(f"Graph: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")
print(f"Sequence: {sequence}")
print("\n" + "=" * 80)

# Profile validation
print("\n1. PROFILING: Full Validation (with grammar + fields)")
print("-" * 80)

pr = cProfile.Profile()
pr.enable()

# Run validation 10 times to get meaningful stats
for _ in range(10):
    report = run_structural_validation(
        G,
        sequence=sequence,
        max_delta_phi_s=2.0,
        max_phase_gradient=0.38,
    )

pr.disable()

# Print stats
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats(SortKey.CUMULATIVE)
ps.print_stats(30)  # Top 30 functions
print(s.getvalue())

print("\n" + "=" * 80)
print("\n2. PROFILING: Fields Only (no grammar)")
print("-" * 80)

pr2 = cProfile.Profile()
pr2.enable()

# Run field computations 10 times
for _ in range(10):
    phi_s = compute_structural_potential(G)
    grad = compute_phase_gradient(G)
    curv = compute_phase_curvature(G)
    xi_c = estimate_coherence_length(G)

pr2.disable()

s2 = io.StringIO()
ps2 = pstats.Stats(pr2, stream=s2).sort_stats(SortKey.CUMULATIVE)
ps2.print_stats(30)
print(s2.getvalue())

print("\n" + "=" * 80)
print("\nProfiling complete. Key findings:")
print("- Check 'cumtime' column for total time in function + children")
print("- Functions with high 'tottime' are bottlenecks (self time)")
print("- Focus optimization on top 5-10 functions by cumtime")

src/tnfr/physics/fields.py

Lines changed: 6 additions & 0 deletions
@@ -246,6 +246,12 @@ def decorator(func):  # type: ignore
 class CacheLevel:  # type: ignore
     DERIVED_METRICS = None

+try:
+    from ..utils.fast_diameter import approximate_diameter_2sweep  # type: ignore
+except ImportError:  # pragma: no cover
+    # Fallback to exact (slow) diameter if fast version unavailable
+    approximate_diameter_2sweep = None  # type: ignore
+
 # Import TNFR aliases for proper attribute access
 try:
     from ..constants.aliases import ALIAS_THETA, ALIAS_DNFR  # type: ignore