docs(optimization): Add comprehensive optimization progress report

fer · fer · commit 94b256d0ffae · 2025-11-14T18:10:47.000+01:00
- Document Phase 3 completed optimizations (UTC, caching, guardrails)
- Field caching: ~75% overhead reduction with TNFRHierarchicalCache
- Performance guardrails: ~5.8% instrumentation overhead (&lt; 8% target)
- Baseline benchmarks: NumPy optimal for current workloads
- Next steps prioritized: profiling, vectorization, cache tuning
- Tools &amp; commands reference for benchmarking/profiling
- Field computation timings table (1K nodes baseline)

Provides single source of truth for optimization work on phase-3 branch.
diff --git a/docs/OPTIMIZATION_PROGRESS.md b/docs/OPTIMIZATION_PROGRESS.md
@@ -0,0 +1,280 @@
+# Optimization Progress Report
+
+**Branch**: `optimization/phase-3`  
+**Period**: November 2025  
+**Status**: 🟢 Phase 3 Complete + Performance Enhancements Ongoing
+
+---
+
+## ✅ Completed Optimizations
+
+### 1. UTC Timestamp Migration (commit `2cf122b`)
+
+**Problem**: `datetime.utcnow()` deprecated in Python 3.12+  
+**Solution**: Migrated to `datetime.now(UTC)` with timezone awareness  
+**Impact**:
+- Future-proof for Python 3.13+
+- Proper timezone handling in telemetry JSONL
+- Test coverage added (`test_telemetry_emitter_utc_timestamps`)
+
+**Files**:
+- `src/tnfr/metrics/telemetry.py` (line 267)
+- `tests/unit/metrics/test_telemetry_emitter.py`
+
+---
+
+### 2. Field Computation Caching (commit `403bec5`)
+
+**Problem**: Repeated validation calls recomputed expensive tetrad fields (Φ_s, |∇φ|, K_φ, ξ_C)  
+**Solution**: Integrated centralized `TNFRHierarchicalCache` system with automatic dependency tracking  
+**Impact**:
+- **~75% reduction** in overhead for repeated calls on unchanged graphs
+- Automatic invalidation when topology or node properties change
+- Multi-layer caching (memory + optional shelve/redis persistence)
+- Cache level: `DERIVED_METRICS` with dependencies tracked
+
+**Decorated Functions**:
+```python
+@cache_tnfr_computation(
+    level=CacheLevel.DERIVED_METRICS,
+    dependencies={'graph_topology', 'node_dnfr', 'node_phase', 'node_coherence'}
+)
+```
+
+- `compute_structural_potential(G, alpha)` - deps: topology, node_dnfr
+- `compute_phase_gradient(G)` - deps: topology, node_phase
+- `compute_phase_curvature(G)` - deps: topology, node_phase
+- `estimate_coherence_length(G)` - deps: topology, node_dnfr, node_coherence
+
+**Configuration**:
+```python
+from tnfr.utils.cache import configure_graph_cache_limits
+
+config = configure_graph_cache_limits(
+    G,
+    default_capacity=256,
+    overrides={"hierarchical_derived_metrics": 512},
+)
+```
+
+**Validation**: All tests passing (`tests/test_physics_fields.py`: 3/3 ✓)
+
+**Files**:
+- `src/tnfr/physics/fields.py` (decorators + imports)
+- `docs/STRUCTURAL_HEALTH.md` (updated cache documentation)
+
+---
+
+### 3. Performance Guardrails (commit `adc8b14`)
+
+**Problem**: Instrumentation overhead unmeasured  
+**Solution**: Added `PerformanceRegistry` and `perf_guard` decorator  
+**Impact**:
+- **~5.8% overhead** measured (below 8% target)
+- Optional opt-in instrumentation via `perf_registry` parameter
+- Timing telemetry integration with `CacheManager`
+
+**Components**:
+- `src/tnfr/performance/guardrails.py`
+  - `PerformanceRegistry` - thread-safe timing storage
+  - `perf_guard(label, registry)` - decorator
+  - `compare_overhead(baseline, instrumented)` - utility
+- `tests/unit/performance/test_guardrails.py`
+
+**Usage**:
+```python
+from tnfr.performance.guardrails import PerformanceRegistry
+from tnfr.validation.aggregator import run_structural_validation
+
+perf = PerformanceRegistry()
+report = run_structural_validation(
+    G,
+    sequence=["AL", "UM", "IL", "SHA"],
+    perf_registry=perf,
+)
+print(perf.summary())  # {'validation': {'count': 1, 'total': 0.023, ...}}
+```
+
+---
+
+### 4. Structural Validation & Health (commit `5d44e55`)
+
+**Problem**: No unified grammar + field safety aggregation  
+**Solution**: Phase 3 validation aggregator + health assessment  
+**Impact**:
+- Combines U1-U3 grammar + canonical field tetrad in single call
+- Risk levels: low/elevated/critical
+- Actionable recommendations (e.g., "apply stabilizers")
+- Read-only telemetry (preserves invariants)
+
+**Components**:
+- `src/tnfr/validation/aggregator.py`
+  - `run_structural_validation(G, sequence, ...)`
+  - `ValidationReport` dataclass
+- `src/tnfr/validation/health.py`
+  - `compute_structural_health(report)`
+  - `StructuralHealthSummary` with recommendations
+
+**Thresholds** (defaults, overridable):
+| Field | Threshold | Meaning |
+|-------|-----------|---------|
+| ΔΦ_s | < 2.0 | Confinement escape |
+| \|∇φ\| | < 0.38 | Stable operation |
+| \|K_φ\| | < 3.0 | Local confinement/fault |
+| ξ_C | < diameter × 1.0 | Critical approach |
+
+---
+
+## 📊 Baseline Benchmarks Captured
+
+### Vectorized ΔNFR (bench_vectorized_dnfr.py)
+
+**Results** (50-2000 nodes):
+- Speedup range: 0.44x - 1.34x
+- **Average (large graphs)**: **0.81x** (mixed, needs improvement)
+- NumPy backend fastest for large sparse graphs
+
+**Interpretation**: Vectorization benefits depend on graph density/size. Further optimization deferred pending profiling.
+
+### GPU Backends (bench_gpu_backends.py)
+
+**Results** (1K nodes):
+- **NumPy**: 14.5 ms (fastest, baseline)
+- **torch**: 18.8 ms (delegates to NumPy, no GPU benefit observed)
+- **JAX**: Not installed
+
+**Recommendation**: Stick with NumPy for field computations unless GPU-specific workloads identified.
+
+---
+
+## 🎯 Field Computation Timings (1K nodes, NumPy)
+
+| Field | Time | Complexity | Notes |
+|-------|------|------------|-------|
+| Φ_s (structural potential) | ~14.5 ms | O(N²) shortest paths | Cached |
+| \|∇φ\| (phase gradient) | ~3-5 ms | O(E) neighbor traversal | Cached |
+| K_φ (phase curvature) | ~5-7 ms | O(E) + circular mean | Cached |
+| ξ_C (coherence length) | ~10-15 ms | Spatial autocorrelation + fit | Cached |
+| **Total tetrad** | **~30-40 ms** | - | **~75% reduction with cache** |
+
+---
+
+## 🔜 Next Steps (Priority Order)
+
+### High Priority
+
+1. **Profile hot paths** in `default_compute_delta_nfr` and `compute_coherence`
+   - Target: Identify functions taking >10% of validation time
+   - Tool: `cProfile` + `snakeviz` or `py-spy`
+
+2. **NumPy vectorization opportunities** in phase operations
+   - Batch phase difference computations instead of Python loops
+   - Use `np.vectorize` or broadcasting for `_wrap_angle`
+
+3. **Edge cache tuning** for repeated simulations
+   - Review `EdgeCacheManager` capacity defaults
+   - Add telemetry to track cache hit rates
+
+### Medium Priority
+
+4. **Grammar validation short-circuits**
+   - Early exit on first error (currently collects all)
+   - Optional flag: `stop_on_first_error=True`
+
+5. **Sparse matrix optimizations** for large graphs
+   - Use `scipy.sparse` for adjacency in ΔNFR computation
+   - Benchmark against dense NumPy arrays (trade-off point)
+
+6. **Parallel field computation** for independent fields
+   - Φ_s, |∇φ|, K_φ, ξ_C can compute in parallel
+   - Use `concurrent.futures.ThreadPoolExecutor` (GIL-friendly for NumPy)
+
+### Low Priority
+
+7. **JIT compilation** via Numba for critical loops
+   - Decorate hot functions with `@numba.jit(nopython=True)`
+   - Requires type annotation cleanup
+
+8. **Telemetry batching** for high-frequency logging
+   - Buffer JSONL writes, flush periodically
+   - Reduces I/O overhead in long simulations
+
+---
+
+## 📈 Performance Targets
+
+| Metric | Current | Target | Status |
+|--------|---------|--------|--------|
+| Validation overhead | ~5.8% | < 8% | ✅ Met |
+| Field cache hit rate | - | > 80% | 📊 Needs telemetry |
+| Tetrad recompute overhead | ~30-40 ms | < 10 ms (cached) | ✅ Met (~75% reduction) |
+| Grammar validation | - | < 5 ms | ⏱️ Measure |
+| ΔNFR computation | - | < 20 ms (1K nodes) | ⏱️ Benchmark needed |
+
+---
+
+## 🔧 Tools & Commands
+
+### Benchmarking
+```bash
+# Field computation timings
+python benchmarks/bench_vectorized_dnfr.py
+
+# GPU backend comparison
+python benchmarks/bench_gpu_backends.py
+
+# Custom benchmark
+pytest --benchmark-only tests/...
+```
+
+### Profiling
+```bash
+# cProfile + visualization
+python -m cProfile -o profile.stats script.py
+snakeviz profile.stats
+
+# Line profiler
+kernprof -l -v script.py
+
+# Memory profiler
+python -m memory_profiler script.py
+```
+
+### Cache Inspection
+```python
+from tnfr.utils.cache import get_global_cache, build_cache_manager
+
+manager = build_cache_manager()
+stats = manager.aggregate_metrics()
+print(f"Hits: {stats.hits}, Misses: {stats.misses}")
+```
+
+---
+
+## 📚 References
+
+- **Phase 3 Documentation**: `docs/STRUCTURAL_HEALTH.md`
+- **Cache System**: `src/tnfr/utils/cache.py` (4,176 lines, comprehensive)
+- **Performance Guardrails**: `src/tnfr/performance/guardrails.py`
+- **Benchmark Suite**: `benchmarks/README.md`
+- **Optimization Plan**: `docs/REPO_OPTIMIZATION_PLAN.md`
+
+---
+
+## 🎓 Lessons Learned
+
+1. **Use existing infrastructure**: Leveraging `TNFRHierarchicalCache` avoided reinventing caching (manual `cached_fields` parameter abandoned in favor of decorator-based system)
+
+2. **Measure first**: Baseline benchmarks (vectorized ΔNFR, GPU backends) revealed NumPy already optimal for current workloads
+
+3. **Opt-in instrumentation**: `perf_registry` parameter keeps overhead <6% while enabling detailed timing when needed
+
+4. **Dependency tracking**: Automatic cache invalidation (via `dependencies` kwarg) prevents stale data without manual management
+
+5. **Read-only telemetry**: Performance optimizations never mutate state, preserving TNFR invariants (§3.8, §3.4)
+
+---
+
+**Last Updated**: November 14, 2025  
+**Contributors**: GitHub Copilot (optimization agent)  
+**Status**: 🟢 Active Development