diff --git a/docs/technical-design.md b/docs/technical-design.md
index 6439f1a..835f955 100644
--- a/docs/technical-design.md
+++ b/docs/technical-design.md
@@ -1,6 +1,168 @@
# Technical Design & Architecture
-This document outlines key architectural decisions and data flows within the Rewards Eligibility Oracle.
+This document presents the Rewards Eligibility Oracle visually, as a more approachable alternative to reading through the codebase directly.
+
+## End-to-End Oracle Flow
+
+The Rewards Eligibility Oracle operates as a daily scheduled service that evaluates indexer performance and updates on-chain rewards eligibility via function calls to the RewardsEligibilityOracle contract. The diagram below illustrates the complete execution flow from scheduler trigger through data processing to blockchain submission and error handling.
+
+The Oracle is designed to be resilient to transient network issues and RPC provider failures. It uses a multi-layered approach involving internal retries, provider rotation, and a circuit breaker to prevent costly infinite restart loops that needlessly burn through BigQuery requests.
+
+```mermaid
+---
+title: Rewards Eligibility Oracle - End-to-End Flow
+---
+%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#fee2e2', 'primaryTextColor':'#7f1d1d', 'primaryBorderColor':'#ef4444', 'lineColor':'#6b7280'}}}%%
+
+graph TB
+ %% Docker Container - Contains all oracle logic
+ subgraph DOCKER["Docker Container"]
+ Scheduler["Python Scheduler"]
+ Oracle["Rewards Eligibility Oracle"]
+
+ subgraph CIRCUIT_BREAKER["Circuit Breaker Logic"]
+ CB["Circuit Breaker"]
+ CBCheck{"Has there been more<br/>than 3 failures in the<br/>last 60 minutes?"}
+ end
+
+ Scheduler -.->|"Phase 1: Schedule daily run"| Oracle
+
+ %% Data Pipeline
+ subgraph PIPELINE["Data Pipeline"]
+ CacheCheck{"Do we have recent cached<br/>BigQuery results available?<br/>(< 30 min old)"}
+
+ subgraph BIGQUERY["BigQuery Analysis"]
+ FetchData["Fetch Indexer Performance Data<br/>over last 28 days<br/>(from BigQuery)"]
+ SQLQuery["- Daily query metrics<br/>- Days online calculation<br/>- Subgraph coverage"]
+ end
+
+ subgraph PROCESSING["Eligibility Processing"]
+ ApplyCriteria["Apply Criteria e.g.<br/>5+ days online<br/>Latency < 5000ms<br/>Blocks behind < 50000<br/>1+ subgraph served"]
+ FilterData["Filter Eligible<br/>vs Ineligible"]
+ GenArtifacts["Generate CSV Artifacts:<br/>- eligible_indexers.csv<br/>- ineligible_indexers.csv<br/>- full_metrics.csv"]
+ end
+ end
+
+ %% Blockchain Layer
+ subgraph BLOCKCHAIN["Blockchain Submission"]
+ Batch["Consume series of Eligible<br/>Indexers from CSV.<br/>Batch indexer addresses<br/>into groups of 125 indexers."]
+
+ subgraph RPC["RPC Failover System"]
+ TryRPC["Try to establish connection<br/>with RPC provider"]
+ RPCError["Rotate to next RPC provider"]
+ end
+
+ BuildTx["Build Transaction:<br/>- Estimate gas<br/>- Get nonce<br/>- Sign with key"]
+ SubmitTx["Submit Batch to Contract<br/>call function:<br/>renewIndexerEligibility()"]
+ WaitReceipt["Wait for Receipt<br/>30s timeout"]
+ MoreBatches{"More<br/>Batches?"}
+ end
+
+ %% Monitoring
+ subgraph MONITOR["Monitoring & Notifications"]
+ SlackSuccess["Slack Success:<br/>- Eligible count<br/>- Execution time<br/>- Transaction links"]
+ SlackFailCircuitBreaker["Stop container sys.exit(0)<br/>Container will not restart<br/>Manual Intervention needed<br/>Send notification to team<br/>slack channel for debugging"]
+ SlackFailRPC["Stop container sys.exit(1)<br/>Container will restart<br/>Send notification to slack"]
+ SlackRotate["Send slack notification"]
+ end
+ end
+
+ %% External Systems - Define after Docker subgraph
+ RPCProviders["Pool of 4 RPC providers<br/>(External Infrastructure)"]
+ BQ["Google BigQuery<br/>Indexer Performance Data"]
+
+ subgraph FailureLogStorage["Data Storage<br/>(mounted volume)"]
+ CBLog["Failure log"]
+ end
+
+ subgraph HistoricalDataStorage["Data Storage<br/>(mounted volume)"]
+ HistoricalData["Historical archive of<br/>eligible and ineligible<br/>indexers by date<br/>YYYY-MM-DD"]
+ end
+
+ END_NO_RESTART["FAILURE<br/>Container Stopped<br/>No Restart<br/>Manual Intervention Required"]
+ END_WITH_RESTART["FAILURE<br/>Container Stopped<br/>Restart Container<br/>Will retry entire loop again"]
+ SUCCESS["SUCCESS<br/>Wait for next<br/>scheduled trigger"]
+
+ %% Main Flow - Start with Docker container to anchor it left
+ Oracle -->|"Phase 1.1: Check if oracle<br/>should run"| CB
+ CB -->|"Phase 1.2: Read log"| CBLog
+ CBLog -->|"Phase 1.3: Return log"| CB
+ CB -->|"Phase 1.4: Provides failure<br/>timestamps (if they exist)"| CBCheck
+ CBCheck -->|"Phase 2:<br/>(Regular Path)<br/>No"| CacheCheck
+ CacheCheck -->|"Phase 2.1: Check for<br/>recent cached data"| HistoricalData
+ HistoricalData -->|"Phase 2.2: Return recent eligible indexers<br/>from eligible_indexers.csv<br/>(if they exist)"| CacheCheck
+ CBCheck -.->|"Phase 2:<br/>(Alternative Path)<br/>Yes"| SlackFailCircuitBreaker
+ SlackFailCircuitBreaker -.-> END_NO_RESTART
+
+ CacheCheck -->|"Phase 3:<br/>(Alternative Path)<br/>Yes"| Batch
+ CacheCheck -->|"Phase 3:<br/>(Regular Path)<br/>No"| FetchData
+
+ FetchData -->|"Phase 3.1: Query data<br/>from BigQuery"| BQ
+ BQ -->|"Phase 3.2: Returns metrics"| SQLQuery
+ SQLQuery -->|"Phase 3.3: Process results"| ApplyCriteria
+ ApplyCriteria --> FilterData
+ FilterData -->|"Phase 3.4: Generate CSVs"| GenArtifacts
+ GenArtifacts -->|"Phase 3.5: Save data"| HistoricalData
+ GenArtifacts --> Batch
+
+ Batch -->|"Phase 4.1: For each batch"| TryRPC
+ TryRPC -->|"Phase 4.2: Connect"| RPCProviders
+ RPCProviders -->|"Phase 4.3:<br/>(Regular Path)<br/>RPC connection established"| BuildTx
+ RPCProviders -.->|"Phase 4.3:<br/>(Alternative Path)<br/>RPC connection failed<br/>Multiple connection attempts<br/>Not possible to connect"| RPCError
+ RPCError -.->|"Notify"| SlackRotate
+ RPCError -->|"All exhausted"| SlackFailRPC
+ SlackFailRPC --> END_WITH_RESTART
+ RPCError -->|"Connection successful"| BuildTx
+
+ BuildTx --> SubmitTx
+ SubmitTx --> WaitReceipt
+
+ WaitReceipt -->|"Phase 4.4: Batch confirmed"| MoreBatches
+
+ MoreBatches -->|"Yes<br/>Back to phase 4 loop<br/>Process next batch"| Batch
+ MoreBatches -->|"Phase 5: No<br/>All complete"| SlackSuccess
+ SlackSuccess --> SUCCESS
+
+ %% Styling
+ classDef schedulerStyle fill:#fee2e2,stroke:#ef4444,stroke-width:3px,color:#7f1d1d
+ classDef oracleStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
+ classDef dataStyle fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
+ classDef processingStyle fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
+ classDef blockchainStyle fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
+ classDef monitorStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
+ classDef infraStyle fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
+ classDef contractStyle fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
+ classDef decisionStyle fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
+ classDef endStyle fill:#7f1d1d,stroke:#991b1b,stroke-width:3px,color:#fee2e2
+ classDef endStyleOrange fill:#ea580c,stroke:#c2410c,stroke-width:3px,color:#ffedd5
+ classDef successStyle fill:#14532d,stroke:#166534,stroke-width:3px,color:#f0fdf4
+
+ class Scheduler schedulerStyle
+ class Oracle,CB oracleStyle
+ class FetchData,SQLQuery,BQ dataStyle
+ class ApplyCriteria,FilterData,GenArtifacts processingStyle
+ class Batch,TryRPC,BuildTx,SubmitTx,WaitReceipt,RPCError blockchainStyle
+ class SlackSuccess,SlackFailCircuitBreaker,SlackFailRPC,SlackRotate monitorStyle
+ class RPCProviders,HistoricalData,CBLog infraStyle
+ class Contract contractStyle
+ class CacheCheck,MoreBatches,CBCheck decisionStyle
+ class END_NO_RESTART endStyle
+ class END_WITH_RESTART endStyleOrange
+ class SUCCESS successStyle
+
+ style DOCKER fill:#dbeafe,stroke:#2563eb,stroke-width:3px,color:#1e3a8a
+ style CIRCUIT_BREAKER fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
+ style PIPELINE fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#14532d
+ style BIGQUERY fill:#dcfce7,stroke:#16a34a,stroke-width:2px,color:#14532d
+ style PROCESSING fill:#e0e7ff,stroke:#6366f1,stroke-width:2px,color:#312e81
+ style BLOCKCHAIN fill:#fee2e2,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
+ style RPC fill:#fecaca,stroke:#ef4444,stroke-width:2px,color:#7f1d1d
+ style MONITOR fill:#fef3c7,stroke:#f59e0b,stroke-width:2px,color:#92400e
+ style FailureLogStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
+ style HistoricalDataStorage fill:#f3f4f6,stroke:#6b7280,stroke-width:2px,color:#374151
+```
+
+---
## RPC Provider Failover and Circuit Breaker Logic
@@ -21,24 +183,22 @@ sequenceDiagram
# Describe failure loop inside the blockchain_client module
activate blockchain_client
- alt RPC Loop (for each provider)
+ loop For each provider in pool
- # Attempt RPC call
- blockchain_client->>blockchain_client: _execute_rpc_call() with provider A
- note right of blockchain_client: Fails after 5 retries
+ # Attempt RPC call
+ blockchain_client->>blockchain_client: _execute_rpc_call() with next provider
+ note right of blockchain_client: Fails after 3 attempts
- # Log failure
+ # Log failure and rotate
blockchain_client-->>blockchain_client: raises ConnectionError
- note right of blockchain_client: Catches error, logs rotation
+ note right of blockchain_client: Catches error, rotates to next provider
- # Retry RPC call
- blockchain_client->>blockchain_client: _execute_rpc_call() with provider B
- note right of blockchain_client: Fails after 5 retries
+ # Send rotation notification
+ blockchain_client->>slack_notifier: send_info_notification()
+ note right of slack_notifier: RPC provider rotation alert
- # Log final failure
- blockchain_client-->>blockchain_client: raises ConnectionError
- note right of blockchain_client: All providers tried and failed
end
+ note right of blockchain_client: All providers exhausted
# Raise error back to main_oracle oracle and exit blockchain_client module
blockchain_client-->>main_oracle: raises Final ConnectionError
@@ -51,6 +211,6 @@ sequenceDiagram
main_oracle->>slack_notifier: send_failure_notification()
# Document restart process
- note right of main_oracle: sys.exit(1)
- note right of main_oracle: Docker will restart. CircuitBreaker can halt via sys.exit(0)
+ note right of main_oracle: sys.exit(1) triggers Docker restart
+ note right of main_oracle: Circuit breaker uses sys.exit(0) to prevent restart
```
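
The provider loop in the updated sequence diagram can be illustrated with a minimal Python sketch. This is not the code in `blockchain_client`. The function name `call_with_failover`, its parameters, and the choice to notify only when another provider remains are assumptions made for illustration. The only details taken from the diagram are the 3 attempts per provider, the rotation notification, and the final `ConnectionError` once all providers are exhausted.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(
    providers: Sequence[str],
    rpc_call: Callable[[str], T],
    notify: Callable[[str], None],
    attempts_per_provider: int = 3,
) -> T:
    """Try each provider in order; raise ConnectionError once all are exhausted.

    Hypothetical helper: names and signature are illustrative only.
    """
    last_error = None
    for index, provider in enumerate(providers):
        for _ in range(attempts_per_provider):
            try:
                return rpc_call(provider)
            except ConnectionError as exc:
                last_error = exc
        # Every attempt on this provider failed. Announce the rotation
        # only if there is another provider left to rotate to (assumption).
        if index + 1 < len(providers):
            notify(f"Rotating from {provider} to {providers[index + 1]}")
    # Mirrors the "All providers exhausted" note in the diagram.
    raise ConnectionError("All RPC providers exhausted") from last_error
```

In this sketch the caller, corresponding to `main_oracle` in the diagram, would catch the final `ConnectionError`, send the failure notification, and exit with `sys.exit(1)` so that Docker restarts the container. Backoff between attempts is omitted for brevity.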