173 changes: 12 additions & 161 deletions k8s/README.md
@@ -2,82 +2,14 @@

This directory contains Kubernetes manifests for deploying the Rewards Eligibility Oracle with persistent state management.

## Prerequisites

- Kubernetes cluster (version 1.19+)
- `kubectl` configured to access your cluster
- Docker image published to `ghcr.io/graphprotocol/rewards-eligibility-oracle`
- **Storage class configured** (see Storage Configuration below)

## Quick Start

### 1. Create Secrets (Required)

```bash
# Copy the example secrets file
cp k8s/secrets.yaml.example k8s/secrets.yaml

# Edit with your actual credentials
# IMPORTANT: Never commit secrets.yaml to version control
nano k8s/secrets.yaml
```

**Required secrets:**

- **`google-credentials`**: Service account JSON for BigQuery access
- **`blockchain-private-key`**: Private key for Arbitrum Sepolia transactions
- **`arbitrum-api-key`**: API key for Arbiscan contract verification
- **`slack-webhook-url`**: Webhook URL for operational notifications

### 2. Configure Storage (Required)

```bash
# Check available storage classes
kubectl get storageclass

# If you see a default storage class (marked with *), skip to step 3
# Otherwise, edit persistent-volume-claim.yaml and uncomment the appropriate storageClassName
```

**Common storage classes by platform:**

- **AWS EKS**: `gp2`, `gp3`, `ebs-csi`
- **Google GKE**: `standard`, `ssd`
- **Azure AKS**: `managed-premium`, `managed`
- **Local/Development**: `hostpath`, `local-path`

### 3. Deploy to Kubernetes

```bash
# Apply all manifests
kubectl apply -f k8s/

# Verify deployment
kubectl get pods -l app=rewards-eligibility-oracle
kubectl get pvc -l app=rewards-eligibility-oracle
```

### 4. Monitor Deployment

```bash
# Check pod status
kubectl describe pod -l app=rewards-eligibility-oracle

# View logs
kubectl logs -l app=rewards-eligibility-oracle -f

# Check persistent volumes
kubectl get pv
```

## Architecture

### Persistent Storage

The service uses **two persistent volumes** to maintain state across pod restarts:

- **`rewards-eligibility-oracle-data` (5GB)**: Circuit breaker state, last run tracking, BigQuery cache, CSV outputs
- **`rewards-eligibility-oracle-logs` (2GB)**: Application logs
- **`rewards-eligibility-oracle-data`**: Circuit breaker state, last run tracking, BigQuery cache, CSV outputs
- **`rewards-eligibility-oracle-logs`**: Application logs

**Mount points:**

@@ -86,98 +18,17 @@ The service uses **two persistent volumes** to maintain state across pod restart

### Configuration Management

**Non-sensitive configuration** → `ConfigMap` (`configmap.yaml`)
**Sensitive credentials** → `Secret` (`secrets.yaml`)

This separation provides:

- ✅ Easy configuration updates without rebuilding images
- ✅ Secure credential management with base64 encoding
- ✅ Clear separation of concerns

### Resource Allocation

**Requests (guaranteed):**

- CPU: 250m (0.25 cores)
- Memory: 512M

**Limits (maximum):**

- CPU: 1000m (1.0 core)
- Memory: 1G

## State Persistence Benefits

With persistent volumes, the service maintains:

1. **Circuit breaker state** → Prevents infinite restart loops
2. **Last run tracking** → Enables proper catch-up logic
3. **BigQuery cache** → Dramatic performance improvement (30s vs 5min restarts)
4. **CSV audit artifacts** → Regulatory compliance and debugging

## Health Checks

The deployment uses **file-based health checks** (same as docker-compose):

**Liveness probe:** Checks `/app/healthcheck` file modification time
**Readiness probe:** Verifies `/app/healthcheck` file exists
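
A file-based liveness check of this kind can be sketched as follows. This is an illustration only: the function name `is_alive` and the 60-second threshold in the usage line are assumptions, and the real probe commands are not shown in this README.

```python
import os
import time
from typing import Optional


def is_alive(path: str, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if `path` exists and was modified within `max_age_seconds`."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file counts as unhealthy.
        return False
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_seconds


if __name__ == "__main__":
    # Hypothetical threshold; choose one longer than the service's heartbeat interval.
    print(is_alive("/app/healthcheck", max_age_seconds=60))
```

A readiness check in the same style only needs the existence test, so it would pass as soon as the file is first written.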

## Troubleshooting

### Pod Won't Start

```bash
# Check events
kubectl describe pod -l app=rewards-eligibility-oracle

# Common issues:
# - Missing secrets
# - PVC provisioning failures
# - Image pull errors
```

### Check Persistent Storage

```bash
# Verify PVCs are bound
kubectl get pvc

# Check if volumes are mounted correctly
kubectl exec -it deployment/rewards-eligibility-oracle -- ls -la /app/data
```

### Debug Configuration

```bash
# Check environment variables
kubectl exec -it deployment/rewards-eligibility-oracle -- env | grep -E "(BIGQUERY|BLOCKCHAIN)"

# Verify secrets are mounted
kubectl exec -it deployment/rewards-eligibility-oracle -- ls -la /etc/secrets
```

## Security Best Practices

✅ **Secrets never committed** to version control
✅ **Service account** with minimal BigQuery permissions
✅ **Private key** stored in Kubernetes secrets (base64 encoded)
✅ **Resource limits** prevent resource exhaustion
✅ **Read-only filesystem** where possible
The application requires a `config.toml` file to run. Configuration is split across two Kubernetes resources:

## Production Considerations
**ConfigMap (`configmap.yaml`):**

- **Backup strategy** for persistent volumes
- **Monitoring** and alerting setup
- **Log aggregation** (ELK stack, etc.)
- **Network policies** for additional security
- **Pod disruption budgets** for maintenance
- **Horizontal Pod Autoscaler** (if needed for scaling)
- Contains the complete `config.toml` file structure
- Includes non-sensitive settings (RPC URLs, contract addresses, batch sizes, etc.)
- Uses `$VARIABLE_NAME` placeholders for sensitive values
- **Mounted as a file** at `/app/config.toml`

## Next Steps
**Secret (`secrets.yaml`):**

1. **Test deployment** in staging environment
2. **Verify state persistence** across pod restarts
3. **Set up monitoring** and alerting
4. **Configure backup** for persistent volumes
5. **Enable quality checking** after successful validation
- Contains sensitive credentials
- **Injected as environment variables** into the container
- Values are substituted into `config.toml` placeholders at runtime
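
The runtime substitution described above can be sketched in a few lines. This is a minimal illustration, not the application's actual loader: `render_config` is a hypothetical name, and the sketch assumes placeholders use the plain `$VARIABLE_NAME` form and that the file contains no other `$` characters.

```python
from string import Template
from typing import Mapping


def render_config(template_text: str, env: Mapping[str, str]) -> str:
    """Replace $NAME placeholders in the config text with values from `env`.

    Template.substitute raises KeyError when a placeholder has no matching
    variable, so a missing secret fails loudly instead of leaving "$NAME"
    in the rendered file.
    """
    return Template(template_text).substitute(env)


if __name__ == "__main__":
    sample = 'SLACK_WEBHOOK_URL = "$SLACK_WEBHOOK_URL"\n'
    # Placeholder value for demonstration; in the pod this would come from os.environ.
    print(render_config(sample, {"SLACK_WEBHOOK_URL": "https://example.invalid/hook"}))
```

Failing on a missing variable is a deliberate choice in this sketch: a pod that starts with an unresolved placeholder is harder to diagnose than one that refuses to start.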
116 changes: 68 additions & 48 deletions k8s/configmap.yaml
@@ -5,51 +5,71 @@ metadata:
labels:
app: rewards-eligibility-oracle
data:
# BigQuery Configuration
BIGQUERY_LOCATION_ID: "US"
BIGQUERY_PROJECT_ID: "graph-mainnet"
BIGQUERY_DATASET_ID: "internal_metrics"
BIGQUERY_TABLE_ID: "metrics_indexer_attempts"
BIGQUERY_CURATION_TABLE_ID: "metrics_curator_signals"
BIGQUERY_CURATOR_MAINNET_TABLE_ID: "curator_name_signal_dimensions_daily"
BIGQUERY_CURATOR_ARBITRUM_TABLE_ID: "curator_name_signal_dimensions_arbitrum_daily"
BIGQUERY_SUBGRAPH_LOOKUP_TABLE_ID: "subgraph_version_id_lookup"
BIGQUERY_ANALYSIS_PERIOD_DAYS: "28"

# Blockchain Configuration (Arbitrum Sepolia)
BLOCKCHAIN_CONTRACT_ADDRESS: "0x6d5550698F930210c3f50efe744bF51C55D791f6"
BLOCKCHAIN_FUNCTION_NAME: "renewIndexerEligibility"
BLOCKCHAIN_CHAIN_ID: "421614"
BLOCK_EXPLORER_URL: "https://sepolia.arbiscan.io"
TX_TIMEOUT_SECONDS: "30"

# RPC Provider URLs (Arbitrum Sepolia)
BLOCKCHAIN_RPC_URL_1: "https://arbitrum-sepolia.drpc.org"
BLOCKCHAIN_RPC_URL_2: "https://sepolia-rollup.arbitrum.io/rpc"
BLOCKCHAIN_RPC_URL_3: "https://api.zan.top/arb-sepolia"
BLOCKCHAIN_RPC_URL_4: "https://arbitrum-sepolia.gateway.tenderly.co"

# Scheduling Configuration
SCHEDULED_RUN_TIME: "10:00"

# Subgraph URLs
SUBGRAPH_URL_PRE_PRODUCTION: "https://api.studio.thegraph.com/query/110664/issuance-eligibility-oracle/v0.1.4"
SUBGRAPH_URL_PRODUCTION: "https://gateway.thegraph.com/api/subgraphs/id/"

# Processing Configuration
BATCH_SIZE: "125"
MAX_AGE_BEFORE_DELETION: "120"

# Caching Configuration
CACHE_MAX_AGE_MINUTES: "30"
FORCE_BIGQUERY_REFRESH: "false"

# Eligibility Criteria
MIN_ONLINE_DAYS: "5"
MIN_SUBGRAPHS: "1"
MAX_LATENCY_MS: "5000"
MAX_BLOCKS_BEHIND: "50000"
MIN_CURATION_SIGNAL: "500"

# Runtime Configuration
RUN_ON_STARTUP: "true"
config.toml: |
# Service Quality Oracle Configuration
# This file separates sensitive secrets from non-sensitive configuration values

# =============================================================================
# NON-SENSITIVE CONFIGURATION
# =============================================================================

[bigquery]
BIGQUERY_LOCATION_ID = "US"
BIGQUERY_PROJECT_ID = "graph-mainnet"
BIGQUERY_DATASET_ID = "internal_metrics"
BIGQUERY_TABLE_ID = "metrics_indexer_attempts"
BIGQUERY_CURATION_TABLE_ID = "metrics_curator_signals"
BIGQUERY_CURATOR_MAINNET_TABLE_ID = "curator_name_signal_dimensions_daily"
BIGQUERY_CURATOR_ARBITRUM_TABLE_ID = "curator_name_signal_dimensions_arbitrum_daily"
BIGQUERY_SUBGRAPH_LOOKUP_TABLE_ID = "subgraph_version_id_lookup"

[blockchain]
BLOCKCHAIN_CONTRACT_ADDRESS = "0x9BED32d2b562043a426376b99d289fE821f5b04E"
BLOCKCHAIN_FUNCTION_NAME = "renewIndexerEligibility"
BLOCKCHAIN_CHAIN_ID = 421614
BLOCKCHAIN_RPC_URLS = [
"https://arbitrum-sepolia.drpc.org",
"https://sepolia-rollup.arbitrum.io/rpc",
"https://api.zan.top/arb-sepolia",
"https://arbitrum-sepolia.gateway.tenderly.co"
]
BLOCK_EXPLORER_URL = "https://sepolia.arbiscan.io"
TX_TIMEOUT_SECONDS = "30"

[scheduling]
SCHEDULED_RUN_TIME = "10:00"

[subgraph]
SUBGRAPH_URL_PRE_PRODUCTION = "https://api.studio.thegraph.com/query/110664/issuance-eligibility-oracle/v0.1.4"
SUBGRAPH_URL_PRODUCTION = "https://gateway.thegraph.com/api/subgraphs/id/"

[processing]
BATCH_SIZE = 125
MAX_AGE_BEFORE_DELETION = 120
BIGQUERY_ANALYSIS_PERIOD_DAYS = "28"

[caching]
# Maximum age in minutes for cached data to be considered fresh
CACHE_MAX_AGE_MINUTES = "30"
# Force BigQuery refresh even if fresh cached data exists (true/false)
FORCE_BIGQUERY_REFRESH = "false"

[eligibility_criteria]
MIN_ONLINE_DAYS = "5"
MIN_SUBGRAPHS = "1"
MAX_LATENCY_MS = "5000"
MAX_BLOCKS_BEHIND = "50000"
MIN_CURATION_SIGNAL = "500"

# =============================================================================
# SENSITIVE CONFIGURATION
# =============================================================================

[secrets]
GOOGLE_APPLICATION_CREDENTIALS = "$GOOGLE_APPLICATION_CREDENTIALS"
BLOCKCHAIN_PRIVATE_KEY = "$BLOCKCHAIN_PRIVATE_KEY"
ETHERSCAN_API_KEY = "$ETHERSCAN_API_KEY"
ARBITRUM_API_KEY = "$ARBITRUM_API_KEY"
STUDIO_API_KEY = "$STUDIO_API_KEY"
STUDIO_DEPLOY_KEY = "$STUDIO_DEPLOY_KEY"
SLACK_WEBHOOK_URL = "$SLACK_WEBHOOK_URL"
14 changes: 10 additions & 4 deletions k8s/deployment.yaml
@@ -17,11 +17,10 @@ spec:
containers:
- name: rewards-eligibility-oracle
image: ghcr.io/graphprotocol/rewards-eligibility-oracle:v0.4.1 # x-release-please-version
envFrom:
# Load all non-sensitive configuration from ConfigMap
- configMapRef:
name: rewards-eligibility-oracle-config
env:
# Runtime-only configuration (not in config.toml)
- name: RUN_ON_STARTUP
value: "true"
# Secrets from Kubernetes Secret
- name: GOOGLE_APPLICATION_CREDENTIALS
valueFrom:
@@ -59,6 +58,10 @@ spec:
name: rewards-eligibility-oracle-secrets
key: slack-webhook-url
volumeMounts:
- name: config-volume
mountPath: /app/config.toml
subPath: config.toml
readOnly: true
- name: data-volume
mountPath: /app/data
- name: logs-volume
@@ -90,6 +93,9 @@ spec:
initialDelaySeconds: 10
periodSeconds: 30
volumes:
- name: config-volume
configMap:
name: rewards-eligibility-oracle-config
- name: data-volume
persistentVolumeClaim:
claimName: rewards-eligibility-oracle-data