diff --git a/k8s/README.md b/k8s/README.md
index 09c4cb4..9638186 100644
--- a/k8s/README.md
+++ b/k8s/README.md
@@ -2,82 +2,14 @@
 
 This directory contains Kubernetes manifests for deploying the Rewards Eligibility Oracle with persistent state management.
 
-## Prerequisites
-
-- Kubernetes cluster (version 1.19+)
-- `kubectl` configured to access your cluster
-- Docker image published to `ghcr.io/graphprotocol/rewards-eligibility-oracle`
-- **Storage class configured** (see Storage Configuration below)
-
-## Quick Start
-
-### 1. Create Secrets (Required)
-
-```bash
-# Copy the example secrets file
-cp k8s/secrets.yaml.example k8s/secrets.yaml
-
-# Edit with your actual credentials
-# IMPORTANT: Never commit secrets.yaml to version control
-nano k8s/secrets.yaml
-```
-
-**Required secrets:**
-
-- **`google-credentials`**: Service account JSON for BigQuery access
-- **`blockchain-private-key`**: Private key for Arbitrum Sepolia transactions
-- **`arbitrum-api-key`**: API key for Arbiscan contract verification
-- **`slack-webhook-url`**: Webhook URL for operational notifications
-
-### 2. Configure Storage (Required)
-
-```bash
-# Check available storage classes
-kubectl get storageclass
-
-# If you see a default storage class (marked with *), skip to step 3
-# Otherwise, edit persistent-volume-claim.yaml and uncomment the appropriate storageClassName
-```
-
-**Common storage classes by platform:**
-
-- **AWS EKS**: `gp2`, `gp3`, `ebs-csi`
-- **Google GKE**: `standard`, `ssd`
-- **Azure AKS**: `managed-premium`, `managed`
-- **Local/Development**: `hostpath`, `local-path`
-
-### 3. Deploy to Kubernetes
-
-```bash
-# Apply all manifests
-kubectl apply -f k8s/
-
-# Verify deployment
-kubectl get pods -l app=rewards-eligibility-oracle
-kubectl get pvc -l app=rewards-eligibility-oracle
-```
-
-### 4. Monitor Deployment
-
-```bash
-# Check pod status
-kubectl describe pod -l app=rewards-eligibility-oracle
-
-# View logs
-kubectl logs -l app=rewards-eligibility-oracle -f
-
-# Check persistent volumes
-kubectl get pv
-```
-
 ## Architecture
 
 ### Persistent Storage
 
 The service uses **two persistent volumes** to maintain state across pod restarts:
 
-- **`rewards-eligibility-oracle-data` (5GB)**: Circuit breaker state, last run tracking, BigQuery cache, CSV outputs
-- **`rewards-eligibility-oracle-logs` (2GB)**: Application logs
+- **`rewards-eligibility-oracle-data`**: Circuit breaker state, last run tracking, BigQuery cache, CSV outputs
+- **`rewards-eligibility-oracle-logs`**: Application logs
 
 **Mount points:**
 
@@ -86,98 +18,17 @@ The service uses **two persistent volumes** to maintain state across pod restart
 
 ### Configuration Management
 
-**Non-sensitive configuration** → `ConfigMap` (`configmap.yaml`)
-**Sensitive credentials** → `Secret` (`secrets.yaml`)
-
-This separation provides:
-
-- ✅ Easy configuration updates without rebuilding images
-- ✅ Secure credential management with base64 encoding
-- ✅ Clear separation of concerns
-
-### Resource Allocation
-
-**Requests (guaranteed):**
-
-- CPU: 250m (0.25 cores)
-- Memory: 512M
-
-**Limits (maximum):**
-
-- CPU: 1000m (1.0 core)
-- Memory: 1G
-
-## State Persistence Benefits
-
-With persistent volumes, the service maintains:
-
-1. **Circuit breaker state** → Prevents infinite restart loops
-2. **Last run tracking** → Enables proper catch-up logic
-3. **BigQuery cache** → Dramatic performance improvement (30s vs 5min restarts)
-4. **CSV audit artifacts** → Regulatory compliance and debugging
-
-## Health Checks
-
-The deployment uses **file-based health checks** (same as docker-compose):
-
-**Liveness probe:** Checks `/app/healthcheck` file modification time
-**Readiness probe:** Verifies `/app/healthcheck` file exists
-
-## Troubleshooting
-
-### Pod Won't Start
-
-```bash
-# Check events
-kubectl describe pod -l app=rewards-eligibility-oracle
-
-# Common issues:
-# - Missing secrets
-# - PVC provisioning failures
-# - Image pull errors
-```
-
-### Check Persistent Storage
-
-```bash
-# Verify PVCs are bound
-kubectl get pvc
-
-# Check if volumes are mounted correctly
-kubectl exec -it deployment/rewards-eligibility-oracle -- ls -la /app/data
-```
-
-### Debug Configuration
-
-```bash
-# Check environment variables
-kubectl exec -it deployment/rewards-eligibility-oracle -- env | grep -E "(BIGQUERY|BLOCKCHAIN)"
-
-# Verify secrets are mounted
-kubectl exec -it deployment/rewards-eligibility-oracle -- ls -la /etc/secrets
-```
-
-## Security Best Practices
-
-✅ **Secrets never committed** to version control
-✅ **Service account** with minimal BigQuery permissions
-✅ **Private key** stored in Kubernetes secrets (base64 encoded)
-✅ **Resource limits** prevent resource exhaustion
-✅ **Read-only filesystem** where possible
+The application requires a `config.toml` file to run. Configuration is split across two Kubernetes resources:
 
-## Production Considerations
+**ConfigMap (`configmap.yaml`):**
 
-- **Backup strategy** for persistent volumes
-- **Monitoring** and alerting setup
-- **Log aggregation** (ELK stack, etc.)
-- **Network policies** for additional security
-- **Pod disruption budgets** for maintenance
-- **Horizontal Pod Autoscaler** (if needed for scaling)
+- Contains the complete `config.toml` file structure
+- Includes non-sensitive settings (RPC URLs, contract addresses, batch sizes, etc.)
+- Uses `$VARIABLE_NAME` placeholders for sensitive values
+- **Mounted as a file** at `/app/config.toml`
 
-## Next Steps
+**Secret (`secrets.yaml`):**
 
-1. **Test deployment** in staging environment
-2. **Verify state persistence** across pod restarts
-3. **Set up monitoring** and alerting
-4. **Configure backup** for persistent volumes
-5. **Enable quality checking** after successful validation
+- Contains sensitive credentials
+- **Injected as environment variables** into the container
+- Values are substituted into `config.toml` placeholders at runtime
diff --git a/k8s/configmap.yaml b/k8s/configmap.yaml
index ec27162..e6abae8 100644
--- a/k8s/configmap.yaml
+++ b/k8s/configmap.yaml
@@ -5,51 +5,71 @@ metadata:
   labels:
     app: rewards-eligibility-oracle
 data:
-  # BigQuery Configuration
-  BIGQUERY_LOCATION_ID: "US"
-  BIGQUERY_PROJECT_ID: "graph-mainnet"
-  BIGQUERY_DATASET_ID: "internal_metrics"
-  BIGQUERY_TABLE_ID: "metrics_indexer_attempts"
-  BIGQUERY_CURATION_TABLE_ID: "metrics_curator_signals"
-  BIGQUERY_CURATOR_MAINNET_TABLE_ID: "curator_name_signal_dimensions_daily"
-  BIGQUERY_CURATOR_ARBITRUM_TABLE_ID: "curator_name_signal_dimensions_arbitrum_daily"
-  BIGQUERY_SUBGRAPH_LOOKUP_TABLE_ID: "subgraph_version_id_lookup"
-  BIGQUERY_ANALYSIS_PERIOD_DAYS: "28"
-
-  # Blockchain Configuration (Arbitrum Sepolia)
-  BLOCKCHAIN_CONTRACT_ADDRESS: "0x6d5550698F930210c3f50efe744bF51C55D791f6"
-  BLOCKCHAIN_FUNCTION_NAME: "renewIndexerEligibility"
-  BLOCKCHAIN_CHAIN_ID: "421614"
-  BLOCK_EXPLORER_URL: "https://sepolia.arbiscan.io"
-  TX_TIMEOUT_SECONDS: "30"
-
-  # RPC Provider URLs (Arbitrum Sepolia)
-  BLOCKCHAIN_RPC_URL_1: "https://arbitrum-sepolia.drpc.org"
-  BLOCKCHAIN_RPC_URL_2: "https://sepolia-rollup.arbitrum.io/rpc"
-  BLOCKCHAIN_RPC_URL_3: "https://api.zan.top/arb-sepolia"
-  BLOCKCHAIN_RPC_URL_4: "https://arbitrum-sepolia.gateway.tenderly.co"
-
-  # Scheduling Configuration
-  SCHEDULED_RUN_TIME: "10:00"
-
-  # Subgraph URLs
-  SUBGRAPH_URL_PRE_PRODUCTION: "https://api.studio.thegraph.com/query/110664/issuance-eligibility-oracle/v0.1.4"
-  SUBGRAPH_URL_PRODUCTION: "https://gateway.thegraph.com/api/subgraphs/id/"
-
-  # Processing Configuration
-  BATCH_SIZE: "125"
-  MAX_AGE_BEFORE_DELETION: "120"
-
-  # Caching Configuration
-  CACHE_MAX_AGE_MINUTES: "30"
-  FORCE_BIGQUERY_REFRESH: "false"
-
-  # Eligibility Criteria
-  MIN_ONLINE_DAYS: "5"
-  MIN_SUBGRAPHS: "1"
-  MAX_LATENCY_MS: "5000"
-  MAX_BLOCKS_BEHIND: "50000"
-  MIN_CURATION_SIGNAL: "500"
-
-  # Runtime Configuration
-  RUN_ON_STARTUP: "true"
\ No newline at end of file
+  config.toml: |
+    # Service Quality Oracle Configuration
+    # This file separates sensitive secrets from non-sensitive configuration values
+
+    # =============================================================================
+    # NON-SENSITIVE CONFIGURATION
+    # =============================================================================
+
+    [bigquery]
+    BIGQUERY_LOCATION_ID = "US"
+    BIGQUERY_PROJECT_ID = "graph-mainnet"
+    BIGQUERY_DATASET_ID = "internal_metrics"
+    BIGQUERY_TABLE_ID = "metrics_indexer_attempts"
+    BIGQUERY_CURATION_TABLE_ID = "metrics_curator_signals"
+    BIGQUERY_CURATOR_MAINNET_TABLE_ID = "curator_name_signal_dimensions_daily"
+    BIGQUERY_CURATOR_ARBITRUM_TABLE_ID = "curator_name_signal_dimensions_arbitrum_daily"
+    BIGQUERY_SUBGRAPH_LOOKUP_TABLE_ID = "subgraph_version_id_lookup"
+
+    [blockchain]
+    BLOCKCHAIN_CONTRACT_ADDRESS = "0x9BED32d2b562043a426376b99d289fE821f5b04E"
+    BLOCKCHAIN_FUNCTION_NAME = "renewIndexerEligibility"
+    BLOCKCHAIN_CHAIN_ID = 421614
+    BLOCKCHAIN_RPC_URLS = [
+      "https://arbitrum-sepolia.drpc.org",
+      "https://sepolia-rollup.arbitrum.io/rpc",
+      "https://api.zan.top/arb-sepolia",
+      "https://arbitrum-sepolia.gateway.tenderly.co"
+    ]
+    BLOCK_EXPLORER_URL = "https://sepolia.arbiscan.io"
+    TX_TIMEOUT_SECONDS = "30"
+
+    [scheduling]
+    SCHEDULED_RUN_TIME = "10:00"
+
+    [subgraph]
+    SUBGRAPH_URL_PRE_PRODUCTION = "https://api.studio.thegraph.com/query/110664/issuance-eligibility-oracle/v0.1.4"
+    SUBGRAPH_URL_PRODUCTION = "https://gateway.thegraph.com/api/subgraphs/id/"
+
+    [processing]
+    BATCH_SIZE = 125
+    MAX_AGE_BEFORE_DELETION = 120
+    BIGQUERY_ANALYSIS_PERIOD_DAYS = "28"
+
+    [caching]
+    # Maximum age in minutes for cached data to be considered fresh
+    CACHE_MAX_AGE_MINUTES = "30"
+    # Force BigQuery refresh even if fresh cached data exists (true/false)
+    FORCE_BIGQUERY_REFRESH = "false"
+
+    [eligibility_criteria]
+    MIN_ONLINE_DAYS = "5"
+    MIN_SUBGRAPHS = "1"
+    MAX_LATENCY_MS = "5000"
+    MAX_BLOCKS_BEHIND = "50000"
+    MIN_CURATION_SIGNAL = "500"
+
+    # =============================================================================
+    # SENSITIVE CONFIGURATION
+    # =============================================================================
+
+    [secrets]
+    GOOGLE_APPLICATION_CREDENTIALS = "$GOOGLE_APPLICATION_CREDENTIALS"
+    BLOCKCHAIN_PRIVATE_KEY = "$BLOCKCHAIN_PRIVATE_KEY"
+    ETHERSCAN_API_KEY = "$ETHERSCAN_API_KEY"
+    ARBITRUM_API_KEY = "$ARBITRUM_API_KEY"
+    STUDIO_API_KEY = "$STUDIO_API_KEY"
+    STUDIO_DEPLOY_KEY = "$STUDIO_DEPLOY_KEY"
+    SLACK_WEBHOOK_URL = "$SLACK_WEBHOOK_URL"
diff --git a/k8s/deployment.yaml b/k8s/deployment.yaml
index e1b1878..ea2cbda 100644
--- a/k8s/deployment.yaml
+++ b/k8s/deployment.yaml
@@ -17,11 +17,10 @@ spec:
       containers:
         - name: rewards-eligibility-oracle
           image: ghcr.io/graphprotocol/rewards-eligibility-oracle:v0.4.1 # x-release-please-version
-          envFrom:
-            # Load all non-sensitive configuration from ConfigMap
-            - configMapRef:
-                name: rewards-eligibility-oracle-config
           env:
+            # Runtime-only configuration (not in config.toml)
+            - name: RUN_ON_STARTUP
+              value: "true"
             # Secrets from Kubernetes Secret
             - name: GOOGLE_APPLICATION_CREDENTIALS
               valueFrom:
@@ -59,6 +58,10 @@ spec:
                   name: rewards-eligibility-oracle-secrets
                   key: slack-webhook-url
           volumeMounts:
+            - name: config-volume
+              mountPath: /app/config.toml
+              subPath: config.toml
+              readOnly: true
             - name: data-volume
               mountPath: /app/data
             - name: logs-volume
@@ -90,6 +93,9 @@ spec:
             initialDelaySeconds: 10
             periodSeconds: 30
       volumes:
+        - name: config-volume
+          configMap:
+            name: rewards-eligibility-oracle-config
         - name: data-volume
           persistentVolumeClaim:
             claimName: rewards-eligibility-oracle-data
diff --git a/k8s/secrets.yaml.example b/k8s/secrets.yaml.example
index 2a38022..3e76f78 100644
--- a/k8s/secrets.yaml.example
+++ b/k8s/secrets.yaml.example
@@ -1,25 +1,15 @@
-# Kubernetes Secrets for Rewards Eligibility Oracle
-# IMPORTANT: This is an EXAMPLE file - DO NOT commit actual secrets!
-#
-# Usage:
-# 1. Copy this file to secrets.yaml
-# 2. Replace all placeholder values with your actual secrets
-# 3. Apply: kubectl apply -f secrets.yaml
-# 4. Add secrets.yaml to .gitignore to prevent accidental commits
-#
-# Security: Kubernetes automatically base64 encodes stringData values
-
 apiVersion: v1
 kind: Secret
 metadata:
   name: rewards-eligibility-oracle-secrets
-  labels:
-    app: rewards-eligibility-oracle
 type: Opaque
 stringData:
-  # Google Cloud Service Account JSON for BigQuery access
-  # Create a dedicated service account with BigQuery Data Viewer + Job User roles
-  # Download JSON key from Google Cloud Console > IAM > Service Accounts
+  blockchain-private-key: "your-64-character-private-key-here"
+  etherscan-api-key: "your-etherscan-api-key-here"
+  arbitrum-api-key: "your-arbitrum-api-key-here"
+  studio-api-key: "your-studio-api-key-here"
+  studio-deploy-key: "your-studio-deploy-key-here"
+  slack-webhook-url: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
   google-credentials: |
     {
       "type": "service_account",
@@ -33,24 +23,3 @@ stringData:
       "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
       "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/rewards-eligibility-oracle%40graph-mainnet.iam.gserviceaccount.com"
     }
-
-  # Blockchain private key for Arbitrum Sepolia transactions (without 0x prefix)
-  # CRITICAL: This key controls blockchain transactions - keep secure
-  blockchain-private-key: "your-64-character-private-key-here"
-
-  # Etherscan API key for mainnet contract verification (if needed)
-  # Get from: https://etherscan.io/apis
-  etherscan-api-key: "your-etherscan-api-key-here"
-
-  # Arbitrum API key for contract verification on Arbitrum networks
-  # Get from: https://arbiscan.io/apis
-  arbitrum-api-key: "your-arbitrum-api-key-here"
-
-  # The Graph Studio API credentials
-  # Get from: https://thegraph.com/studio/apikeys
-  studio-api-key: "your-studio-api-key-here"
-  studio-deploy-key: "your-studio-deploy-key-here"
-
-  # Slack webhook URL for operational notifications
-  # Create webhook: Slack App > Incoming Webhooks > Add New Webhook
-  slack-webhook-url: "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"
\ No newline at end of file
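
Reviewer note: the README text in this patch says that secret values are substituted into `config.toml` placeholders at runtime, but the code that performs the substitution is not part of this diff. The following is a minimal Python sketch of how such a step could work, assuming placeholders have the form `$UPPER_SNAKE_CASE` and that values come from the process environment. The function name `substitute_placeholders` and the choice to fail on a missing variable are assumptions made for illustration, not the project's actual loader.

```python
import re
from typing import Mapping

# Matches placeholders such as $SLACK_WEBHOOK_URL (assumed format).
_PLACEHOLDER = re.compile(r"\$([A-Z_][A-Z0-9_]*)")


def substitute_placeholders(text: str, env: Mapping[str, str]) -> str:
    """Replace each $NAME in text with env[NAME].

    Hypothetical helper: raises KeyError listing every placeholder that
    has no matching entry in env, so a missing secret fails loudly
    instead of leaving a literal "$NAME" in the configuration.
    """
    missing = []

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            missing.append(name)
            return match.group(0)  # leave the placeholder untouched
        return env[name]  # inserted verbatim, no escaping applied

    result = _PLACEHOLDER.sub(replace, text)
    if missing:
        raise KeyError("unset placeholders: " + ", ".join(sorted(set(missing))))
    return result
```

A caller would typically read the mounted file and pass `os.environ` as `env`. Because values are inserted verbatim, a value containing double quotes or backslashes (a JSON credential, for example) would produce invalid TOML unless it is escaped first or handled through a separate mechanism.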