Merged
6 changes: 3 additions & 3 deletions .github/workflows/pre-commit-hooks.yml
@@ -6,8 +6,8 @@ name: Pre-commit Validation
 on:
   pull_request:
     paths:
-      - '.pre-commit-config.yaml'
-      - '.github/workflows/pre-commit-hooks.yml'
+      - ".pre-commit-config.yaml"
+      - ".github/workflows/pre-commit-hooks.yml"

 jobs:
   validate-pre-commit:
@@ -19,7 +19,7 @@ jobs:
       - name: Set up Python
         uses: actions/setup-python@v4
         with:
-          python-version: '3.11'
+          python-version: "3.11"

       - name: Install pre-commit
         run: |
6 changes: 3 additions & 3 deletions .github/workflows/secret-scanning.yml
@@ -7,8 +7,8 @@ on:
   push:
     branches:
       - main
-      - 'feature/**'
-      - 'fix/**'
+      - "feature/**"
+      - "fix/**"

 permissions:
   contents: write
@@ -23,7 +23,7 @@ jobs:
       - name: Checkout code
         uses: actions/checkout@v4
         with:
-          fetch-depth: 0  # Fetch all history for accurate scanning
+          fetch-depth: 0 # Fetch all history for accurate scanning

       - name: Run Gitleaks
         uses: gitleaks/gitleaks-action@v2
10 changes: 5 additions & 5 deletions .github/workflows/terraform-apply.yml
@@ -5,13 +5,13 @@ on:
     branches:
       - main
     paths:
-      - 'infra/aws/**/*.tf'
-      - 'infra/aws/**/*.tfvars'
-      - '.github/workflows/terraform-*.yml'
+      - "infra/aws/**/*.tf"
+      - "infra/aws/**/*.tfvars"
+      - ".github/workflows/terraform-*.yml"
   workflow_dispatch:
     inputs:
       module:
-        description: 'Specific module to apply (leave empty for all changed)'
+        description: "Specific module to apply (leave empty for all changed)"
         required: false
         type: string

@@ -65,7 +65,7 @@ jobs:
       matrix:
         module: ${{ fromJson(needs.detect-changes.outputs.modules) }}
       fail-fast: false
-      max-parallel: 1  # Apply modules one at a time to avoid conflicts
+      max-parallel: 1 # Apply modules one at a time to avoid conflicts
     defaults:
       run:
         working-directory: ${{ matrix.module }}
2 changes: 1 addition & 1 deletion .github/workflows/terraform-destroy.yml
@@ -4,7 +4,7 @@ on:
   workflow_dispatch:
     inputs:
       module:
-        description: 'Module to destroy (e.g., infra/aws/us-east-2/eks)'
+        description: "Module to destroy (e.g., infra/aws/us-east-2/eks)"
         required: true
         type: string
       confirm:
6 changes: 3 additions & 3 deletions .github/workflows/terraform-plan.yml
@@ -5,9 +5,9 @@ on:
     branches:
       - main
     paths:
-      - 'infra/aws/**/*.tf'
-      - 'infra/aws/**/*.tfvars'
-      - '.github/workflows/terraform-*.yml'
+      - "infra/aws/**/*.tf"
+      - "infra/aws/**/*.tfvars"
+      - ".github/workflows/terraform-*.yml"

 permissions:
   contents: read
8 changes: 4 additions & 4 deletions .pre-commit-config.yaml
@@ -17,13 +17,13 @@ repos:
         exclude: '\.md$'
       - id: end-of-file-fixer
       - id: check-yaml
-        args: ['--unsafe'] # Allow custom YAML tags
+        args: ["--unsafe"] # Allow custom YAML tags
       - id: check-added-large-files
-        args: ['--maxkb=1000']
+        args: ["--maxkb=1000"]
       - id: check-merge-conflict
       - id: detect-private-key
       - id: detect-aws-credentials
-        args: ['--allow-missing-credentials']
+        args: ["--allow-missing-credentials"]

   # Terraform
   - repo: https://github.com/antonbabenko/pre-commit-terraform
@@ -47,7 +47,7 @@ repos:
     rev: v4.5.0
     hooks:
       - id: no-commit-to-branch
-        args: ['--branch', 'main', '--branch', 'master']
+        args: ["--branch", "main", "--branch", "master"]
         stages: [commit]

 # Global settings
130 changes: 130 additions & 0 deletions docs/cost-optimization-strategy.md
@@ -0,0 +1,130 @@
# Cost Optimization Strategy for Coder Demo

## Mixed Capacity Approach

### Node Group Strategy

**System Nodes (ON_DEMAND)**

- **Purpose**: Run critical Kubernetes infrastructure
- **Workloads**: CoreDNS, kube-proxy, metrics-server, cert-manager, AWS LB Controller
- **Size**: t4g.medium (ARM Graviton)
- **Count**: 1-2 nodes minimum
- **Cost**: ~$24/month (1 node) to $48/month (2 nodes)

**Application Nodes (MIXED: 20% On-Demand, 80% Spot via Karpenter)**

- **Purpose**: Run Coder server and workspaces
- **Spot Savings**: 70-90% cost reduction
- **Interruption Risk**: Mitigated by:
  - Multiple instance types (diversified Spot pools)
  - Karpenter auto-rebalancing
  - Pod Disruption Budgets

### Karpenter NodePool Configuration

#### 1. Coder Server NodePool (ON_DEMAND Priority)

```yaml
# Shorthand for the intended priority, not literal Karpenter NodePool fields
capacity_type: ["on-demand", "spot"] # Prefer On-Demand, fall back to Spot
weight:
  on-demand: 100 # Higher priority
  spot: 10
```
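
Karpenter's NodePool API has no per-capacity-type weight: `spec.weight` is a single integer per NodePool, and a NodePool that allows both capacity types favors Spot. One way to express an On-Demand-first preference is therefore two NodePools with different weights. The sketch below assumes the `karpenter.sh/v1` API; the NodePool names, the `example.com/coder-role` label, and the EC2NodeClass named `default` are hypothetical and do not come from this repository.

```yaml
# Sketch only: names, labels, and the EC2NodeClass reference are assumptions.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: coder-server-on-demand
spec:
  weight: 100 # Higher weight: Karpenter considers this pool first
  template:
    metadata:
      labels:
        example.com/coder-role: server
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: coder-server-spot
spec:
  weight: 10 # Lower weight: used when the On-Demand pool cannot satisfy the pod
  template:
    metadata:
      labels:
        example.com/coder-role: server
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
```

Both pools carry the same node label so that a single scheduling rule on the Coder server pods can match nodes from either pool.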

#### 2. Coder Workspace NodePool (SPOT Priority)

```yaml
# Shorthand for the intended priority, not literal Karpenter NodePool fields
capacity_type: ["spot", "on-demand"] # Prefer Spot, fall back to On-Demand
weight:
  spot: 100 # Higher priority
  on-demand: 10
```
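
For the Spot-first case a single NodePool is usually enough, because Karpenter prioritizes Spot when a NodePool allows both capacity types and uses On-Demand when no Spot capacity is available. A minimal sketch, again assuming the `karpenter.sh/v1` API and hypothetical names (`coder-workspaces`, the `example.com/coder-role` label, an EC2NodeClass named `default`):

```yaml
# Sketch only: names, labels, and the EC2NodeClass reference are assumptions.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: coder-workspaces
spec:
  template:
    metadata:
      labels:
        example.com/coder-role: workspace
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"] # Spot is tried first when both are allowed
```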

### Risk Mitigation

**Spot Interruption Handling:**

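1. **2-minute warning** → Karpenter provisions a replacement node automatically (this requires its interruption queue to be configured)
2. **Multiple instance types** → 15+ types spread nodes across Spot pools, reducing the interruption rate (target: <1%)
3. **Pod Disruption Budgets** → Keep a minimum number of replicas available while nodes are drained
4. **Karpenter Consolidation** → Repacks pods onto fewer or cheaper nodes; on a Spot interruption notice, Karpenter separately cordons and drains the node so pods move before termination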
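
The Pod Disruption Budget item can be sketched as a standard `policy/v1` manifest. The name, namespace, and label selector below are assumptions for illustration; they need to match however the Coder server is actually deployed.

```yaml
# Sketch only: name, namespace, and selector labels are assumptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coder-server-pdb
  namespace: coder
spec:
  minAvailable: 1 # Never evict the last available replica during a drain
  selector:
    matchLabels:
      app.kubernetes.io/name: coder
```

A budget like this only helps when the Coder server runs at least two replicas; with a single replica and `minAvailable: 1`, node drains are blocked rather than made safe.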

**Example Instance Type Diversity:**

```
Spot Pool: t4g.medium, t4g.large, t3a.medium, t3a.large,
           m6g.medium, m6g.large, m6a.large
```
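
In a Karpenter NodePool this kind of diversity is expressed as scheduling requirements on well-known node labels. The fragment below is a sketch of the `requirements` list only (it belongs under `spec.template.spec` of a NodePool); the m6a entries are limited to `m6a.large`, the smallest size that family offers. Because the list mixes Graviton (`t4g`, `m6g`) and x86 (`t3a`, `m6a`) instances, it assumes the workload images are published for both architectures.

```yaml
# Fragment of a NodePool: spec.template.spec.requirements
requirements:
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
      - t4g.medium
      - t4g.large
      - t3a.medium
      - t3a.large
      - m6g.medium
      - m6g.large
      - m6a.large
  - key: kubernetes.io/arch
    operator: In
    values: ["arm64", "amd64"] # Requires multi-arch container images
```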

### Cost Breakdown

| Component | Instance Type | Capacity | Monthly Cost |
| ------------------ | ------------- | --------- | ------------- |
| System Nodes (2) | t4g.medium | ON_DEMAND | $48 |
| Coder Server (2) | t4g.large | 80% SPOT | $28 (vs $140) |
| Workspaces (avg 5) | t4g.xlarge | 90% SPOT | $75 (vs $750) |
| **Total** | | **Mixed** | **$151/mo** |

**vs All On-Demand:** $938/month → **84% savings**
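
As a cross-check, the blended cost of a mixed pool can be worked out from the On-Demand price, the capacity split, and the Spot discount. The symbols are introduced here for illustration: $C_{\text{od}}$ is the all-On-Demand monthly cost, $f_{\text{od}}$ and $f_{\text{spot}}$ are the capacity fractions, and $d$ is the Spot discount. All amounts are in USD per month, and the example uses the optimistic end of the 70-90% range ($d = 0.9$).

```math
\begin{aligned}
C_{\text{blended}} &= C_{\text{od}} \left( f_{\text{od}} + f_{\text{spot}} (1 - d) \right) \\
\text{Coder Server:}\quad & 140 \times (0.2 + 0.8 \times 0.1) = 39.2 \\
\text{Workspaces:}\quad & 750 \times (0.1 + 0.9 \times 0.1) = 142.5 \\
\text{Total:}\quad & 48 + 39.2 + 142.5 = 229.7 \\
\text{Savings:}\quad & 1 - 229.7 / 938 \approx 0.755
\end{aligned}
```

With $d = 0.7$ the same formula gives a total of about 387 and savings of roughly 59%. The $28 and $75 in the table match the On-Demand share alone ($140 \times 0.2$ and $750 \times 0.1$), so they are best read as a lower bound that leaves out the price of the Spot capacity itself.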

### Dynamic Scaling

**Low Usage (nights/weekends):**

- Scale to zero workspaces
- Keep 1 system node + 1 Coder server node
- Cost: ~$48/month during idle

**High Usage (business hours):**

- Auto-scale workspaces on Spot
- Karpenter provisions nodes in <60 seconds
- Cost: ~$150-200/month during peak

### Monitoring & Alerts

**CloudWatch Alarms:**

- Spot interruption rate > 5%
- Available On-Demand capacity < 20%
- Karpenter provisioning failures

**Response:**

- Automatic fallback to On-Demand
- Email alerts to ops team
- Karpenter adjusts instance type mix

## Implementation Timeline

1. ✅ Deploy EKS with ON_DEMAND system nodes
2. ⏳ Deploy Karpenter
3. ⏳ Configure mixed-capacity NodePools
4. ⏳ Deploy Coder with node affinity rules
5. ⏳ Test Spot interruption handling
6. ⏳ Enable auto-scaling policies
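
Step 4 can be sketched as a node affinity rule in the Coder server pod template. The fragment assumes the Karpenter NodePools label their nodes with a hypothetical `example.com/coder-role` label; how it is wired into the Coder deployment (Helm values or a raw manifest) is left open here.

```yaml
# Pod template fragment (for example under a Deployment's spec.template.spec).
# The label key and value are assumptions and must match the NodePool labels.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: example.com/coder-role
              operator: In
              values: ["server"]
```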

## Fallback Plan

If Spot becomes unreliable (rare):

1. Update the Karpenter NodePool to 100% On-Demand
2. Apply the change: `kubectl apply -f nodepool-ondemand.yaml`
3. Karpenter gracefully migrates pods to the new nodes
4. Expect the migration to take ~5 minutes with zero downtime
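
The file applied in step 2 is not part of this change. A minimal sketch of what `nodepool-ondemand.yaml` could contain, assuming the `karpenter.sh/v1` API, a workspace NodePool named `coder-workspaces`, and an EC2NodeClass named `default` (all hypothetical):

```yaml
# nodepool-ondemand.yaml (sketch): restricts the workspace pool to On-Demand.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: coder-workspaces # Must match the existing NodePool so apply updates it
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"] # Spot removed from the allowed capacity types
```

Once the requirement changes, Karpenter should treat the existing Spot nodes as drifted and replace them, subject to the NodePool's disruption settings and any Pod Disruption Budgets.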

## Best Practices

✅ **DO:**

- Use multiple Spot instance types (10+)
- Set Pod Disruption Budgets
- Monitor Spot interruption rates
- Test failover regularly

❌ **DON'T:**

- Run databases on Spot (use RDS)
- Use Spot for single-replica critical services
- Rely on a single instance type for Spot
7 changes: 7 additions & 0 deletions infra/aws/us-east-2/README.md
@@ -7,6 +7,7 @@ This directory uses remote S3 backend for state management, but **backend config
## Local Setup

1. **Get backend configuration from teammate** or **retrieve from AWS**:

```bash
# Get S3 bucket name (it contains the account ID)
aws s3 ls | grep terraform-state
@@ -24,6 +25,7 @@ This directory uses remote S3 backend for state management, but **backend config
```

Create `backend.tf`:

```hcl
terraform {
backend "s3" {
@@ -62,6 +64,7 @@ These are configured in: Repository Settings > Secrets and variables > Actions
Instead of creating backend.tf, you can use a config file:

1. Create `backend.conf` (gitignored):

```
bucket = "YOUR-BUCKET-NAME"
dynamodb_table = "YOUR-TABLE-NAME"
@@ -86,12 +89,14 @@ Instead of creating backend.tf, you can use a config file:
This repository has automated secret scanning to prevent accidental exposure of credentials:

### GitHub Actions (Automated)

- **Gitleaks** - Scans every PR and push for secrets
- **TruffleHog** - Additional verification layer
- **Custom Pattern Matching** - Catches common secret patterns
- **Auto-Revert** - Automatically reverts commits to main with secrets

### Pre-commit Hooks (Local)

Catch secrets before they reach GitHub:

```bash
@@ -106,6 +111,7 @@ pre-commit run --all-files
```

### What Gets Detected

- AWS Access Keys (AKIA...)
- API Keys and Tokens
- Private Keys (RSA, SSH, etc.)
@@ -115,6 +121,7 @@ pre-commit run --all-files
- High-entropy strings (likely secrets)

### If Secrets Are Detected

1. **PR is blocked** - Cannot merge until secrets are removed
2. **Automatic notification** - PR comment explains the issue
3. **Required actions**: