68 changes: 54 additions & 14 deletions ARCHITECTURE.md
@@ -20,7 +20,8 @@ stackstate-backup-cli/
│   ├── root.go                 # Root command and global flags
│   ├── version/                # Version information command
│   ├── elasticsearch/          # Elasticsearch backup/restore commands
│   └── stackgraph/             # Stackgraph backup/restore commands
│   ├── stackgraph/             # Stackgraph backup/restore commands
│   └── victoriametrics/        # VictoriaMetrics backup/restore commands
│
├── internal/                   # Internal packages (Layers 0-3)
│   ├── foundation/             # Layer 0: Core utilities
@@ -35,7 +36,8 @@ stackstate-backup-cli/
│   │
│   ├── orchestration/          # Layer 2: Workflows
│   │   ├── portforward/        # Port-forwarding orchestration
│   │   └── scale/              # Deployment scaling workflows
│   │   ├── scale/              # Deployment/StatefulSet scaling workflows
│   │   └── restore/            # Restore job orchestration
│   │
│   ├── app/                    # Layer 3: Dependency Container
│   │   └── app.go              # Application context and dependency injection
@@ -62,7 +64,8 @@ stackstate-backup-cli/

**Key Packages**:
- `cmd/elasticsearch/`: Elasticsearch snapshot/restore commands (configure, list-snapshots, list-indices, restore-snapshot)
- `cmd/stackgraph/`: Stackgraph backup/restore commands (list, restore)
- `cmd/stackgraph/`: Stackgraph backup/restore commands (list, restore, check-and-finalize)
- `cmd/victoriametrics/`: VictoriaMetrics backup/restore commands (list, restore, check-and-finalize)
- `cmd/version/`: Version information

**Dependency Rules**:
@@ -117,7 +120,8 @@ appCtx.Formatter

**Key Packages**:
- `portforward/`: Manages Kubernetes port-forwarding lifecycle
- `scale/`: Deployment scaling workflows with detailed logging
- `scale/`: Deployment and StatefulSet scaling workflows with detailed logging
- `restore/`: Restore job orchestration (confirmation, job lifecycle, finalization, resource management)

**Dependency Rules**:
- ✅ Can import: `internal/foundation/*`, `internal/clients/*`
@@ -167,7 +171,7 @@ appCtx.Formatter

```
1. User invokes CLI command
   └─> cmd/elasticsearch/restore-snapshot.go
   └─> cmd/victoriametrics/restore.go (or stackgraph/restore.go)
   │
2. Parse flags and validate input
   └─> Cobra command receives global flags
@@ -177,16 +181,17 @@ appCtx.Formatter
   ├─> internal/clients/k8s/ (K8s client)
   ├─> internal/foundation/config/ (Load from ConfigMap/Secret)
   ├─> internal/clients/s3/ (S3/Minio client)
   ├─> internal/clients/elasticsearch/ (ES client)
   ├─> internal/foundation/logger/ (Logger)
   └─> internal/foundation/output/ (Formatter)
   │
4. Execute business logic with injected dependencies
   └─> runRestore(appCtx)
       ├─> internal/orchestration/scale/ (Scale down)
       ├─> internal/orchestration/portforward/ (Port-forward)
       ├─> internal/clients/elasticsearch/ (Restore snapshot)
       └─> internal/orchestration/scale/ (Scale up)
       ├─> internal/orchestration/restore/ (User confirmation)
       ├─> internal/orchestration/scale/ (Scale down StatefulSets)
       ├─> internal/orchestration/restore/ (Ensure resources: ConfigMaps, Secrets)
       ├─> internal/clients/k8s/ (Create restore Job)
       ├─> internal/orchestration/restore/ (Wait for completion & cleanup)
       └─> internal/orchestration/scale/ (Scale up StatefulSets)
   │
5. Format and display results
   └─> appCtx.Formatter.PrintTable() or PrintJSON()
@@ -262,15 +267,50 @@ defer close(pf.StopChan) // Automatic cleanup

### 5. Scale Down/Up Pattern

Deployments are scaled down before restore operations and scaled up afterward:
Deployments and StatefulSets are scaled down before restore operations and scaled up afterward:

```go
// Example usage
scaledDeployments, _ := scale.ScaleDown(k8sClient, namespace, selector, log)
defer scale.ScaleUp(k8sClient, namespace, scaledDeployments, log)
scaledResources, _ := scale.ScaleDown(k8sClient, namespace, selector, log)
defer scale.ScaleUpFromAnnotations(k8sClient, namespace, selector, log)
```

### 6. Structured Logging
**Note**: Scaling now supports both Deployments and StatefulSets through a unified interface.
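
One way to read "unified interface" is a small Go interface that both resource kinds satisfy, with the pre-restore replica count remembered in an annotation so a later scale-up can recover it (the name `ScaleUpFromAnnotations` hints at something like this, but the diff does not show the implementation). The sketch below is in-memory only. The `Scalable` interface, the `fakeWorkload` type, and the annotation key are illustrative assumptions, not the CLI's actual `scale` package or client-go calls.

```go
package main

import (
	"fmt"
	"strconv"
)

// replicasAnnotation is a hypothetical annotation key used to remember the
// replica count a workload had before it was scaled down.
const replicasAnnotation = "example.com/pre-restore-replicas"

// Scalable is an assumed abstraction over Deployments and StatefulSets.
type Scalable interface {
	Kind() string
	Name() string
	Replicas() int32
	SetReplicas(n int32)
	Annotations() map[string]string // must return a non-nil, mutable map
}

// fakeWorkload is an in-memory stand-in for a Kubernetes workload.
type fakeWorkload struct {
	kind, name  string
	replicas    int32
	annotations map[string]string
}

func (w *fakeWorkload) Kind() string                   { return w.kind }
func (w *fakeWorkload) Name() string                   { return w.name }
func (w *fakeWorkload) Replicas() int32                { return w.replicas }
func (w *fakeWorkload) SetReplicas(n int32)            { w.replicas = n }
func (w *fakeWorkload) Annotations() map[string]string { return w.annotations }

// scaleDown records each workload's replica count in an annotation and then
// sets its replicas to zero. Workloads already at zero are left untouched.
func scaleDown(workloads []Scalable) {
	for _, w := range workloads {
		if w.Replicas() == 0 {
			continue
		}
		w.Annotations()[replicasAnnotation] = strconv.Itoa(int(w.Replicas()))
		w.SetReplicas(0)
	}
}

// scaleUpFromAnnotations restores the replica counts recorded by scaleDown
// and removes the annotation afterward.
func scaleUpFromAnnotations(workloads []Scalable) error {
	for _, w := range workloads {
		raw, ok := w.Annotations()[replicasAnnotation]
		if !ok {
			continue // nothing recorded for this workload
		}
		n, err := strconv.ParseInt(raw, 10, 32)
		if err != nil {
			return fmt.Errorf("%s/%s: invalid replica annotation %q: %w", w.Kind(), w.Name(), raw, err)
		}
		w.SetReplicas(int32(n))
		delete(w.Annotations(), replicasAnnotation)
	}
	return nil
}

func main() {
	workloads := []Scalable{
		&fakeWorkload{kind: "Deployment", name: "api", replicas: 2, annotations: map[string]string{}},
		&fakeWorkload{kind: "StatefulSet", name: "victoria-metrics-0", replicas: 1, annotations: map[string]string{}},
	}
	scaleDown(workloads)
	if err := scaleUpFromAnnotations(workloads); err != nil {
		fmt.Println("scale up failed:", err)
		return
	}
	for _, w := range workloads {
		fmt.Printf("%s/%s replicas=%d\n", w.Kind(), w.Name(), w.Replicas())
	}
}
```

Keeping the count on the resource itself, rather than in a value returned from the scale-down call, would let a separate invocation (for example a later `check-and-finalize` run) scale things back up without shared in-process state.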

### 6. Restore Orchestration Pattern

Common restore operations are centralized in the `restore` orchestration layer:

```go
// User confirmation
if !restore.PromptForConfirmation() {
	return fmt.Errorf("operation cancelled")
}

// Wait for job completion and cleanup
restore.PrintWaitingMessage(log, "service-name", jobName, namespace)
err := restore.WaitAndCleanup(k8sClient, namespace, jobName, log, cleanupPVC)

// Check and finalize background jobs (err is reused from above)
err = restore.CheckAndFinalize(restore.CheckAndFinalizeParams{
	K8sClient:     k8sClient,
	Namespace:     namespace,
	JobName:       jobName,
	ServiceName:   "service-name",
	ScaleSelector: config.ScaleDownLabelSelector,
	CleanupPVC:    true,
	WaitForJob:    false,
	Log:           log,
})
```

**Benefits**:

- Eliminates duplicate code between Stackgraph and VictoriaMetrics restore commands
- Consistent user experience across services
- Centralized job lifecycle management and cleanup
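
To make the "check and finalize" flow concrete, here is a minimal sketch of the decision logic such a helper might follow: look up the job status, optionally wait, then clean up and scale back up once the job has reached a terminal state. Everything in it is hypothetical. The `jobOps` callbacks stand in for Kubernetes calls, `finalizeParams` only mirrors a few fields of the params struct shown above, and cleaning up and scaling up after a failed job is an assumption rather than confirmed behaviour.

```go
package main

import (
	"errors"
	"fmt"
)

// jobStatus is a simplified view of a Kubernetes Job's state.
type jobStatus int

const (
	jobRunning jobStatus = iota
	jobSucceeded
	jobFailed
)

// finalizeParams mirrors a few fields of the params struct shown above.
type finalizeParams struct {
	JobName    string
	CleanupPVC bool
	WaitForJob bool
}

// jobOps is a hypothetical set of callbacks standing in for cluster calls.
type jobOps struct {
	status  func(jobName string) jobStatus
	wait    func(jobName string) jobStatus
	cleanup func(jobName string, cleanupPVC bool)
	scaleUp func()
}

var errStillRunning = errors.New("restore job is still running")

// checkAndFinalize cleans up and scales up only after the job has finished.
// A running job is left alone unless WaitForJob is set.
func checkAndFinalize(p finalizeParams, ops jobOps) error {
	st := ops.status(p.JobName)
	if st == jobRunning && p.WaitForJob {
		st = ops.wait(p.JobName)
	}
	if st == jobRunning {
		return errStillRunning
	}
	// Assumption: resources are cleaned up and workloads scaled back up
	// for both successful and failed jobs.
	ops.cleanup(p.JobName, p.CleanupPVC)
	ops.scaleUp()
	if st == jobFailed {
		return fmt.Errorf("restore job %q failed", p.JobName)
	}
	return nil
}

func main() {
	ops := jobOps{
		status:  func(string) jobStatus { return jobRunning },
		wait:    func(string) jobStatus { return jobSucceeded },
		cleanup: func(name string, pvc bool) { fmt.Printf("cleanup job=%s pvc=%v\n", name, pvc) },
		scaleUp: func() { fmt.Println("scale up") },
	}
	fmt.Println(checkAndFinalize(finalizeParams{JobName: "restore-job"}, ops))
	fmt.Println(checkAndFinalize(finalizeParams{JobName: "restore-job", WaitForJob: true, CleanupPVC: true}, ops))
}
```

Passing the cluster operations in as functions keeps the decision logic testable without a cluster, which fits the layering described earlier where orchestration code depends on injected clients.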

### 7. Structured Logging

All operations use structured logging with consistent levels:

86 changes: 81 additions & 5 deletions README.md
@@ -9,8 +9,9 @@ This CLI tool replaces the legacy Bash-based backup/restore scripts with a singl
**Current Support:**
- Elasticsearch snapshots and restores
- Stackgraph backups and restores
- VictoriaMetrics backups and restores

**Planned:** VictoriaMetrics, ClickHouse, Configuration backups
**Planned:** ClickHouse, Configuration backups

## Installation

@@ -112,11 +113,76 @@ sts-backup stackgraph restore --namespace <namespace> [--archive <name> | --late
**Flags:**
- `--archive` - Specific archive name to restore (e.g., sts-backup-20210216-0300.graph)
- `--latest` - Restore from the most recent backup
- `--force` - Force delete existing data during restore
- `--background` - Run restore job in background without waiting for completion
- `--yes, -y` - Skip confirmation prompt

**Note**: Either `--archive` or `--latest` must be specified (mutually exclusive).
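
The "exactly one of `--archive` or `--latest`" rule can be expressed as a small validation step that runs before any cluster work starts. The sketch below is a plain-Go illustration. The function name `validateArchiveSelection` and the error messages are hypothetical and are not taken from the CLI's source.

```go
package main

import (
	"errors"
	"fmt"
)

// validateArchiveSelection enforces that exactly one of --archive and
// --latest was provided. An empty archive string means the flag was not set.
func validateArchiveSelection(archive string, latest bool) error {
	switch {
	case archive != "" && latest:
		return errors.New("--archive and --latest are mutually exclusive")
	case archive == "" && !latest:
		return errors.New("either --archive or --latest must be specified")
	default:
		return nil
	}
}

func main() {
	fmt.Println(validateArchiveSelection("sts-backup-20210216-0300.graph", false))
	fmt.Println(validateArchiveSelection("", false))
	fmt.Println(validateArchiveSelection("sts-backup-20210216-0300.graph", true))
}
```

Cobra also provides flag-group helpers such as `MarkFlagsMutuallyExclusive` that can enforce part of this declaratively. Whether the CLI uses them or a hand-written check is not visible in this diff.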

#### check-and-finalize

Check the status of a background Stackgraph restore job and clean up resources.

```bash
sts-backup stackgraph check-and-finalize --namespace <namespace> --job <job-name> [--wait]
```

**Flags:**

- `--job, -j` - Stackgraph restore job name (required)
- `--wait, -w` - Wait for job to complete before cleanup

**Use Case**: This command is useful when a restore job was started with the `--background` flag or was interrupted (Ctrl+C).

### victoriametrics

Manage VictoriaMetrics backups and restores.

#### list

List available VictoriaMetrics backups from S3/Minio.

```bash
sts-backup victoriametrics list --namespace <namespace>
```

**Note**: In HA mode, backups from both instances (victoria-metrics-0 and victoria-metrics-1) are listed. The restore
command accepts either backup to restore both instances.
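
Because backups from both instances appear in one listing, picking the most recent one means comparing names across instances. The sketch below assumes the name layout suggested by the example used for `--archive` (`<bucket-prefix>/<instance>-<YYYYMMDDhhmmss>`). That layout, and the helper names `parseBackupName` and `latestBackup`, are assumptions made for illustration and not the CLI's confirmed parsing logic.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// timestampLayout is the assumed 14-digit timestamp suffix (YYYYMMDDhhmmss).
const timestampLayout = "20060102150405"

// backup is a parsed view of one listed backup name.
type backup struct {
	Name      string
	Instance  string
	Timestamp time.Time
}

// parseBackupName splits a name such as
// "sts-victoria-metrics-backup/victoria-metrics-0-20251030143500"
// into its instance and timestamp parts.
func parseBackupName(name string) (backup, error) {
	base := name
	if i := strings.LastIndex(name, "/"); i >= 0 {
		base = name[i+1:] // drop the bucket/prefix part
	}
	i := strings.LastIndex(base, "-")
	if i <= 0 || i == len(base)-1 {
		return backup{}, fmt.Errorf("backup name %q has no timestamp suffix", name)
	}
	ts, err := time.Parse(timestampLayout, base[i+1:])
	if err != nil {
		return backup{}, fmt.Errorf("backup name %q: %w", name, err)
	}
	return backup{Name: name, Instance: base[:i], Timestamp: ts}, nil
}

// latestBackup returns the most recent backup regardless of instance.
func latestBackup(names []string) (backup, error) {
	var best backup
	found := false
	for _, n := range names {
		b, err := parseBackupName(n)
		if err != nil {
			return backup{}, err
		}
		if !found || b.Timestamp.After(best.Timestamp) {
			best, found = b, true
		}
	}
	if !found {
		return backup{}, errors.New("no backups found")
	}
	return best, nil
}

func main() {
	names := []string{
		"sts-victoria-metrics-backup/victoria-metrics-0-20251030143500",
		"sts-victoria-metrics-backup/victoria-metrics-1-20251030150000",
	}
	b, err := latestBackup(names)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.Instance, b.Timestamp.Format(time.RFC3339))
}
```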

#### restore

Restore VictoriaMetrics from a backup archive. Automatically scales down affected StatefulSets before restore and scales
them back up afterward.

```bash
sts-backup victoriametrics restore --namespace <namespace> [--archive <name> | --latest] [flags]
```

**Flags:**

- `--archive` - Specific backup name to restore (e.g., sts-victoria-metrics-backup/victoria-metrics-0-20251030143500)
- `--latest` - Restore from the most recent backup
- `--background` - Run restore job in background without waiting for completion
- `--yes, -y` - Skip confirmation prompt

**Note**: Either `--archive` or `--latest` must be specified (mutually exclusive).

#### check-and-finalize

Check the status of a background VictoriaMetrics restore job and clean up resources.

```bash
sts-backup victoriametrics check-and-finalize --namespace <namespace> --job <job-name> [--wait]
```

**Flags:**

- `--job, -j` - VictoriaMetrics restore job name (required)
- `--wait, -w` - Wait for job to complete before cleanup

**Use Case**: This command is useful when a restore job was started with the `--background` flag or was interrupted (Ctrl+C).

## Configuration

The CLI uses configuration from Kubernetes ConfigMaps and Secrets with the following precedence:
@@ -194,9 +260,14 @@ See [internal/foundation/config/testdata/validConfigMapConfig.yaml](internal/fou
│   │   ├── list-indices.go          # List indices
│   │   ├── list-snapshots.go        # List snapshots
│   │   └── restore-snapshot.go      # Restore snapshot
│   └── stackgraph/                  # Stackgraph subcommands
│   ├── stackgraph/                  # Stackgraph subcommands
│   │   ├── list.go                  # List backups
│   │   ├── restore.go               # Restore backup
│   │   └── check-and-finalize.go    # Check and finalize restore job
│   └── victoriametrics/             # VictoriaMetrics subcommands
│       ├── list.go                  # List backups
│       └── restore.go               # Restore backup
│       ├── restore.go               # Restore backup
│       └── check-and-finalize.go    # Check and finalize restore job
├── internal/                        # Internal packages (Layers 0-3)
│   ├── foundation/                  # Layer 0: Core utilities
│   │   ├── config/                  # Configuration management
@@ -208,7 +279,12 @@ See [internal/foundation/config/testdata/validConfigMapConfig.yaml](internal/fou
│   │   └── s3/                      # S3/Minio client
│   ├── orchestration/               # Layer 2: Workflows
│   │   ├── portforward/             # Port-forwarding lifecycle
│   │   └── scale/                   # Deployment scaling
│   │   ├── scale/                   # Deployment/StatefulSet scaling
│   │   └── restore/                 # Restore job orchestration
│   │       ├── confirmation.go      # User confirmation prompts
│   │       ├── finalize.go          # Job status check and cleanup
│   │       ├── job.go               # Job lifecycle management
│   │       └── resources.go         # Restore resource management
│   ├── app/                         # Layer 3: Dependency container
│   │   └── app.go                   # Application context and DI
│   └── scripts/                     # Embedded bash scripts
20 changes: 20 additions & 0 deletions cmd/elasticsearch/list_snapshots_test.go
@@ -49,6 +49,26 @@ stackgraph:
memory: "2Gi"
pvc:
size: "10Gi"
victoriaMetrics:
S3Locations:
- bucket: vm-backup
prefix: victoria-metrics-0
- bucket: vm-backup
prefix: victoria-metrics-1
restore:
haMode: "mirror"
persistentVolumeClaimPrefix: "database-victoria-metrics-"
scaleDownLabelSelector: "app=victoria-metrics"
job:
image: vm-backup:latest
waitImage: wait:latest
resources:
limits:
cpu: "1"
memory: "2Gi"
requests:
cpu: "500m"
memory: "1Gi"
`

// mockESClient is a simple mock for testing commands
5 changes: 5 additions & 0 deletions cmd/root.go
@@ -7,6 +7,7 @@ import (
	"github.com/stackvista/stackstate-backup-cli/cmd/elasticsearch"
	"github.com/stackvista/stackstate-backup-cli/cmd/stackgraph"
	"github.com/stackvista/stackstate-backup-cli/cmd/version"
	"github.com/stackvista/stackstate-backup-cli/cmd/victoriametrics"
	"github.com/stackvista/stackstate-backup-cli/internal/foundation/config"
)

@@ -39,6 +40,10 @@ func init() {
	addBackupConfigFlags(stackgraphCmd)
	rootCmd.AddCommand(stackgraphCmd)

	victoriaMetricsCmd := victoriametrics.Cmd(flags)
	addBackupConfigFlags(victoriaMetricsCmd)
	rootCmd.AddCommand(victoriaMetricsCmd)

	// Add commands that don't need backup config flags
	rootCmd.AddCommand(version.Cmd())
}