STAC-23599: Restoring VictoriaMetrics #5
Merged
This PR adds VictoriaMetrics backup/restore functionality and introduces a new
internal/orchestration/restore/ package that eliminates code duplication between the Stackgraph and VictoriaMetrics restore operations, and potentially the upcoming Clickhouse and Configuration restore operations.
Architecture Improvements
Code Deduplication
This PR extracts common restore patterns into a reusable orchestration
layer:
internal/orchestration/restore/: New package containing shared restore operations
- confirmation.go: User confirmation prompts
- job.go: Kubernetes Job lifecycle management (wait, monitor, logs)
- finalize.go: Background job status checking and cleanup orchestration
- resources.go: ConfigMap and Secret resource management
StatefulSet Scaling Support
Extended the internal/orchestration/scale/ package to support both Deployments and StatefulSets
through a unified interface, enabling VictoriaMetrics StatefulSet scaling during restore
operations.
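As a rough illustration of what such a unified interface could look like, here is a minimal in-memory sketch. It is not the code from this PR: the Scalable interface, the workload type, the ScaleDown/ScaleUp helpers and the annotation key are all hypothetical names, and a real implementation would talk to the Kubernetes API instead of mutating structs.

```go
package main

import (
	"fmt"
	"strconv"
)

// replicasAnnotation is a hypothetical annotation key; the key actually used
// by the tool is not stated in this description.
const replicasAnnotation = "example.com/original-replicas"

// Scalable is a hypothetical abstraction over any workload with a replica
// count, so Deployments and StatefulSets can share one code path.
type Scalable interface {
	Kind() string
	Name() string
	Replicas() int32
	SetReplicas(n int32)
	Annotations() map[string]string
}

// workload is an in-memory stand-in for a Deployment or StatefulSet.
type workload struct {
	kind, name  string
	replicas    int32
	annotations map[string]string
}

func (w *workload) Kind() string                   { return w.kind }
func (w *workload) Name() string                   { return w.name }
func (w *workload) Replicas() int32                { return w.replicas }
func (w *workload) SetReplicas(n int32)            { w.replicas = n }
func (w *workload) Annotations() map[string]string { return w.annotations }

// ScaleDown records the current replica count in an annotation and then
// scales the workload to zero.
func ScaleDown(s Scalable) {
	if s.Replicas() > 0 { // keep an already stored count on repeated calls
		s.Annotations()[replicasAnnotation] = strconv.FormatInt(int64(s.Replicas()), 10)
	}
	s.SetReplicas(0)
}

// ScaleUp restores the replica count recorded by ScaleDown and removes the
// annotation afterwards.
func ScaleUp(s Scalable) error {
	raw, ok := s.Annotations()[replicasAnnotation]
	if !ok {
		return fmt.Errorf("%s/%s: no stored replica count", s.Kind(), s.Name())
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return fmt.Errorf("%s/%s: bad replica annotation %q: %w", s.Kind(), s.Name(), raw, err)
	}
	s.SetReplicas(int32(n))
	delete(s.Annotations(), replicasAnnotation)
	return nil
}

func main() {
	targets := []Scalable{
		&workload{kind: "Deployment", name: "api", replicas: 2, annotations: map[string]string{}},
		&workload{kind: "StatefulSet", name: "victoria-metrics-0", replicas: 1, annotations: map[string]string{}},
	}
	for _, t := range targets {
		ScaleDown(t)
	}
	for _, t := range targets {
		if err := ScaleUp(t); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s %s restored to %d replicas\n", t.Kind(), t.Name(), t.Replicas())
	}
}
```

The point of the sketch is that callers only see Scalable, so the restore flow does not need to know which workload kind it is scaling.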
📖 Updated architecture documentation in ARCHITECTURE.md and
README.md
New Commands
victoriametrics list
Lists available VictoriaMetrics backups from Minio S3 storage.
Examples:
List backups for a single-node VM setup
List backups for a HA VM setup (mirroring by vmagent)
victoriametrics restore
Restores VictoriaMetrics from a backup archive with automatic StatefulSet scaling and Kubernetes
job orchestration.
Restore Workflow
- --latest flag: Automatically fetches the most recent backup
- --archive flag: Uses the explicitly provided backup name
- Warns that restore will PURGE all existing VictoriaMetrics data
- Displays backup file and namespace
- Prompts: Do you want to continue? (yes/no):
- Scales down affected StatefulSets to zero replicas
- Waits for all pods to terminate gracefully
- Stores original replica counts in annotations
- ConfigMap: Contains restore script
- Secret: Mounts Minio credentials
- Job: Executes restore in containers (one per HA instance)
- Without --background: Waits for completion and streams logs
- With --background: Returns immediately for monitoring separately
- Restores StatefulSets to original replica counts
- Triggered after job completion (or immediately with --background)
- Job is automatically cleaned up via TTL (24 hours after completion)
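One way the --latest selection from the workflow above could work is sketched below. It assumes that backup names end in a 14-digit yyyymmddHHMMSS timestamp, as in the example name victoria-metrics-0-20251030143500; whether the tool actually sorts by name or by object metadata is not stated here, and LatestBackup and backupTime are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// timestampLayout matches the assumed 14-digit suffix (yyyymmddHHMMSS) of
// names such as victoria-metrics-0-20251030143500.
const timestampLayout = "20060102150405"

// backupTime parses the timestamp after the last '-' in a backup name.
func backupTime(name string) (time.Time, error) {
	i := strings.LastIndex(name, "-")
	if i < 0 {
		return time.Time{}, fmt.Errorf("backup name %q has no timestamp suffix", name)
	}
	return time.Parse(timestampLayout, name[i+1:])
}

// LatestBackup returns the name with the most recent timestamp suffix.
// Names that do not carry a parsable timestamp are skipped.
func LatestBackup(names []string) (string, error) {
	var latest string
	var latestTime time.Time
	for _, name := range names {
		t, err := backupTime(name)
		if err != nil {
			continue // not a backup archive name, ignore it
		}
		if latest == "" || t.After(latestTime) {
			latest, latestTime = name, t
		}
	}
	if latest == "" {
		return "", fmt.Errorf("no backups found")
	}
	return latest, nil
}

func main() {
	names := []string{
		"sts-victoria-metrics-backup/victoria-metrics-0-20251030143500",
		"sts-victoria-metrics-backup/victoria-metrics-0-20251104090000",
	}
	latest, err := LatestBackup(names)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("latest backup:", latest)
}
```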
Usage:
sts-backup victoriametrics restore [flags]
Flags:
--archive string   Specific backup name to restore (e.g., sts-victoria-metrics-backup/victoria-metrics-0-20251030143500)
--background       Run restore job in background without waiting for completion
--latest           Restore from the most recent backup
-y, --yes          Skip confirmation prompt
Example 1: Restore Latest Backup
sts-backup victoriametrics restore --namespace <namespace> --latest --yes
Example 2: Restore Specific Backup in Background
sts-backup victoriametrics restore --namespace <namespace> \
  --archive sts-victoria-metrics-backup/victoria-metrics-0-20251030143500 \
  --background
Examples:
Restoring the latest available VM backup with auto-confirmation
Running restore operation in background
victoriametrics check-and-finalize
Checks the status of a background VictoriaMetrics restore job and cleans up resources.
Usage:
sts-backup victoriametrics check-and-finalize --job <job-name> [--wait] -n <namespace>
Flags:
-j, --job string VictoriaMetrics restore job name (required)
-w, --wait Wait for job to complete before cleanup
Note: This command automatically scales up StatefulSets that were scaled down during restore.
Example: Check Job Status
sts-backup victoriametrics check-and-finalize \
  --job victoriametrics-restore-20251104t143000 \
  -n <namespace>
Example: Wait for Completion and Cleanup
sts-backup victoriametrics check-and-finalize \
  --job victoriametrics-restore-20251104t143000 \
  --wait \
  -n <namespace>
Examples:
Waiting for the restore job to complete and cleaning up, scaling after that
Stackgraph Updates
The stackgraph commands now also benefit from the shared orchestration layer. The
check-and-finalize command was refactored to use the same orchestration functions as
VictoriaMetrics, ensuring consistent behavior across services.