@viliakov

This PR adds backup/restore functionality for Settings (the configuration of a SUSE Observability instance, stored in Stackgraph) to the StackState Backup CLI.

New Commands

settings list

Lists available backups from Minio S3 storage.

Example:

❯ go run main.go settings list --namespace stac-23374-ha
Setting up port-forward to suse-observability-minio:9000 in namespace stac-23374-ha...
✓ Port-forward established successfully
Listing Settings backups in bucket 'sts-configuration-backup'...
NAME                          LAST MODIFIED            SIZE
sts-backup-20251118-0827.sty  2025-11-18 08:28:29 UTC  1MiB
sts-backup-20251117-1237.sty  2025-11-17 12:38:21 UTC  1MiB
sts-backup-20251117-1236.sty  2025-11-17 12:37:22 UTC  1MiB

settings restore

Restores a backup from Minio S3 storage with automatic deployment scaling and Kubernetes job
orchestration.

Restore Workflow

When you run settings restore, the CLI performs the following steps:

  1. Backup Selection
    - If --latest flag is specified: Automatically fetches the most recent backup from Minio
    - If --archive flag is specified: Uses the explicitly provided archive name
  2. User Confirmation (unless --yes is used)
    - Warns that restore will PURGE all existing Stackgraph data
    - Displays backup file and namespace
    - Prompts for confirmation: Do you want to continue? (yes/no):
    - Use --yes or -y flag to skip confirmation (useful for automation)
  3. Scale Down Deployments
    - Identifies and scales down affected deployments to zero replicas
    - Waits for all pods to terminate gracefully
    - Stores original replica counts for restoration
  4. Create Kubernetes Resources
    - ConfigMap: Contains the restore script and configuration
    - Secret: Mounts Minio credentials for S3 access
  5. Job Execution (conditional)
    - If --background flag is NOT set:
      • Waits for the Kubernetes Job to complete
      • Streams job logs to stdout in real-time
      • Reports success/failure status
    - If --background flag IS set:
      • Creates the job and returns immediately
      • User can monitor job status separately with kubectl
  6. Scale Up Deployments
    - Restores deployments to their original replica counts
    - Automatically triggered after job completion (if --background is used, run settings check-and-finalize to scale up once the job has finished)
  7. Cleanup
    - Kubernetes Job is automatically cleaned up via TTL (10 minutes after completion)
    - PVC remains for troubleshooting and is cleaned up in the next restore
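
Steps 3 and 6 rely on remembering each deployment's replica count across the restore; the example output reports "Scaling up deployments from annotations", which suggests the count is kept on the deployment itself. A minimal in-memory sketch of that bookkeeping follows. The Deployment struct stands in for the real Kubernetes object, and the annotation key is a made-up placeholder because the PR does not name the real one:

```go
package main

import (
	"fmt"
	"strconv"
)

// replicasAnnotation is a hypothetical key; the PR does not name the real one.
const replicasAnnotation = "example.com/original-replicas"

// Deployment is a simplified in-memory stand-in for a Kubernetes Deployment.
type Deployment struct {
	Name        string
	Replicas    int32
	Annotations map[string]string
}

// scaleDown records the current replica count in an annotation, then sets
// replicas to zero.
func scaleDown(d *Deployment) {
	if d.Annotations == nil {
		d.Annotations = map[string]string{}
	}
	// Keep an existing record: scaling down an already stopped deployment
	// would otherwise store 0 as the "original" count.
	if _, ok := d.Annotations[replicasAnnotation]; !ok {
		d.Annotations[replicasAnnotation] = strconv.Itoa(int(d.Replicas))
	}
	d.Replicas = 0
}

// scaleUp restores the recorded replica count and removes the annotation.
func scaleUp(d *Deployment) error {
	raw, ok := d.Annotations[replicasAnnotation]
	if !ok {
		return fmt.Errorf("deployment %s has no %s annotation", d.Name, replicasAnnotation)
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return fmt.Errorf("deployment %s: invalid replica count %q: %w", d.Name, raw, err)
	}
	d.Replicas = int32(n)
	delete(d.Annotations, replicasAnnotation)
	return nil
}

func main() {
	d := &Deployment{Name: "suse-observability-server", Replicas: 1}
	scaleDown(d)
	fmt.Printf("%s scaled down (replicas: %d)\n", d.Name, d.Replicas)
	if err := scaleUp(d); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s scaled up (replicas: %d)\n", d.Name, d.Replicas)
}
```

Keeping the count on the object rather than in CLI memory is what would let a separate invocation, such as settings check-and-finalize, scale the deployments back up later.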

Usage

Usage:
  sts-backup settings restore [flags]

Flags:
      --archive string   Specific archive name to restore (e.g., sts-backup-20251117-1404.sty)
      --background       Run restore job in background without waiting for completion
  -h, --help             help for restore
      --latest           Restore from the most recent backup
  -y, --yes              Skip confirmation prompt

Example 1: Restore with Background Execution (interactive confirmation)

❯ go run main.go settings restore --namespace stac-23374-ha --archive sts-backup-20251117-1237.sty --background

Warning: WARNING: Restoring from backup will PURGE all existing Stackgraph (Topology) data!
Warning: This operation cannot be undone.

Backup to restore: sts-backup-20251117-1237.sty
Namespace: stac-23374-ha

Do you want to continue? (yes/no): yes

Scaling down deployments (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled down 9 deployment(s):
  - suse-observability-api (replicas: 1 -> 0)
  - suse-observability-authorization-sync (replicas: 1 -> 0)
  - suse-observability-checks (replicas: 1 -> 0)
  - suse-observability-health-sync (replicas: 1 -> 0)
  - suse-observability-initializer (replicas: 1 -> 0)
  - suse-observability-notification (replicas: 1 -> 0)
  - suse-observability-slicing (replicas: 1 -> 0)
  - suse-observability-state (replicas: 1 -> 0)
  - suse-observability-sync (replicas: 1 -> 0)
✓ Scaled down 0 statefulsets(s):
Waiting for pods to terminate...
Waiting for 2 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
✓ All pods have terminated

Ensuring backup scripts ConfigMap exists...
✓ Backup scripts ConfigMap ready
Ensuring Minio keys secret exists...
✓ Minio keys secret ready

Creating restore job for backup: sts-backup-20251117-1237.sty
✓ Restore job created: settings-restore-20251118t093301

Job is running in background: settings-restore-20251118t093301

Monitoring commands:
  kubectl logs --follow job/settings-restore-20251118t093301 -n stac-23374-ha
  kubectl get job settings-restore-20251118t093301 -n stac-23374-ha

To wait for completion, scaling up the necessary deployments and cleanup, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t093301 --wait -n stac-23374-ha

Example 2: Restore Latest Backup with automatic approval

❯ go run main.go settings restore --namespace stac-23374-nonha --latest --yes
Finding latest backup...
Setting up port-forward to suse-observability-minio:9000 in namespace stac-23374-nonha...
✓ Port-forward established successfully
Using latest backup: sts-backup-20251117-1404.sty

Scaling down deployments (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled down 1 deployment(s):
  - suse-observability-server (replicas: 1 -> 0)
✓ Scaled down 0 statefulsets(s):
Waiting for pods to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
✓ All pods have terminated

Ensuring backup scripts ConfigMap exists...
✓ Backup scripts ConfigMap ready
Ensuring Minio keys secret exists...
✓ Minio keys secret ready

Creating restore job for backup: sts-backup-20251117-1404.sty
✓ Restore job created: settings-restore-20251118t093651

Waiting for restore job to complete (this may take significant amount of time depending on the archive size)...

You can safely interrupt this command with Ctrl+C.
To check status, scale up the required deployments and cleanup later, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t093651 --wait -n stac-23374-nonha

✓ Restore completed successfully

Cleaning up resources...
✓ Job deleted: settings-restore-20251118t093651
Warning: Failed to delete PVC: persistentvolumeclaims "settings-restore-20251118t093651" not found

Scaling up deployments from annotations (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled up 1 deployment(s) successfully:
  - suse-observability-server (replicas: 0 -> 1)
✓ Scaled up 0 statefulset(s) successfully:

settings check-and-finalize

Checks the status of a background restore job and cleans up its resources.

Usage

sts-backup settings check-and-finalize --job <job-name> [--wait] -n <namespace>

Flags:

  • --job, -j - Settings restore job name (required)
  • --wait, -w - Wait for job to complete before cleanup

Note: This command automatically scales up deployments that were scaled down during restore.

Example: Checking if the job is still running

❯ go run main.go settings check-and-finalize --job settings-restore-20251118t094238 -n stac-23374-ha
Checking status of job: settings-restore-20251118t094238

Job is running in background: settings-restore-20251118t094238
  Active pods: 1

Monitoring commands:
  kubectl logs --follow job/settings-restore-20251118t094238 -n stac-23374-ha
  kubectl get job settings-restore-20251118t094238 -n stac-23374-ha

To wait for completion, scaling up the necessary deployments and cleanup, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t094238 --wait -n stac-23374-ha

Example: Waiting for the job to finish

❯ go run main.go settings check-and-finalize --job settings-restore-20251118t094647 --wait -n stac-23374-nonha
Checking status of job: settings-restore-20251118t094647

Waiting for restore job to complete (this may take significant amount of time depending on the archive size)...

You can safely interrupt this command with Ctrl+C.
To check status, scale up the required deployments and cleanup later, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t094647 --wait -n stac-23374-nonha

✓ Job completed successfully: settings-restore-20251118t094647

Scaling up deployments from annotations (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled up 1 deployment(s) successfully:
  - suse-observability-server (replicas: 0 -> 1)
✓ Scaled up 0 statefulset(s) successfully:

Cleaning up resources...
✓ Job deleted: settings-restore-20251118t094647


@VioletCranberry VioletCranberry left a comment

@viliakov viliakov merged commit 8e3e7b5 into main Nov 19, 2025
5 checks passed
@viliakov viliakov deleted the STAC-23603 branch November 19, 2025 09:19