STAC-23603: Restoring Settings #6

viliakov · 2025-11-18T08:48:22Z

This PR adds backup/restore functionality for Settings (the configuration of a SUSE Observability instance, stored in Stackgraph) to the StackState Backup CLI.

New Commands

`settings list`

Lists available backups from Minio S3 storage.

Example:

❯ go run main.go settings list --namespace stac-23374-ha
Setting up port-forward to suse-observability-minio:9000 in namespace stac-23374-ha...
✓ Port-forward established successfully
Listing Settings backups in bucket 'sts-configuration-backup'...
NAME                          LAST MODIFIED            SIZE
sts-backup-20251118-0827.sty  2025-11-18 08:28:29 UTC  1MiB
sts-backup-20251117-1237.sty  2025-11-17 12:38:21 UTC  1MiB
sts-backup-20251117-1236.sty  2025-11-17 12:37:22 UTC  1MiB
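
The listing above appears ordered newest first, which is also the ordering `--latest` would need in order to pick a backup. A minimal sketch of that selection, assuming a hypothetical `Backup` type and hypothetical helper names (`latestBackup`, `sampleBackups`) that are not taken from this PR:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Backup is a hypothetical record for one object in the backup bucket.
type Backup struct {
	Name         string
	LastModified time.Time
}

// latestBackup returns the most recently modified backup.
// The second return value is false when the list is empty.
func latestBackup(backups []Backup) (Backup, bool) {
	if len(backups) == 0 {
		return Backup{}, false
	}
	// Sort a copy so the caller's slice keeps its order.
	sorted := append([]Backup(nil), backups...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].LastModified.After(sorted[j].LastModified)
	})
	return sorted[0], true
}

// sampleBackups reuses the names and timestamps from the listing above,
// deliberately out of order.
func sampleBackups() []Backup {
	return []Backup{
		{Name: "sts-backup-20251117-1236.sty", LastModified: time.Date(2025, 11, 17, 12, 37, 22, 0, time.UTC)},
		{Name: "sts-backup-20251118-0827.sty", LastModified: time.Date(2025, 11, 18, 8, 28, 29, 0, time.UTC)},
		{Name: "sts-backup-20251117-1237.sty", LastModified: time.Date(2025, 11, 17, 12, 38, 21, 0, time.UTC)},
	}
}

func main() {
	if b, ok := latestBackup(sampleBackups()); ok {
		fmt.Println("latest:", b.Name)
	}
}
```

Sorting on the last-modified time rather than on the file name avoids depending on the naming scheme, although with the names shown here both orderings would agree.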

`settings restore`

Restores a backup from Minio S3 storage with automatic deployment scaling and Kubernetes job
orchestration.

Restore Workflow

When you run `settings restore`, the CLI performs the following steps:

Backup Selection
- If --latest flag is specified: Automatically fetches the most recent backup from Minio
- If --archive flag is specified: Uses the explicitly provided archive name
User Confirmation (unless --yes is used)
- Warns that restore will PURGE all existing Stackgraph data
- Displays backup file and namespace
- Prompts for confirmation: Do you want to continue? (yes/no):
- Use --yes or -y flag to skip confirmation (useful for automation)
Scale Down Deployments
- Identifies and scales down affected deployments to zero replicas
- Waits for all pods to terminate gracefully
- Stores original replica counts for restoration
Create Kubernetes Resources
- ConfigMap: Contains the restore script and configuration
- Secret: Mounts Minio credentials for S3 access
Job Execution (conditional)
- If --background flag is NOT set:
  - Waits for the Kubernetes Job to complete
  - Streams job logs to stdout in real-time
  - Reports success/failure status
- If --background flag IS set:
  - Creates the job and returns immediately
  - User can monitor job status separately with kubectl
Scale Up Deployments
- Restores deployments to their original replica counts
- Automatically triggered after job completion; when --background is used, scale-up happens once `settings check-and-finalize` is run for the job
Cleanup
- Kubernetes Job is automatically cleaned up via TTL (10 minutes after completion)
- PVC remains for troubleshooting and is cleaned up in the next restore
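
The scale-down and scale-up steps depend on remembering each deployment's original replica count, and the CLI output reports scaling up "from annotations". A minimal in-memory sketch of that idea follows. The `Deployment` struct is a simplified stand-in for the Kubernetes object, and the annotation key is a hypothetical placeholder, not the key used by this PR:

```go
package main

import (
	"fmt"
	"strconv"
)

// replicasAnnotation is a hypothetical annotation key chosen for this sketch.
const replicasAnnotation = "example.com/original-replicas"

// Deployment is a simplified in-memory stand-in for a Kubernetes Deployment.
type Deployment struct {
	Name        string
	Replicas    int32
	Annotations map[string]string
}

// scaleDown records the current replica count in an annotation and sets
// the replica count to zero.
func scaleDown(d *Deployment) {
	if d.Annotations == nil {
		d.Annotations = map[string]string{}
	}
	// Keep an existing annotation so that a repeated scale-down does not
	// overwrite the original count with 0.
	if _, ok := d.Annotations[replicasAnnotation]; !ok {
		d.Annotations[replicasAnnotation] = strconv.FormatInt(int64(d.Replicas), 10)
	}
	d.Replicas = 0
}

// scaleUp restores the replica count stored by scaleDown and removes the
// annotation afterwards.
func scaleUp(d *Deployment) error {
	raw, ok := d.Annotations[replicasAnnotation]
	if !ok {
		return fmt.Errorf("deployment %s: no stored replica count", d.Name)
	}
	n, err := strconv.ParseInt(raw, 10, 32)
	if err != nil {
		return fmt.Errorf("deployment %s: invalid replica count %q: %w", d.Name, raw, err)
	}
	d.Replicas = int32(n)
	delete(d.Annotations, replicasAnnotation)
	return nil
}

func main() {
	d := &Deployment{Name: "suse-observability-server", Replicas: 1}
	scaleDown(d)
	fmt.Printf("%s after scale-down: %d\n", d.Name, d.Replicas)
	if err := scaleUp(d); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s after scale-up: %d\n", d.Name, d.Replicas)
}
```

Storing the count on the resource itself, rather than in CLI memory, is what would allow a later and separate invocation to finish the scale-up.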

Usage

Usage:
  sts-backup settings restore [flags]

Flags:
      --archive string   Specific archive name to restore (e.g., sts-backup-20251117-1404.sty)
      --background       Run restore job in background without waiting for completion
  -h, --help             help for restore
      --latest           Restore from the most recent backup
  -y, --yes              Skip confirmation prompt
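
As an illustration of how these flags relate to each other, here is a minimal sketch that parses them with Go's standard `flag` package. This is not the CLI's actual parsing code: the `restoreOptions` type and `parseRestoreFlags` function are hypothetical, and the rule that exactly one of `--archive` or `--latest` must be given is an assumption that this PR description does not confirm.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
)

// restoreOptions is a hypothetical container for the restore flags.
type restoreOptions struct {
	Archive    string
	Background bool
	Latest     bool
	Yes        bool
}

// parseRestoreFlags parses the documented flags and applies the assumed
// rule that exactly one of --archive or --latest is required.
func parseRestoreFlags(args []string) (restoreOptions, error) {
	var o restoreOptions
	fs := flag.NewFlagSet("restore", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	fs.StringVar(&o.Archive, "archive", "", "specific archive name to restore")
	fs.BoolVar(&o.Background, "background", false, "run restore job in background")
	fs.BoolVar(&o.Latest, "latest", false, "restore from the most recent backup")
	// Register the long and short spellings against the same variable.
	fs.BoolVar(&o.Yes, "yes", false, "skip confirmation prompt")
	fs.BoolVar(&o.Yes, "y", false, "skip confirmation prompt (shorthand)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Latest == (o.Archive != "") {
		return o, errors.New("specify exactly one of --archive or --latest")
	}
	return o, nil
}

func main() {
	opts, err := parseRestoreFlags([]string{"--latest", "--yes"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```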

Example 1: Restore with Background Execution (Interactive)

❯ go run main.go settings restore --namespace stac-23374-ha --archive sts-backup-20251117-1237.sty --background

Warning: WARNING: Restoring from backup will PURGE all existing Stackgraph (Topology) data!
Warning: This operation cannot be undone.

Backup to restore: sts-backup-20251117-1237.sty
Namespace: stac-23374-ha

Do you want to continue? (yes/no): yes

Scaling down deployments (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled down 9 deployment(s):
  - suse-observability-api (replicas: 1 -> 0)
  - suse-observability-authorization-sync (replicas: 1 -> 0)
  - suse-observability-checks (replicas: 1 -> 0)
  - suse-observability-health-sync (replicas: 1 -> 0)
  - suse-observability-initializer (replicas: 1 -> 0)
  - suse-observability-notification (replicas: 1 -> 0)
  - suse-observability-slicing (replicas: 1 -> 0)
  - suse-observability-state (replicas: 1 -> 0)
  - suse-observability-sync (replicas: 1 -> 0)
✓ Scaled down 0 statefulsets(s):
Waiting for pods to terminate...
Waiting for 2 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
✓ All pods have terminated

Ensuring backup scripts ConfigMap exists...
✓ Backup scripts ConfigMap ready
Ensuring Minio keys secret exists...
✓ Minio keys secret ready

Creating restore job for backup: sts-backup-20251117-1237.sty
✓ Restore job created: settings-restore-20251118t093301

Job is running in background: settings-restore-20251118t093301

Monitoring commands:
  kubectl logs --follow job/settings-restore-20251118t093301 -n stac-23374-ha
  kubectl get job settings-restore-20251118t093301 -n stac-23374-ha

To wait for completion, scaling up the necessary deployments and cleanup, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t093301 --wait -n stac-23374-ha
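
The job names in this output look like a fixed prefix followed by a compact timestamp with a lowercase `t` separator. A small sketch of how such a name could be generated is shown below. The layout string and the `restoreJobName` helper are assumptions inferred from the output, and the PR does not state whether the timestamp is in UTC or local time:

```go
package main

import (
	"fmt"
	"time"
)

// jobNameLayout is an assumed layout: date, a lowercase "t", then time.
// Kubernetes object names must be lowercase, which would explain the "t".
const jobNameLayout = "20060102t150405"

// exampleTime matches the timestamp embedded in the job name shown above.
var exampleTime = time.Date(2025, 11, 18, 9, 33, 1, 0, time.UTC)

// restoreJobName builds a job name from the fixed prefix and a timestamp.
func restoreJobName(t time.Time) string {
	return "settings-restore-" + t.Format(jobNameLayout)
}

func main() {
	fmt.Println(restoreJobName(exampleTime))
}
```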

Example 2: Restore Latest Backup with Automatic Approval

❯ go run main.go settings restore --namespace stac-23374-nonha --latest --yes
Finding latest backup...
Setting up port-forward to suse-observability-minio:9000 in namespace stac-23374-nonha...
✓ Port-forward established successfully
Using latest backup: sts-backup-20251117-1404.sty

Scaling down deployments (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled down 1 deployment(s):
  - suse-observability-server (replicas: 1 -> 0)
✓ Scaled down 0 statefulsets(s):
Waiting for pods to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
Waiting for 1 pod(s) to terminate...
✓ All pods have terminated

Ensuring backup scripts ConfigMap exists...
✓ Backup scripts ConfigMap ready
Ensuring Minio keys secret exists...
✓ Minio keys secret ready

Creating restore job for backup: sts-backup-20251117-1404.sty
✓ Restore job created: settings-restore-20251118t093651

Waiting for restore job to complete (this may take significant amount of time depending on the archive size)...

You can safely interrupt this command with Ctrl+C.
To check status, scale up the required deployments and cleanup later, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t093651 --wait -n stac-23374-nonha

✓ Restore completed successfully

Cleaning up resources...
✓ Job deleted: settings-restore-20251118t093651
Warning: Failed to delete PVC: persistentvolumeclaims "settings-restore-20251118t093651" not found

Scaling up deployments from annotations (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled up 1 deployment(s) successfully:
  - suse-observability-server (replicas: 0 -> 1)
✓ Scaled up 0 statefulset(s) successfully:

`settings check-and-finalize`

Checks the status of a background restore job, scales the affected deployments back up, and cleans up resources once the job has finished.

Usage
sts-backup settings check-and-finalize --job <job-name> [--wait] -n <namespace>

Flags:

--job, -j - Settings restore job name (required)
--wait, -w - Wait for job to complete before cleanup

Note: This command automatically scales up deployments that were scaled down during restore.
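
The behavior described here can be read as a small decision: report, wait, or finalize, depending on the job's state and the `--wait` flag. A minimal sketch under stated assumptions follows. The `JobStatus` struct only mirrors the `Active`, `Succeeded`, and `Failed` counters of a Kubernetes Job status, the `Action` names are hypothetical, and the handling of failed jobs is an assumption because this PR description does not show a failure case:

```go
package main

import "fmt"

// JobStatus holds the pod counters a Kubernetes Job reports.
type JobStatus struct {
	Active    int32
	Succeeded int32
	Failed    int32
}

// Action is a hypothetical label for what the command does next.
type Action string

const (
	ActionReportRunning Action = "report-running" // print status and monitoring hints
	ActionWait          Action = "wait"           // block until the job finishes
	ActionFinalize      Action = "finalize"       // scale up deployments and clean up
	ActionReportFailure Action = "report-failure" // assumed handling, not shown in the PR
)

// nextAction decides what to do for a job given its status and --wait.
func nextAction(s JobStatus, wait bool) Action {
	switch {
	case s.Succeeded > 0:
		return ActionFinalize
	case s.Failed > 0 && s.Active == 0:
		return ActionReportFailure
	case wait:
		return ActionWait
	default:
		return ActionReportRunning
	}
}

func main() {
	running := JobStatus{Active: 1}
	done := JobStatus{Succeeded: 1}
	fmt.Println(nextAction(running, false))
	fmt.Println(nextAction(running, true))
	fmt.Println(nextAction(done, false))
}
```

A real implementation would more likely inspect the Job's conditions than the raw counters; the counters are used here only to keep the sketch short.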

Example: Checking if the job is still running

❯ go run main.go settings check-and-finalize --job settings-restore-20251118t094238 -n stac-23374-ha
Checking status of job: settings-restore-20251118t094238

Job is running in background: settings-restore-20251118t094238
  Active pods: 1

Monitoring commands:
  kubectl logs --follow job/settings-restore-20251118t094238 -n stac-23374-ha
  kubectl get job settings-restore-20251118t094238 -n stac-23374-ha

To wait for completion, scaling up the necessary deployments and cleanup, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t094238 --wait -n stac-23374-ha

Example: Waiting for the job to finish

❯ go run main.go settings check-and-finalize --job settings-restore-20251118t094647 --wait -n stac-23374-nonha
Checking status of job: settings-restore-20251118t094647

Waiting for restore job to complete (this may take significant amount of time depending on the archive size)...

You can safely interrupt this command with Ctrl+C.
To check status, scale up the required deployments and cleanup later, run:
  sts-backup settings check-and-finalize --job settings-restore-20251118t094647 --wait -n stac-23374-nonha

✓ Job completed successfully: settings-restore-20251118t094647

Scaling up deployments from annotations (selector: stackstate.com/connects-to-stackgraph=true)...
✓ Scaled up 1 deployment(s) successfully:
  - suse-observability-server (replicas: 0 -> 1)
✓ Scaled up 0 statefulset(s) successfully:

Cleaning up resources...
✓ Job deleted: settings-restore-20251118t094647

VioletCranberry

✅

STAC-23603: Restoring Settings

836990c

VioletCranberry approved these changes Nov 19, 2025

viliakov merged commit 8e3e7b5 into main Nov 19, 2025
5 checks passed

viliakov deleted the STAC-23603 branch November 19, 2025 09:19
