Skip to content

Conversation

@szuecs
Copy link
Member

@szuecs szuecs commented Oct 15, 2025

feature: zalando.org/forward-backend annotation support to enable migration to eks

Usage:

apiVersion: zalando.org/v1
kind: StackSet
metadata:
  annotations:
    zalando.org/forward-backend: eks migration # the value does not matter
..

This will execute the migration preparation, such that the next traffic switch will send the traffic to the forward backend.

@szuecs
Copy link
Member Author

szuecs commented Nov 6, 2025

changes should be moved to the GenerateRouteGroup/GenerateDeployment etc. functions to fit into the current model

…ration to eks

Usage:

```
apiVersion: zalando.org/v1
kind: StackSet
metadata:
  annotations:
    zalando.org/forward-backend: eks migration
..
```

This will execute the migration preparation, such that the next traffic switch will send the traffic to the forward backend.

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
…gration, too

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
feature: reduce resource waste by setting replicas to 1 for the new unused stack, that will not receive any traffic

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
… the future why we scale down to 1

fix: ExternalIngress has also to get the anntoation to be set, such that it can execute a migration as expected

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
@szuecs szuecs force-pushed the feature/annotation-migration-support branch from a11ccdc to ae8b458 Compare November 6, 2025 18:55
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
// GenerateHPA generates a hpa as configured in the
// stack. On cluster migrations set by stackset annotation
// "zalando.org/forward-backend", the hpa will be set to
// minReplicas = maxReplicass = 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just skip generating an HPA in this case, this is similar to if sc.ScaleDown() case below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fear about removing objects we don't need is that we create alerts for the users and they overreact.
It's also not checking for traffic as we do in ScaleDown()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We anyway have to document how this feature works and what to expect.

They could also "overreact" because of minReplicas/maxReplicas = 1 :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think now?
I dropped HPA and deployment to not waste money.

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
@szuecs szuecs added the major label Nov 7, 2025
@szuecs
Copy link
Member Author

szuecs commented Nov 7, 2025

not sure if this is major or minor, I tend to minor but the number of files of the change feels a bit more than minor.

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
@szuecs
Copy link
Member Author

szuecs commented Nov 17, 2025

The current PR crashes with recovered panics at runtime.
I think stackset-controller wants to have a deployment.

time="2025-11-17T20:27:34Z" level=info msg="Event(v1.ObjectReference{Kind:\"StackSet\", Namespace:\"default\", Name:\"test-migration-app\", UID:\"11a0c501-1333-4c7a-bb8a-3b58eb68156e\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"2692037662\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedManageStackSet' panic: runtime error: invalid memory address or nil pointer dereference"
time="2025-11-17T20:27:43Z" level=error msg="Encountered a panic while processing a stackset: runtime error: invalid memory address or nil pointer dereference\ngoroutine 153 [running]:\nruntime/debug.Stack()\n\t/usr/local/go/src/runtime/debug/stack.go:26 +0x5e\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackSet.func1()\n\t/workspace/controller/stackset.go:1089 +0x1e5\npanic({0x1c556c0?, 0x32f91b0?})\n\t/usr/local/go/src/runtime/panic.go:783 +0x132\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackDeployment(0xc000000540, {0x2237130, 0xc000109b30}, 0xc000245808, 0x0, 0x0?)\n\t/workspace/controller/stack_resources.go:49 +0x3c1\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackResources(0xc000000540, {0x2237130, 0xc000109b30}, 0xc0004aee80?, 0xc0000c7ba0)\n\t/workspace/controller/stackset.go:1065 +0x265\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).ReconcileStackSet(0xc000000540, {0x2237130, 0xc000109b30}, 0xc0004aee80)\n\t/workspace/controller/stackset.go:1155 +0xb29\ngithub.com/zalando-incubator/stackset-controller/controller.(*StackSetController).Run.func2()\n\t/workspace/controller/stackset.go:188 +0xa5\ngolang.org/x/sync/errgroup.(*Group).Go.func1()\n\t/go/pkg/mod/golang.org/x/sync@v0.17.0/errgroup/errgroup.go:93 +0x50\ncreated by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1\n\t/go/pkg/mod/golang.org/x/sync@v0.17.0/errgroup/errgroup.go:78 +0x95\n" controller=stackset namespace=default stackset=test-migration-app
time="2025-11-17T20:27:43Z" level=error msg="unable to reconcile a stackset: panic: runtime error: invalid memory address or nil pointer dereference" controller=stackset namespace=default stackset=test-migration-app
time="2025-11-17T20:27:43Z" level=error msg="Failed waiting for reconcilers: panic: runtime error: invalid memory address or nil pointer dereference" controller=stackset
time="2025-11-17T20:27:43Z" level=info msg="Event(v1.ObjectReference{Kind:\"StackSet\", Namespace:\"default\", Name:\"test-migration-app\", UID:\"11a0c501-1333-4c7a-bb8a-3b58eb68156e\", APIVersion:\"zalando.org/v1\", ResourceVersion:\"2692037662\", FieldPath:\"\"}): type: 'Warning' reason: 'FailedManageStackSet' panic: runtime error: invalid memory address or nil pointer dereference"

This was fixed by the commit after the comment

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
@zalando-robot
Copy link

Deployment Checklist

This change falls under the deployment policy.

💁 Since Nov 10th, we are in the RED deployment zone. This means all changes released to production must adhere to the following requirements:

  • Detailed release notes are provided in this PR’s description.
  • Thorough load-testing has been performed, and is documented in the description/comment.
  • You can enable/disable the change via feature toggles, and have confirmed these toggles work as expected.
  • Technical review: A Principal Engineer, Engineering Manager or Head of Engineering have green-lit your changes, and the reviewer is named in the description/comments.
  • Application Owner (Director+) approval is given about the PR, and the approver is named in the description/comments.

👉 Regardless of which boxes you click in this comment, merge/deployment will not be blocked.
Reports about deployment policy adherence will be circulated daily.

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
…cluster migration patched routegroups

Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
Signed-off-by: Sandor Szücs <sandor.szuecs@zalando.de>
@szuecs szuecs requested review from linki and mikkeloscar November 20, 2025 17:17
@szuecs
Copy link
Member Author

szuecs commented Nov 20, 2025

I am done with this PR form my side. It was tested in pet cluster with zkubectl traffic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants