@rhdedgar
Collaborator

Set up multi-arch builds with a manifest that automatically points to the image for each supported architecture.

On Kubernetes, with these changes, images are still referenced in the same manner, but resolve to the image built for the architecture of the node where the container will run.

                                                  --> quay.io/<user>/llama-stack-k8s-operator/<version>-linux-arm64
quay.io/<user>/llama-stack-k8s-operator/<version>-|
                                                  --> quay.io/<user>/llama-stack-k8s-operator/<version>-linux-amd64

This should allow users to seamlessly transition between clusters with different architectures, once the distribution changes are also in place.
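
As a rough illustration of the resolution shown in the diagram above, the sketch below maps a machine name (as printed by `uname -m`) to the matching per-architecture tag suffix. In practice the container runtime performs this selection itself from the manifest list; `arch_suffix` is a hypothetical helper that is not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): map a machine name, as printed by
# `uname -m`, to the per-architecture tag suffix used in the diagram above.
arch_suffix() {
  case "$1" in
    x86_64|amd64)  echo "linux-amd64" ;;
    aarch64|arm64) echo "linux-arm64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the suffix for the machine this script runs on.
arch_suffix "$(uname -m)" || echo "no image published for this architecture"
```

On an arm64 node this prints `linux-arm64`, which corresponds to the upper branch of the diagram.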

Closes: RHAIENG-1941

Collaborator

@derekhiggins derekhiggins left a comment

How long does an arm64 image build take? Building an arm64 image with podman locally has taken over an hour and is still running. Are we going to hit timeouts in CI (or could it be a local setup problem)?

Makefile Outdated
trap 'rm -f Dockerfile.podman.tmp' EXIT; \
sed -e 's/^ARG BUILDPLATFORM=linux\/amd64/ARG BUILDPLATFORM/' \
-e 's/^ARG TARGETPLATFORM=linux\/amd64/ARG TARGETPLATFORM/' \
-e 's|\$${BUILDPLATFORM}|linux/arm64|g' \
Collaborator

For podman, --platform automatically sets BUILDPLATFORM and TARGETPLATFORM

If podman is setting BUILDPLATFORM, do we still need to hard-code it here (same question for podman-buildx-multiarch above)?
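
To make the question concrete, here is a minimal sketch of what the three `sed` expressions in the Makefile snippet do to a Dockerfile. The function name `rewrite_for_arm64` and the sample Dockerfile lines are assumptions for illustration; the project's real Dockerfile is not shown in this thread.

```shell
#!/usr/bin/env bash
# Apply the same three substitutions as the Makefile snippet, reading a
# Dockerfile on stdin and writing the rewritten file to stdout.
# Inside a Makefile the dollar sign is doubled (\$$); in plain shell it is \$.
rewrite_for_arm64() {
  sed -e 's/^ARG BUILDPLATFORM=linux\/amd64/ARG BUILDPLATFORM/' \
      -e 's/^ARG TARGETPLATFORM=linux\/amd64/ARG TARGETPLATFORM/' \
      -e 's|\${BUILDPLATFORM}|linux/arm64|g'
}

# Hypothetical Dockerfile fragment, for illustration only.
rewrite_for_arm64 <<'EOF'
ARG BUILDPLATFORM=linux/amd64
ARG TARGETPLATFORM=linux/amd64
FROM --platform=${BUILDPLATFORM} golang AS builder
EOF
```

The first two expressions drop the amd64 defaults so the build tool can supply the values, while the third replaces every `${BUILDPLATFORM}` reference with a literal `linux/arm64`. That third substitution is the hard-coding being asked about.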

Collaborator Author

I tried a few more builds and was able to replicate some transient issues with the arm64 builds. I've narrowed it down to the inclusion of CGO_ENABLED=0 (FIPS related) on the build. The builds were either taking 5-10 minutes, or getting stuck entirely.

It doesn't happen with amd64 builds, and I have finally settled on an arm64 workaround that I still need to test further. I'll see if I can get access to some FIPS-enabled ARM clusters to test on.

Collaborator Author

Cool, that did improve things quite a bit. I now see all the PR tests pass within a reasonable amount of time. It looks like the longest one was the e2e tests, at 9m. I'll see if the ARM-specific CGO nuances can be reflected in downstream documentation.

shell: bash
run: |
make image-build IMG=quay.io/llamastack/llama-stack-k8s-operator:v${{ steps.validate.outputs.operator_version }}
make image-buildx IMG=quay.io/llamastack/llama-stack-k8s-operator:v${{ steps.validate.outputs.operator_version }}
Collaborator

@mfleader mfleader Dec 3, 2025

We only want to build the image here as part of the validation process, not push it. From my understanding, the image-buildx rule also pushes the image. Would you create and use a Makefile rule here that encapsulates only the image-building logic?
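
One way to picture the requested split is a helper that assembles the same build command with or without the push step. This is only a sketch under assumptions: it assumes `docker buildx build` is the underlying tool, and `buildx_cmd` and the image name are hypothetical. The actual `image-buildx` rule is not shown in this thread. The helper prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print a multi-arch buildx command,
# adding --push only when the second argument is "push".
buildx_cmd() {
  local img="$1" mode="${2:-build-only}"
  local cmd="docker buildx build --platform linux/amd64,linux/arm64 --tag ${img}"
  if [ "$mode" = "push" ]; then
    cmd="${cmd} --push"
  fi
  echo "${cmd} ."
}

# Placeholder image name, for illustration only.
buildx_cmd quay.io/example/operator:v0.0.0        # validation: build only
buildx_cmd quay.io/example/operator:v0.0.0 push   # release: build and push
```

A build-only Makefile rule would wrap the first form, and the existing pushing rule could reuse it and add the push step.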

@VaishnaviHire
Collaborator

@mergify rebase

Signed-off-by: Doug Edgar <dedgar@redhat.com>
@mergify
mergify bot commented Dec 5, 2025

rebase

✅ Branch has been successfully rebased

@mergify
mergify bot commented Dec 5, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @rhdedgar please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 5, 2025

4 participants