-
Notifications
You must be signed in to change notification settings - Fork 46
feat: enable multi-arch builds with ARM compatibility #206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
derekhiggins
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How long does a arm64 image build take? Building an arm64 image with podman locally has taken over an hour and is still running, are we going to hit timeouts in CI ? (or could it be a local setup problem)
Makefile
Outdated
| trap 'rm -f Dockerfile.podman.tmp' EXIT; \ | ||
| sed -e 's/^ARG BUILDPLATFORM=linux\/amd64/ARG BUILDPLATFORM/' \ | ||
| -e 's/^ARG TARGETPLATFORM=linux\/amd64/ARG TARGETPLATFORM/' \ | ||
| -e 's|\$${BUILDPLATFORM}|linux/arm64|g' \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For podman, --platform automatically sets BUILDPLATFORM and TARGETPLATFORM
If podman is setting BUILDPLATFORM do we still need to hard code it here (same question above in podman-buildx-multiarch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a few more builds and was able to replicate some transient issues with the arm64 builds. I've narrowed it down to the inclusion of CGO_ENABLED=0 (FIPS related) on the build. The builds were either taking 5-10 minutes, or getting stuck entirely.
It doesn't happen with amd64 builds, and I have finally settled on an arm64 workaround that I still need to test further. I'll see if I can get access to some FIPS-enabled ARM clusters to test on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, that did improve things quite a bit. I now see all the PR tests pass within a reasonable amount of time. It looks like the longest one was the e2e tests, at 9m. I'll see if the ARM-specific CGO nuances can reflected in downstream documentation.
| shell: bash | ||
| run: | | ||
| make image-build IMG=quay.io/llamastack/llama-stack-k8s-operator:v${{ steps.validate.outputs.operator_version }} | ||
| make image-buildx IMG=quay.io/llamastack/llama-stack-k8s-operator:v${{ steps.validate.outputs.operator_version }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only want to build the image here as a part of the validation process, but not push it. From my understanding of the image-buildx rule that it also pushes the image. Would you create and use a Makefile rule here that encapsulates only the image building logic?
|
@mergify rebase |
Signed-off-by: Doug Edgar <dedgar@redhat.com>
✅ Branch has been successfully rebased |
3913c2f to
6da9254
Compare
|
This pull request has merge conflicts that must be resolved before it can be merged. @rhdedgar please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork |
Setup multi-arch builds with a manifest that will automatically point to the image of supported architectures.
On Kubernetes, with these changes, images are still referred to in the same manner, but will resolve to the architecture that is relevant to the node where it is to be run.
This should allow users to seamlessly transition between clusters with different architectures, once the distribution changes are also in place.
Closes: RHAIENG-1941