📖 docs: machinepool contract spec #12971
Conversation
This adds documentation that details the contract for providers when implementing an infrastructure machine pool. This has been created retrospectively from looking at a number of providers and the MachinePool controller. Signed-off-by: Richard Case <richard.case@outlook.com>
fabriziopandini
left a comment
Thanks for this PR!
This really helps folks like me to step up their knowledge about MachinePools!
> The value from this field is surfaced via the MachinePool's `status.replicas` field.
>
> ### InfraMachinePool: terminal failures
I think we should align ASAP to other CAPI resources with regard to terminal failures (deprecated, no longer relevant for the controller).
I agree and slightly changed the wording.
Also in this case, I suggest trying to catch up with the rest of CAPI; otherwise it will be confusing for users when we have to explain e.g. "failureReason is gone everywhere, but in MP things are different..."
/assign Would like to review after Fabrizio's lgtm and before merge.

Thanks for your feedback @fabriziopandini. I will make updates based on this.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Changes after the first review by Fabrizio. Signed-off-by: Richard Case <richard.case@outlook.com>
@fabriziopandini - I have updated the doc based on your feedback.
> The goal of an InfraMachinePool is to manage the lifecycle of a provider-specific pool of machines using a provider-specific service (like auto-scale groups in AWS and virtual machine scale sets in Azure).
>
> The machines in the pool may be physical or virtual instances (although most likely virtual), and they represent the infrastructure for Kubernetes nodes.
Suggested change:

> The machines in the pool may be physical or virtual instances, and they represent the infrastructure for Kubernetes nodes.

(comment is not relevant for the contract)
There are a few lines in this document that are not directly relevant to the contract, but which instead provide background or hints. So I'm inclined to keep this as is.
> The [MachinePool's controller](../../core/controllers/machine-pool.md) is responsible for coordinating operations of the InfraMachinePool, and the interaction between the MachinePool's controller and the InfraMachinePool is based on the contract rules defined in this page.
>
> Once contract rules are satisfied by an InfraMachinePool implementation, other implementation details
What does this sentence mean? Can it be removed? Should we explicitly mention optional features such as single machine deletion here?
This sentence needs to stay, as it's stating that the MachinePool controller coordinates with the InfraMachinePool and that this interaction is governed by the contract.
It's also consistent with the contract docs for InfraMachine: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/book/src/developer/providers/contracts/infra-machine.md?plain=1#L10
> Should we explicitly mention optional features such as single machine deletion here?
Is there something that is exposed via the contract that is relevant to this feature? There are general features of MachinePools that are not covered here (and should be covered in the general MachinePool docs IMHO).
It would be great to reword the sentence somehow, since I don't get the point that you mentioned for InfraMachine when reading this one.
Regarding single machine deletion – if you think it's out of scope here, maybe we can indeed then document it elsewhere as it's fully optional and has no implication on the contract IMO.
The line is originally based on something from the Machine contract, but I will reword to make it clearer; having the feedback that it's not clear what point is being made is really helpful. Thanks @AndiDog, I will try and get that rewording done today.
@richardcase are we not going to document anything on the upgrades? My understanding is that the status would change based on the update strategy. For example:

- Atomic updates:
- Rolling updates:

Should providers declare their update strategy in the contract so CAPI can adapt behavior accordingly?
Changes after the first review by Fabrizio. Signed-off-by: Richard Case <richard.case@outlook.com>
Some updates after an additional review by Andreas. Signed-off-by: Richard Case <richard.case@outlook.com>
@bnallapeta - I think we should follow up on this, as we are trying to document the current state of the contract. Providers declaring an "update strategy" would be a change to the current contract. We are going to need to update the MachinePool controller document based on the introduction of this contract doc. Perhaps we should include behaviour-type stuff when we do that?
@richardcase quoting from #10496,

I think we should talk about this in the contract. A few questions on this:

- Should providers watch for bootstrap config changes and trigger updates?
- If yes, what's the signal? Hash of the secret data? ConfigRef version?
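One of the signals asked about above is a hash of the bootstrap secret's data. As a minimal sketch of that idea only (the function name and shape are hypothetical, not an existing CAPI or provider API), a provider could compute a stable digest over the secret's data and re-trigger reconciliation when it changes:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// bootstrapDataHash computes a stable hash over a bootstrap secret's data
// map, so a provider could detect when the bootstrap config has changed.
// Keys are sorted so the result does not depend on map iteration order.
func bootstrapDataHash(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(data[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	old := map[string][]byte{"value": []byte("cloud-init-v1")}
	updated := map[string][]byte{"value": []byte("cloud-init-v2")}

	// A change in the secret data yields a different hash, which could
	// serve as the "signal" discussed in the review thread.
	fmt.Println(bootstrapDataHash(old) != bootstrapDataHash(updated))
}
```

This is just one way to answer the "what's the signal?" question; watching the configRef's resource version would be another, and the thread above leaves the choice open.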
Note for the maintainers, I will squash the commits when we are all happy with the doc.

I agree, I do think we need to document this in the MachinePools documentation. However, I don't think this sits in the contract document. Let's add this elsewhere, perhaps where I suggested earlier: https://cluster-api.sigs.k8s.io/developer/core/controllers/machine-pool when we make changes to that.
no need to

/label tide/merge-method-squash
fabriziopandini
left a comment
Changes look good to me, but I think we should take a bolder step with regards to moving to the new contract; otherwise it will be confusing for users and complicated for us to handle.
> A provider can opt-in to MachinePool Machines (MPM). With MPM, all the replicas in a MachinePool are represented by a Machine and InfraMachine. This enables core CAPI to perform common operations on single machines (and their Nodes), such as draining a node before scale down, integration with Cluster Autoscaler, and also [MachineHealthChecks].
>
> If you want to adopt MPM then you MUST have a `status.infrastructureMachineKind` field, and the field must contain the resource kind that represents the replicas in the pool. This is usually named InfraMachine if machine pool machines are representable like regular machines, or InfraMachinePoolMachine in other cases. For example, for the AWS provider the value would be set to `AWSMachine`.
I'm not entirely sure I grasp the meaning of
> This is usually named InfraMachine if machine pool machines are representable like regular machines, or InfraMachinePoolMachine in other cases.
> Currently this is done by setting `status.ready` to **true**. The value returned here is stored in the MachinePool's `status.infrastructureReady` field.
>
> Additionally, providers should set `initialization.provisioned` to **true**. This value isn't currently used by core CAPI for MachinePools. However, MachinePools will start to use this instead, and `status.ready` will be deprecated. Setting both these fields will make the future migration easier.
I'm not sure asking providers to have both provides benefit; e.g. for other contracts we ask to have one, depending on the contract version the provider declares it is supporting.
Also, do you have a timeline in mind for this to happen?
(For sake of simplicity I would prefer if there is only one way to transition from one contract version to another; similarly, it will be easier to manage if we can align the timeline with v1beta1 removal.)
What this PR does / why we need it:
This adds documentation that details the contract for providers when implementing an infrastructure machine pool.
This has been created retrospectively from looking at a number of providers and the MachinePool controller.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #12799
/area machinepool