approval controller, metric collector controllers #6
base: main
Signed-off-by: Arvind Thirumurugan <arvindth@microsoft.com>
e49b943 to
7764719
Pull request overview
This PR introduces a comprehensive solution for automating approval decisions in KubeFleet staged rollouts based on workload health metrics from Prometheus. The implementation adds two standalone controllers (approval-request-controller on hub, metric-collector on members) and four custom resources to enable automated staged rollout approvals.
Key Changes:
- Two standalone Kubernetes controllers for metric-based approval automation
- Four new CRDs for metric collection and workload tracking
- Complete documentation and installation scripts for both controllers
- Integration with KubeFleet v0.1.2 for staged update orchestration
Reviewed changes
Copilot reviewed 64 out of 67 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| approval-request-controller/go.mod | Module definition with invalid Go version 1.24.9 |
| approval-request-controller/pkg/controller/controller.go | Main approval logic that watches ApprovalRequests and auto-approves based on metrics |
| approval-request-controller/apis/metric/v1alpha1/*.go | Custom resource type definitions for MetricCollector, Reports, and WorkloadTrackers |
| metric-collector/go.mod | Module definition with invalid Go version 1.24.9 |
| metric-collector/pkg/controller/*.go | Member cluster controller for collecting Prometheus metrics |
| /docker/.Dockerfile | Container build files using invalid Go 1.24 base images |
| /install-on-.sh | Installation scripts for hub and member cluster deployments |
| /charts/ | Helm charts for deploying both controllers |
| /examples/ | Example configurations for Prometheus, CRPs, and workload trackers |
| README.md | Comprehensive tutorial covering setup, architecture, and usage |
c9bb19b to
3c8db43
michaelawyu left a comment
I've added some comments, PTAL
Hi Arvind! Just my two cents at a high level:

a) Architecture-wise, the design seems a bit too complex. For example, the whole metric data passing process could easily be done with one API, but it currently uses two separate APIs plus the CRP/override API to complete the job.

b) I understand that this is demo code, so we want to focus on the showcasing side, and that is probably why the controller basically expects one static metric (gauge type) from the host cluster. If that is the case, we should be straightforward about it in the code and in the docs, and the API should be greatly simplified. Alternatively, we could allow users to specify custom queries, which would make the code more useful (and more complex, of course).

c) The folder structure could use some work. I feel that an organization like our main repo would be more comprehensible; currently everything is a bit scattered (with soft links connecting the duplicates), e.g., the APIs are all kept in the approval controller part. Doc-wise, I fear that users without enough context might find it difficult to grasp what the demo is really for.
1295b6f to
16367ff
This PR introduces a complete solution for automating approval decisions in KubeFleet staged rollouts based on workload health metrics from Prometheus.
What's Added:
Two Standalone Controllers:
- Approval-Request-Controller (hub cluster): Watches ApprovalRequests/ClusterApprovalRequests, creates MetricCollectorReport resources directly in fleet-member-* namespaces on the hub, evaluates workload health, and auto-approves stages when all tracked workloads are healthy.
- Metric-Collector (member clusters): Connects to the hub cluster to watch the MetricCollectorReport in its fleet-member namespace, queries the local Prometheus every 30 seconds for workload health metrics, and updates the report status on the hub.
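As a rough illustration of the metric-collector's query step, the sketch below parses a Prometheus instant-query response (the JSON shape returned by `/api/v1/query`) into a per-workload health map. It is not the PR's code: the `namespace` and `app` label names, the "value 1 means healthy" convention, and the `parseHealth` helper are assumptions made for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// promResponse mirrors the subset of the Prometheus HTTP API
// instant-query response that this sketch needs.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string  `json:"metric"`
			Value  [2]json.RawMessage `json:"value"` // [timestamp, "sample value"]
		} `json:"result"`
	} `json:"data"`
}

// parseHealth maps "namespace/app" to a health flag.
// Assumption: the health gauge reports 1 for healthy and anything else
// for unhealthy, and carries "namespace" and "app" labels.
func parseHealth(body []byte) (map[string]bool, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query failed with status %q", resp.Status)
	}
	health := make(map[string]bool, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		// Prometheus encodes the sample value as a JSON string.
		var raw string
		if err := json.Unmarshal(r.Value[1], &raw); err != nil {
			return nil, fmt.Errorf("decode sample value: %w", err)
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, fmt.Errorf("parse sample value %q: %w", raw, err)
		}
		health[r.Metric["namespace"]+"/"+r.Metric["app"]] = v == 1
	}
	return health, nil
}

func main() {
	// Hand-written sample payload; a real collector would fetch this
	// from the local Prometheus on each polling interval.
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"namespace":"test-ns","app":"sample-app"},"value":[1700000000,"1"]}]}}`)
	health, err := parseHealth(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(health)
}
```

Keeping the parsing separate from the HTTP call makes the health convention easy to test without a running Prometheus.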
Custom Resources:
- MetricCollectorReport (hub cluster): Created by the approval-request-controller in fleet-member-* namespaces. Its spec contains the Prometheus URL and its status holds the collected health metrics, which are updated by the metric-collector running on member clusters.
- ClusterStagedWorkloadTracker: Specifies which workloads must be healthy before approving stages in a ClusterStagedUpdateRun (cluster-scoped).
- StagedWorkloadTracker: Specifies which workloads must be healthy before approving stages in a StagedUpdateRun (namespace-scoped).
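To make the spec/status split concrete, here is a simplified sketch of how these resources might be shaped. It uses plain Go structs rather than the real Kubernetes API types, and every field name, the `newReport` helper, and the `fleet-member-<cluster>` namespace naming are assumptions for illustration, not the definitions from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WorkloadRef identifies one workload to track (hypothetical fields).
type WorkloadRef struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
}

// ClusterStagedWorkloadTracker lists the workloads that must be healthy
// before a stage is approved. The namespaced StagedWorkloadTracker would
// look the same with an added namespace.
type ClusterStagedWorkloadTracker struct {
	Name      string        `json:"name"`
	Workloads []WorkloadRef `json:"workloads"`
}

// WorkloadHealth is one entry of collected health data.
type WorkloadHealth struct {
	WorkloadRef
	Healthy bool `json:"healthy"`
}

// MetricCollectorReport: the spec is written by the approval controller,
// the status is written by the metric collector on the member cluster.
type MetricCollectorReport struct {
	Namespace string `json:"namespace"`
	Name      string `json:"name"`
	Spec      struct {
		PrometheusURL string `json:"prometheusUrl"`
	} `json:"spec"`
	Status struct {
		Workloads []WorkloadHealth `json:"workloads,omitempty"`
	} `json:"status"`
}

// newReport builds an empty report in the member cluster's hub namespace.
// Assumption: the namespace is "fleet-member-" plus the cluster name.
func newReport(memberCluster, name, promURL string) MetricCollectorReport {
	r := MetricCollectorReport{Namespace: "fleet-member-" + memberCluster, Name: name}
	r.Spec.PrometheusURL = promURL
	return r
}

func main() {
	r := newReport("cluster-1", "example-report", "http://prometheus.example:9090")
	out, err := json.MarshalIndent(r, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```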
Architecture:
- The approval-request-controller creates MetricCollectorReport resources on the hub (nothing is deployed to members).
- The metric-collector on each member connects to the hub using a service account token.
- Authentication is simple and token-based, with no certificate or CA verification.
- The approval controller checks health every 15 seconds; the metric collector updates every 30 seconds.
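The approval decision described above ("auto-approve when all tracked workloads are healthy") can be sketched as a pure function that the controller would call on each 15-second check. The `unhealthy` and `shouldApprove` helpers and the data layout are hypothetical, and treating a missing report or missing workload entry as unhealthy is an assumption of this sketch rather than confirmed behavior.

```go
package main

import "fmt"

// WorkloadRef identifies a tracked workload (hypothetical shape).
type WorkloadRef struct{ Namespace, Name string }

// unhealthy returns one "cluster: namespace/name" entry for every tracked
// workload that is not reported healthy in a stage cluster.
// reports maps cluster name -> workload -> healthy flag.
func unhealthy(clusters []string, tracked []WorkloadRef, reports map[string]map[WorkloadRef]bool) []string {
	var out []string
	for _, cluster := range clusters {
		health := reports[cluster] // nil when the cluster has not reported yet
		for _, w := range tracked {
			if !health[w] { // a missing entry reads as false, i.e. unhealthy
				out = append(out, fmt.Sprintf("%s: %s/%s", cluster, w.Namespace, w.Name))
			}
		}
	}
	return out
}

// shouldApprove reports whether every tracked workload is healthy in
// every cluster of the stage.
func shouldApprove(clusters []string, tracked []WorkloadRef, reports map[string]map[WorkloadRef]bool) bool {
	return len(unhealthy(clusters, tracked, reports)) == 0
}

func main() {
	tracked := []WorkloadRef{{Namespace: "test-ns", Name: "sample-app"}}
	clusters := []string{"cluster-1", "cluster-2"}
	reports := map[string]map[WorkloadRef]bool{
		"cluster-1": {tracked[0]: true},
		// cluster-2 has not reported yet, so the stage stays unapproved.
	}
	fmt.Println("approve:", shouldApprove(clusters, tracked, reports))
	fmt.Println("blocking:", unhealthy(clusters, tracked, reports))
}
```

Returning the list of blocking workloads, rather than only a boolean, gives the controller something concrete to log or surface in a status condition while it waits for the next check.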
Build & Deployment:
- Makefile with commands for building all three Docker images (approval-request-controller, metric-collector, metric-app)
- Automated installation scripts that can be run from the approval-request-metric-collector directory
- Scripts handle service account creation, RBAC setup, and Helm deployment
Documentation:
- Main tutorial with a complete end-to-end setup guide, including ACR setup
- Controller-specific READMEs
- Example configurations for Prometheus, staged updates, and workload tracking
- Detailed architecture diagrams and flow explanations