@@ -6,7 +6,7 @@ This tutorial demonstrates how to use the Approval Request Controller and Metric
 
 This directory contains two controllers:
 - **approval-request-controller**: Runs on the hub cluster to automate approval decisions for staged updates
-- **metric-collector**: Runs on member clusters to collect workload health metrics from Prometheus
+- **metric-collector**: Runs on member clusters to collect and report workload health metrics
 
 ![Approval Controller and Metric Collector Architecture](./images/approval-controller-metric-collector.png)
 
@@ -18,24 +18,19 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
 
 #### Hub Cluster CRDs
 
-1. **MetricCollector** (cluster-scoped)
-   - Defines Prometheus connection details and where to report metrics
-   - Gets propagated to member clusters via ClusterResourcePlacement (CRP)
-   - Each member cluster receives a customized version with its specific `reportNamespace`
+1. **MetricCollectorReport** (namespaced)
+   - Created by approval-request-controller in `fleet-member-<cluster-name>` namespaces on hub
+   - Watched and updated by metric-collector running on member clusters
+   - Contains specification of Prometheus URL and collected `workload_health` metrics
+   - Updated every 30 seconds by the metric collector with latest health data
 
-2. **MetricCollectorReport** (namespaced)
-   - Created by metric-collector on member clusters, reported back to hub
-   - Lives in `fleet-member-<cluster-name>` namespaces on the hub
-   - Contains collected `workload_health` metrics for all workloads in a cluster
-   - Updated every 30 seconds by the metric collector
-
-3. **ClusterStagedWorkloadTracker** (cluster-scoped)
+2. **ClusterStagedWorkloadTracker** (cluster-scoped)
    - Defines which workloads to monitor for a ClusterStagedUpdateRun
    - The name must match the ClusterStagedUpdateRun name
    - Specifies workload's name, namespace and expected health status
    - Used by approval-request-controller to determine if stage is ready for approval
 
-4. **StagedWorkloadTracker** (namespaced)
+3. **StagedWorkloadTracker** (namespaced)
    - Defines which workloads to monitor for a StagedUpdateRun
    - The name and namespace must match the StagedUpdateRun name and namespace
    - Specifies namespace, workload name, and expected health status
@@ -48,22 +43,21 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
    - KubeFleet creates an ApprovalRequest (`ClusterApprovalRequest` or `ApprovalRequest`) for the first stage
    - The ApprovalRequest enters "Pending" state, waiting for approval
 
-2. **Metric Collector Deployment**
-   - Approval-request-controller watches the `ClusterApprovalRequest`, `ApprovalRequest` objects
-   - Creates a `MetricCollector` resource on the hub (cluster-scoped)
-   - Creates a `ClusterResourceOverride` with per-cluster customization rules
-     - Each cluster gets a unique `reportNamespace`: `fleet-member-<cluster-name>`
-   - Creates a `ClusterResourcePlacement` (CRP) with `PickFixed` policy
-     - Targets all clusters in the current stage
-   - KubeFleet propagates the customized `MetricCollector` to each member cluster
+2. **Metric Collector Report Creation**
+   - Approval-request-controller watches the `ClusterApprovalRequest` and `ApprovalRequest` objects
+   - For each cluster in the current stage:
+     - Creates a `MetricCollectorReport` in `fleet-member-<cluster-name>` namespace on hub
+     - Sets `spec.prometheusUrl` to the Prometheus endpoint
+     - Each report is specific to one cluster
 
 3. **Metric Collection on Member Clusters**
    - Metric-collector controller runs on each member cluster
+   - Watches for `MetricCollectorReport` in its `fleet-member-<cluster-name>` namespace on hub
    - Every 30 seconds, it:
-     - Queries local Prometheus with PromQL: `workload_health`
+     - Queries local Prometheus using URL from report spec with PromQL: `workload_health`
      - Prometheus returns metrics for all pods with `prometheus.io/scrape: "true"` annotation
      - Extracts workload health (1.0 = healthy, 0.0 = unhealthy)
-     - Creates/updates `MetricCollectorReport` on hub in `fleet-member-<cluster-name>` namespace
+     - Updates the `MetricCollectorReport` status on hub with collected metrics
 
 4. **Health Evaluation**
    - Approval-request-controller monitors `MetricCollectorReports` from all stage clusters
@@ -72,7 +66,7 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
      - For cluster-scoped: `ClusterStagedWorkloadTracker` with same name as ClusterStagedUpdateRun
      - For namespace-scoped: `StagedWorkloadTracker` with same name and namespace as StagedUpdateRun
    - For each cluster in the stage:
-     - Reads its `MetricCollectorReport` from `fleet-member-<cluster-name>` namespace
+     - Reads its `MetricCollectorReport` status from `fleet-member-<cluster-name>` namespace
      - Verifies all tracked workloads are present and healthy
      - If any workload is missing or unhealthy, waits for next cycle
    - If ALL workloads across ALL clusters are healthy:
@@ -221,8 +215,8 @@ Before diving into the setup steps, here's a bird's eye view of what you'll be b
 
 3. **Approval Request Controller**
    - Watches `ClusterApprovalRequest` and `ApprovalRequest` objects
-   - Deploys MetricCollector to stage clusters via ClusterResourcePlacement
-   - Evaluates workload health from MetricCollectorReports
+   - Creates MetricCollectorReport directly in `fleet-member-<cluster-name>` namespaces
+   - Evaluates workload health from MetricCollectorReport status
    - Auto-approves stages when all workloads are healthy
 
 4. **Sample Metric App** (will be rolled out to clusters)
@@ -232,9 +226,9 @@ Before diving into the setup steps, here's a bird's eye view of what you'll be b
 
 **Member Clusters** - Where workloads run:
 1. **Metric Collector**
-   - Queries local Prometheus every 30 seconds
-   - Reports workload health back to hub cluster
-   - Creates/updates MetricCollectorReport in hub's `fleet-member-<cluster-name>` namespace
+   - Connects to hub cluster to watch MetricCollectorReport in its namespace
+   - Queries local Prometheus every 30 seconds using URL from MetricCollectorReport spec
+   - Updates MetricCollectorReport status on hub with collected health metrics
 
 2. **Prometheus** (received from hub)
    - Runs on each member cluster
@@ -284,15 +278,15 @@ When you create a **ClusterStagedUpdateRun** or **StagedUpdateRun**, here's what
 
 1. **Stage 1 (staging)**: Rollout starts with `kind-cluster-1`
    - KubeFleet creates an ApprovalRequest for the staging stage
-   - Approval controller deploys MetricCollector to `kind-cluster-1`
-   - Metric collector reports health metrics back to hub
+   - Approval controller creates MetricCollectorReport in `fleet-member-kind-cluster-1` namespace
+   - Metric collector on `kind-cluster-1` watches its report on hub and updates status with health metrics
    - When `sample-metric-app` is healthy, approval controller auto-approves
    - KubeFleet proceeds with the rollout to `kind-cluster-1`
 
 2. **Stage 2 (prod)**: After staging succeeds
    - KubeFleet creates an ApprovalRequest for the prod stage
-   - Approval controller deploys MetricCollector to `kind-cluster-2` and `kind-cluster-3`
-   - Metric collectors report health from both clusters
+   - Approval controller creates MetricCollectorReports in `fleet-member-kind-cluster-2` and `fleet-member-kind-cluster-3`
+   - Metric collectors on both clusters watch their reports and update with health data
    - When ALL workloads across BOTH prod clusters are healthy, auto-approve
    - KubeFleet completes the rollout to production clusters
 
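The approval rule described in this hunk (a stage is approved only when every tracked workload is healthy on every cluster in the stage) can be sketched as follows. This is a minimal illustration in Python, not the controller's actual code: the `stage_is_healthy` function, the `(namespace, name)` workload keys, the `test-ns` namespace, and the dictionary layout of the reports are all assumptions made for the example. Only the 1.0 = healthy / 0.0 = unhealthy convention and the "missing or unhealthy means wait" rule come from the README text.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical types for illustration; the real CRD schemas are not shown in this diff.
Workload = Tuple[str, str]              # (namespace, workload name)
ClusterReport = Dict[Workload, float]   # workload -> reported workload_health value


def stage_is_healthy(
    tracked: Iterable[Workload],
    reports: Dict[str, ClusterReport],
    stage_clusters: Iterable[str],
) -> bool:
    """Return True only if every tracked workload is healthy on every stage cluster."""
    tracked = list(tracked)
    for cluster in stage_clusters:
        report = reports.get(cluster)
        if report is None:
            # No report collected for this cluster yet: wait for the next cycle.
            return False
        for workload in tracked:
            # A missing workload is treated the same as an unhealthy one.
            if report.get(workload, 0.0) != 1.0:
                return False
    return True


# Assumed example data: "test-ns" is a made-up namespace for sample-metric-app.
tracked = [("test-ns", "sample-metric-app")]
reports = {
    "kind-cluster-2": {("test-ns", "sample-metric-app"): 1.0},
    "kind-cluster-3": {("test-ns", "sample-metric-app"): 0.0},
}
prod = ["kind-cluster-2", "kind-cluster-3"]

print(stage_is_healthy(tracked, reports, prod))  # kind-cluster-3 is still unhealthy
reports["kind-cluster-3"][("test-ns", "sample-metric-app")] = 1.0
print(stage_is_healthy(tracked, reports, prod))  # both prod clusters now healthy
```

Returning early on the first missing or unhealthy entry mirrors the "waits for next cycle" behaviour: nothing is approved until a later evaluation sees every cluster healthy.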
@@ -305,24 +299,23 @@ When you create a **ClusterStagedUpdateRun** or **StagedUpdateRun**, here's what
 | **StagedUpdateStrategy** | Define stages with label selectors and approval requirements | Hub |
 | **WorkloadTracker** | Specify which workloads to monitor for health | Hub |
 | **UpdateRun** | Start the staged rollout process | Hub |
-| **MetricCollector** | Automatically created by approval controller per stage | Hub → Member |
-| **MetricCollectorReport** | Automatically created by metric collector | Member → Hub |
+| **MetricCollectorReport** | Created by approval controller, updated by metric collector | Hub (fleet-member-* ns) |
 
 ### What the Installation Scripts Do
 
 **`install-on-hub.sh`** (Approval Request Controller):
 - Takes ACR registry URL and hub cluster name as parameters
 - Pulls approval-request-controller image from ACR
 - Verifies KubeFleet CRDs are installed
-- Installs controller via Helm with custom CRDs (MetricCollector, MetricCollectorReport, WorkloadTracker)
-- Sets up RBAC for managing placements, overrides, and approval requests
+- Installs controller via Helm with custom CRDs (MetricCollectorReport, WorkloadTrackers)
+- Sets up RBAC for managing MetricCollectorReports and reading approval requests
 
 **`install-on-member.sh`** (Metric Collector):
 - Takes ACR registry URL, hub cluster, and member cluster names as parameters
-- Pulls metric-collector and metric-app images from ACR
-- Creates service account with hub cluster access token
+- Pulls metric-collector image from ACR
+- Creates service account with hub cluster access token and RBAC for watching/updating MetricCollectorReports
 - Installs metric-collector via Helm on each member cluster
-- Configures connection to hub API server and local Prometheus
+- Configures connection to hub API server to watch reports and local Prometheus for metrics
 
 With this understanding, you're ready to start the setup!
 
@@ -692,13 +685,7 @@ kubectl logs -n default deployment/metric-collector -f
 
 ### Check Metrics Collection
 
-Verify that MetricCollector resources exist on member clusters:
-```bash
-kubectl config use-context kind-cluster-1
-kubectl get metriccollector -A
-```
-
-Verify that MetricCollectorReports are being created on the hub:
+Verify that MetricCollectorReports are being created and updated on the hub:
 ```bash
 kubectl config use-context kind-hub
 kubectl get metriccollectorreport -A