Commit 76de5f3

Authored and committed by Arvind Thirumurugan
update README.md
Signed-off-by: Arvind Thirumurugan <arvindth@microsoft.com>
1 parent 69a8ed4 commit 76de5f3

1 file changed (+34 / -47 lines)

approval-controller-metric-collector/README.md

@@ -6,7 +6,7 @@ This tutorial demonstrates how to use the Approval Request Controller and Metric
 
 This directory contains two controllers:
 - **approval-request-controller**: Runs on the hub cluster to automate approval decisions for staged updates
-- **metric-collector**: Runs on member clusters to collect workload health metrics from Prometheus
+- **metric-collector**: Runs on member clusters to collect and report workload health metrics
 
 ![Approval Controller and Metric Collector Architecture](./images/approval-controller-metric-collector.png)
 
@@ -18,24 +18,19 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
 
 #### Hub Cluster CRDs
 
-1. **MetricCollector** (cluster-scoped)
-- Defines Prometheus connection details and where to report metrics
-- Gets propagated to member clusters via ClusterResourcePlacement (CRP)
-- Each member cluster receives a customized version with its specific `reportNamespace`
+1. **MetricCollectorReport** (namespaced)
+- Created by approval-request-controller in `fleet-member-<cluster-name>` namespaces on hub
+- Watched and updated by metric-collector running on member clusters
+- Contains specification of Prometheus URL and collected `workload_health` metrics
+- Updated every 30 seconds by the metric collector with latest health data
 
-2. **MetricCollectorReport** (namespaced)
-- Created by metric-collector on member clusters, reported back to hub
-- Lives in `fleet-member-<cluster-name>` namespaces on the hub
-- Contains collected `workload_health` metrics for all workloads in a cluster
-- Updated every 30 seconds by the metric collector
-
-3. **ClusterStagedWorkloadTracker** (cluster-scoped)
+2. **ClusterStagedWorkloadTracker** (cluster-scoped)
 - Defines which workloads to monitor for a ClusterStagedUpdateRun
 - The name must match the ClusterStagedUpdateRun name
 - Specifies workload's name, namespace and expected health status
 - Used by approval-request-controller to determine if stage is ready for approval
 
-4. **StagedWorkloadTracker** (namespaced)
+3. **StagedWorkloadTracker** (namespaced)
 - Defines which workloads to monitor for a StagedUpdateRun
 - The name and namespace must match the StagedUpdateRun name and namespace
 - Specifies namespace, workload name, and expected health status
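To make the new report flow concrete, here is a minimal Go sketch of what the approval-request-controller produces for one stage under the updated README: one MetricCollectorReport per cluster, placed in that cluster's `fleet-member-<cluster-name>` namespace on the hub, with `spec.prometheusUrl` set. Only those two details come from the README. The struct layout, the status field, the report name and the Prometheus URL are hypothetical placeholders, not the project's actual Go types.

```go
package main

import "fmt"

// MetricCollectorReport is a hypothetical, simplified model of the CRD.
// Only spec.prometheusUrl and the fleet-member-<cluster-name> namespace come
// from the README; every other name here is an illustrative assumption.
type MetricCollectorReport struct {
	Name      string
	Namespace string
	Spec      ReportSpec
	Status    ReportStatus
}

// ReportSpec is written by the approval-request-controller on the hub.
type ReportSpec struct {
	PrometheusURL string // serialized as spec.prometheusUrl
}

// ReportStatus is written by the metric-collector running on the member cluster.
type ReportStatus struct {
	// WorkloadHealth maps "namespace/name" to the workload_health value
	// (1.0 = healthy, 0.0 = unhealthy); the key format is an assumption.
	WorkloadHealth map[string]float64
}

// reportNamespace returns the hub namespace reserved for one member cluster.
func reportNamespace(clusterName string) string {
	return "fleet-member-" + clusterName
}

// reportsForStage builds one report per cluster in the current stage.
// ASSUMPTION: every report reuses the update run's name and the same URL.
func reportsForStage(runName, prometheusURL string, clusters []string) []MetricCollectorReport {
	reports := make([]MetricCollectorReport, 0, len(clusters))
	for _, cluster := range clusters {
		reports = append(reports, MetricCollectorReport{
			Name:      runName,
			Namespace: reportNamespace(cluster),
			Spec:      ReportSpec{PrometheusURL: prometheusURL},
		})
	}
	return reports
}

func main() {
	// Hypothetical values for the tutorial's prod stage.
	stage := reportsForStage("example-run", "http://prometheus.prometheus.svc:9090",
		[]string{"kind-cluster-2", "kind-cluster-3"})
	for _, r := range stage {
		fmt.Printf("%s/%s -> %s\n", r.Namespace, r.Name, r.Spec.PrometheusURL)
	}
}
```

The status is left empty on creation because, per the README, only the metric collector on the member cluster fills it in.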
@@ -48,22 +43,21 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
 - KubeFleet creates an ApprovalRequest (`ClusterApprovalRequest` or `ApprovalRequest`) for the first stage
 - The ApprovalRequest enters "Pending" state, waiting for approval
 
-2. **Metric Collector Deployment**
-- Approval-request-controller watches the `ClusterApprovalRequest`, `ApprovalRequest` objects
-- Creates a `MetricCollector` resource on the hub (cluster-scoped)
-- Creates a `ClusterResourceOverride` with per-cluster customization rules
-- Each cluster gets a unique `reportNamespace`: `fleet-member-<cluster-name>`
-- Creates a `ClusterResourcePlacement` (CRP) with `PickFixed` policy
-- Targets all clusters in the current stage
-- KubeFleet propagates the customized `MetricCollector` to each member cluster
+2. **Metric Collector Report Creation**
+- Approval-request-controller watches the `ClusterApprovalRequest` and `ApprovalRequest` objects
+- For each cluster in the current stage:
+- Creates a `MetricCollectorReport` in `fleet-member-<cluster-name>` namespace on hub
+- Sets `spec.prometheusUrl` to the Prometheus endpoint
+- Each report is specific to one cluster
 
 3. **Metric Collection on Member Clusters**
 - Metric-collector controller runs on each member cluster
+- Watches for `MetricCollectorReport` in its `fleet-member-<cluster-name>` namespace on hub
 - Every 30 seconds, it:
-- Queries local Prometheus with PromQL: `workload_health`
+- Queries local Prometheus using URL from report spec with PromQL: `workload_health`
 - Prometheus returns metrics for all pods with `prometheus.io/scrape: "true"` annotation
 - Extracts workload health (1.0 = healthy, 0.0 = unhealthy)
-- Creates/updates `MetricCollectorReport` on hub in `fleet-member-<cluster-name>` namespace
+- Updates the `MetricCollectorReport` status on hub with collected metrics
 
 4. **Health Evaluation**
 - Approval-request-controller monitors `MetricCollectorReports` from all stage clusters
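The collector's core step, turning a `workload_health` query result into per-workload health values, can be sketched in Go as below. It relies on the documented JSON shape of the Prometheus HTTP API instant-query endpoint (`/api/v1/query`, `resultType` of `vector`, sample values encoded as strings). The README does not say which labels identify a workload; keying results by the `namespace` and `app` labels is an assumption, as is the example URL.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// promResponse mirrors the documented shape of a Prometheus instant-query
// response (GET /api/v1/query) whose resultType is "vector".
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string  `json:"metric"`
			Value  []json.RawMessage `json:"value"` // [<unix time>, "<sample value>"]
		} `json:"result"`
	} `json:"data"`
}

// queryURL builds the instant-query URL for workload_health from the base URL
// that the collector reads out of the report's spec.prometheusUrl.
func queryURL(prometheusURL string) string {
	return prometheusURL + "/api/v1/query?query=" + url.QueryEscape("workload_health")
}

// parseWorkloadHealth turns a response body into "namespace/app" -> health.
// ASSUMPTION: workloads are identified by the "namespace" and "app" labels.
func parseWorkloadHealth(body []byte) (map[string]float64, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" || resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status=%q resultType=%q",
			resp.Status, resp.Data.ResultType)
	}
	health := make(map[string]float64)
	for _, sample := range resp.Data.Result {
		if len(sample.Value) != 2 {
			return nil, fmt.Errorf("malformed sample value")
		}
		var raw string // Prometheus encodes sample values as JSON strings
		if err := json.Unmarshal(sample.Value[1], &raw); err != nil {
			return nil, err
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, err
		}
		health[sample.Metric["namespace"]+"/"+sample.Metric["app"]] = v
	}
	return health, nil
}

func main() {
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[
		{"metric":{"__name__":"workload_health","namespace":"test-ns","app":"sample-metric-app"},
		 "value":[1700000000.0,"1"]}]}}`)
	health, err := parseWorkloadHealth(body)
	fmt.Println(queryURL("http://localhost:9090"), health, err)
}
```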
@@ -72,7 +66,7 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
 - For cluster-scoped: `ClusterStagedWorkloadTracker` with same name as ClusterStagedUpdateRun
 - For namespace-scoped: `StagedWorkloadTracker` with same name and namespace as StagedUpdateRun
 - For each cluster in the stage:
-- Reads its `MetricCollectorReport` from `fleet-member-<cluster-name>` namespace
+- Reads its `MetricCollectorReport` status from `fleet-member-<cluster-name>` namespace
 - Verifies all tracked workloads are present and healthy
 - If any workload is missing or unhealthy, waits for next cycle
 - If ALL workloads across ALL clusters are healthy:
@@ -221,8 +215,8 @@ Before diving into the setup steps, here's a bird's eye view of what you'll be b
 
 3. **Approval Request Controller**
 - Watches `ClusterApprovalRequest` and `ApprovalRequest` objects
-- Deploys MetricCollector to stage clusters via ClusterResourcePlacement
-- Evaluates workload health from MetricCollectorReports
+- Creates MetricCollectorReport directly in `fleet-member-<cluster-name>` namespaces
+- Evaluates workload health from MetricCollectorReport status
 - Auto-approves stages when all workloads are healthy
 
 4. **Sample Metric App** (will be rolled out to clusters)
@@ -232,9 +226,9 @@ Before diving into the setup steps, here's a bird's eye view of what you'll be b
 
 **Member Clusters** - Where workloads run:
 1. **Metric Collector**
-- Queries local Prometheus every 30 seconds
-- Reports workload health back to hub cluster
-- Creates/updates MetricCollectorReport in hub's `fleet-member-<cluster-name>` namespace
+- Connects to hub cluster to watch MetricCollectorReport in its namespace
+- Queries local Prometheus every 30 seconds using URL from MetricCollectorReport spec
+- Updates MetricCollectorReport status on hub with collected health metrics
 
 2. **Prometheus** (received from hub)
 - Runs on each member cluster
@@ -284,15 +278,15 @@ When you create a **ClusterStagedUpdateRun** or **StagedUpdateRun**, here's what
 
 1. **Stage 1 (staging)**: Rollout starts with `kind-cluster-1`
 - KubeFleet creates an ApprovalRequest for the staging stage
-- Approval controller deploys MetricCollector to `kind-cluster-1`
-- Metric collector reports health metrics back to hub
+- Approval controller creates MetricCollectorReport in `fleet-member-kind-cluster-1` namespace
+- Metric collector on `kind-cluster-1` watches its report on hub and updates status with health metrics
 - When `sample-metric-app` is healthy, approval controller auto-approves
 - KubeFleet proceeds with the rollout to `kind-cluster-1`
 
 2. **Stage 2 (prod)**: After staging succeeds
 - KubeFleet creates an ApprovalRequest for the prod stage
-- Approval controller deploys MetricCollector to `kind-cluster-2` and `kind-cluster-3`
-- Metric collectors report health from both clusters
+- Approval controller creates MetricCollectorReports in `fleet-member-kind-cluster-2` and `fleet-member-kind-cluster-3`
+- Metric collectors on both clusters watch their reports and update with health data
 - When ALL workloads across BOTH prod clusters are healthy, auto-approve
 - KubeFleet completes the rollout to production clusters
 
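The approval rule in this walkthrough (approve only when every tracked workload is present and healthy on every cluster in the stage) reduces to a small check. The Go sketch below assumes the report statuses have already been read from the hub into a map keyed by cluster name and then by `namespace/name`; that layout, the `TrackedWorkload` type and the `test-ns` namespace are hypothetical.

```go
package main

import "fmt"

// TrackedWorkload is a hypothetical stand-in for one workload entry in a
// ClusterStagedWorkloadTracker or StagedWorkloadTracker.
type TrackedWorkload struct {
	Namespace string
	Name      string
}

// stageReady reports whether a stage can be auto-approved: every tracked
// workload must appear in every stage cluster's report status with
// workload_health 1.0. reports maps cluster name to "namespace/name" -> health
// (an assumed layout). Anything missing means "wait for the next cycle".
func stageReady(tracked []TrackedWorkload, stageClusters []string,
	reports map[string]map[string]float64) (bool, string) {
	for _, cluster := range stageClusters {
		health, ok := reports[cluster]
		if !ok {
			return false, "no report status yet for " + cluster
		}
		for _, w := range tracked {
			key := w.Namespace + "/" + w.Name
			v, found := health[key]
			if !found {
				return false, key + " missing on " + cluster
			}
			if v != 1.0 {
				return false, key + " unhealthy on " + cluster
			}
		}
	}
	return true, "all tracked workloads healthy in every stage cluster"
}

func main() {
	// Prod stage from the walkthrough: kind-cluster-3 has not reported yet.
	ready, reason := stageReady(
		[]TrackedWorkload{{Namespace: "test-ns", Name: "sample-metric-app"}},
		[]string{"kind-cluster-2", "kind-cluster-3"},
		map[string]map[string]float64{
			"kind-cluster-2": {"test-ns/sample-metric-app": 1.0},
		},
	)
	fmt.Println(ready, reason) // prints: false no report status yet for kind-cluster-3
}
```

Returning on the first missing or unhealthy workload matches the "waits for next cycle" behavior described earlier: the stage simply stays pending until a later evaluation succeeds.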
@@ -305,24 +299,23 @@ When you create a **ClusterStagedUpdateRun** or **StagedUpdateRun**, here's what
 | **StagedUpdateStrategy** | Define stages with label selectors and approval requirements | Hub |
 | **WorkloadTracker** | Specify which workloads to monitor for health | Hub |
 | **UpdateRun** | Start the staged rollout process | Hub |
-| **MetricCollector** | Automatically created by approval controller per stage | Hub → Member |
-| **MetricCollectorReport** | Automatically created by metric collector | Member → Hub |
+| **MetricCollectorReport** | Created by approval controller, updated by metric collector | Hub (fleet-member-* ns) |
 
 ### What the Installation Scripts Do
 
 **`install-on-hub.sh`** (Approval Request Controller):
 - Takes ACR registry URL and hub cluster name as parameters
 - Pulls approval-request-controller image from ACR
 - Verifies KubeFleet CRDs are installed
-- Installs controller via Helm with custom CRDs (MetricCollector, MetricCollectorReport, WorkloadTracker)
-- Sets up RBAC for managing placements, overrides, and approval requests
+- Installs controller via Helm with custom CRDs (MetricCollectorReport, WorkloadTrackers)
+- Sets up RBAC for managing MetricCollectorReports and reading approval requests
 
 **`install-on-member.sh`** (Metric Collector):
 - Takes ACR registry URL, hub cluster, and member cluster names as parameters
-- Pulls metric-collector and metric-app images from ACR
-- Creates service account with hub cluster access token
+- Pulls metric-collector image from ACR
+- Creates service account with hub cluster access token and RBAC for watching/updating MetricCollectorReports
 - Installs metric-collector via Helm on each member cluster
-- Configures connection to hub API server and local Prometheus
+- Configures connection to hub API server to watch reports and local Prometheus for metrics
 
 With this understanding, you're ready to start the setup!
 
@@ -692,13 +685,7 @@ kubectl logs -n default deployment/metric-collector -f
 
 ### Check Metrics Collection
 
-Verify that MetricCollector resources exist on member clusters:
-```bash
-kubectl config use-context kind-cluster-1
-kubectl get metriccollector -A
-```
-
-Verify that MetricCollectorReports are being created on the hub:
+Verify that MetricCollectorReports are being created and updated on the hub:
 ```bash
 kubectl config use-context kind-hub
 kubectl get metriccollectorreport -A
