   - Watches `ClusterApprovalRequest` and `ApprovalRequest` objects
   - Deploys MetricCollector to stage clusters via ClusterResourcePlacement
   - Evaluates workload health from MetricCollectorReports
   - Auto-approves stages when all workloads are healthy

4. **Sample Metric App** (will be rolled out to clusters)
   - Simple Go application exposing `/metrics` endpoint
   - Reports `workload_health=1.0` by default
   - Used to demonstrate health-based approvals
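
To make the `/metrics` output concrete, here is a minimal Python sketch of what a `workload_health` gauge looks like in the Prometheus text exposition format. The real sample app is written in Go; the function name `render_workload_health` and the HELP text are assumptions for illustration, and the actual app may attach labels that are not shown here.

```python
def render_workload_health(value: float = 1.0) -> str:
    """Render one gauge sample in the Prometheus text exposition format.

    The metric name comes from the README; the HELP text is illustrative.
    """
    lines = [
        "# HELP workload_health Workload health (1.0 means healthy).",
        "# TYPE workload_health gauge",
        f"workload_health {value}",
    ]
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_workload_health(), end="")
```

Serving this string from a `/metrics` handler is enough for Prometheus to scrape it as a gauge.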

**Member Clusters** - Where workloads run:

1. **Metric Collector**
   - Queries local Prometheus every 30 seconds
   - Reports workload health back to hub cluster
   - Creates/updates MetricCollectorReport in hub's `fleet-member-<cluster-name>` namespace

2. **Prometheus** (received from hub)
   - Runs on each member cluster
   - Scrapes local workload metrics

3. **Sample Metric App** (received from hub)
   - Deployed via staged rollout
   - Monitored for health during updates
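
The collector's job on each member cluster can be sketched in Python as two small steps: derive the hub namespace for the report, and turn a Prometheus instant-query response into per-workload health values. This is a hedged illustration, not the collector's actual code. The response shape follows the Prometheus HTTP API, but the `namespace` and `app` label names and both function names are assumptions.

```python
import json
from typing import Dict, Tuple

Workload = Tuple[str, str]  # (namespace, workload name)


def report_namespace(cluster_name: str) -> str:
    """Hub namespace that holds the report for one member cluster."""
    return f"fleet-member-{cluster_name}"


def parse_health(response_body: str) -> Dict[Workload, float]:
    """Extract workload health from a Prometheus instant-query response.

    Assumes each sample carries `namespace` and `app` labels (hypothetical).
    """
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    health: Dict[Workload, float] = {}
    for sample in payload["data"]["result"]:
        labels = sample["metric"]
        _timestamp, raw_value = sample["value"]  # value is [unix_ts, "string"]
        health[(labels.get("namespace", ""), labels.get("app", ""))] = float(raw_value)
    return health
```

A real collector would repeat the query on its 30-second interval and write the parsed result into the MetricCollectorReport in the namespace returned by `report_namespace`.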

### WorkloadTracker - The Decision Maker

The **WorkloadTracker** is a critical resource that tells the approval controller which workloads must be healthy before approving a stage. Without it, the controller doesn't know what to monitor.
   - Name must exactly match the ClusterStagedUpdateRun name
   - Example: If your UpdateRun is named `example-cluster-staged-run`, the tracker must also be named `example-cluster-staged-run`
   - Contains a list of workloads (name + namespace) to monitor across all clusters in each stage

2. **StagedWorkloadTracker** (for StagedUpdateRun)
   - Namespace-scoped resource on the hub
   - Name and namespace must exactly match the StagedUpdateRun
   - Example: If your UpdateRun is `example-staged-run` in namespace `test-ns`, the tracker must be `example-staged-run` in `test-ns`
   - Contains a list of workloads to monitor
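
The naming rule for both tracker types reduces to one comparison, sketched below in Python. `ObjectRef` and `tracker_matches` are hypothetical names used only for illustration; a `namespace` of `None` stands for a cluster-scoped object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectRef:
    """Identity of an UpdateRun or tracker (hypothetical helper type)."""
    name: str
    namespace: Optional[str] = None  # None means cluster-scoped


def tracker_matches(update_run: ObjectRef, tracker: ObjectRef) -> bool:
    """True when the tracker has exactly the same name (and namespace)."""
    return update_run.name == tracker.name and update_run.namespace == tracker.namespace


if __name__ == "__main__":
    run = ObjectRef("example-staged-run", "test-ns")
    print(tracker_matches(run, ObjectRef("example-staged-run", "test-ns")))
    print(tracker_matches(run, ObjectRef("example-staged-run", "other-ns")))
```

The match is exact: a tracker with the right name in a different namespace does not count.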

**How It Works:**

```yaml
# ClusterStagedWorkloadTracker example
workloads:
  - name: sample-metric-app  # Deployment name
    namespace: test-ns       # Namespace where it runs
```

When the approval controller evaluates a stage:

1. It fetches the WorkloadTracker that matches the UpdateRun name (and namespace)
2. For each cluster in the stage, it reads the MetricCollectorReport
3. It verifies that every workload listed in the tracker appears in the report with `health=1.0`
4. Only when ALL workloads in ALL clusters are healthy does it approve the stage

**Critical Rule:** The WorkloadTracker must be created BEFORE starting the UpdateRun. If the controller can't find a matching tracker, it won't approve any stages.
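
The evaluation steps and the critical rule can be summarized in a short Python sketch. It is an illustration of the decision logic as described here, not the controller's Go implementation. The report layout (a dict keyed by `(namespace, name)`) and the function name `should_approve` are assumptions, since the MetricCollectorReport schema is not shown in this section.

```python
from typing import Dict, List, Optional, Tuple

Workload = Tuple[str, str]       # (namespace, workload name)
Report = Dict[Workload, float]   # health value per workload (assumed layout)


def should_approve(
    tracker: Optional[List[Workload]],
    stage_clusters: List[str],
    reports: Dict[str, Report],
) -> bool:
    """Approve only if every tracked workload is healthy in every cluster."""
    if tracker is None:
        # Critical rule: no matching tracker means no approval.
        return False
    for cluster in stage_clusters:
        report = reports.get(cluster)
        if report is None:
            # The collector has not reported for this cluster yet.
            return False
        for workload in tracker:
            # A missing workload or any value other than 1.0 blocks approval.
            if report.get(workload) != 1.0:
                return False
    return True
```

For example, with `sample-metric-app` tracked in `test-ns`, a prod stage is approved only when both `kind-cluster-2` and `kind-cluster-3` report `1.0` for it.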

### The Staged Rollout Flow

When you create a **ClusterStagedUpdateRun** or **StagedUpdateRun**, here's what happens:

1. **Stage 1 (staging)**: Rollout starts with `kind-cluster-1`
   - KubeFleet creates an ApprovalRequest for the staging stage
   - Approval controller deploys MetricCollector to `kind-cluster-1`
   - Metric collector reports health metrics back to hub
   - When `sample-metric-app` is healthy, approval controller auto-approves
   - KubeFleet proceeds with the rollout to `kind-cluster-1`

2. **Stage 2 (prod)**: After staging succeeds
   - KubeFleet creates an ApprovalRequest for the prod stage
   - Approval controller deploys MetricCollector to `kind-cluster-2` and `kind-cluster-3`
   - Metric collectors report health from both clusters
   - When ALL workloads across BOTH prod clusters are healthy, auto-approve
   - KubeFleet completes the rollout to production clusters
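
The ordering of the two stages can be illustrated with a small Python simulation. It is a simplified sketch: `run_staged_rollout` and the `is_stage_healthy` callback are hypothetical names, and where this sketch simply stops at an unhealthy stage, the real controller presumably keeps re-evaluating until the stage becomes healthy.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, List[str]]  # (stage name, clusters in that stage)


def run_staged_rollout(
    stages: List[Stage],
    is_stage_healthy: Callable[[List[str]], bool],
) -> List[str]:
    """Walk the stages in order and record what happens at each step."""
    events: List[str] = []
    for stage_name, clusters in stages:
        events.append(f"approval requested: {stage_name}")
        if not is_stage_healthy(clusters):
            # Later stages never start while an earlier one is unapproved.
            events.append(f"waiting: {stage_name}")
            break
        events.append(f"approved: {stage_name}")
        for cluster in clusters:
            events.append(f"rolled out: {cluster}")
    return events


if __name__ == "__main__":
    stages = [
        ("staging", ["kind-cluster-1"]),
        ("prod", ["kind-cluster-2", "kind-cluster-3"]),
    ]
    for event in run_staged_rollout(stages, lambda clusters: True):
        print(event)
```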

### Key Resources You'll Create

| Resource | Purpose | Where |
|----------|---------|-------|
| **MemberCluster** | Register member clusters with hub, apply stage labels | Hub |
| **ClusterResourcePlacement** | Define what resources to propagate (Prometheus, sample-app) | Hub |
| **StagedUpdateStrategy** | Define stages with label selectors and approval requirements | Hub |
| **WorkloadTracker** | Specify which workloads to monitor for health | Hub |
| **UpdateRun** | Start the staged rollout process | Hub |
| **MetricCollector** | Automatically created by approval controller per stage | Hub → Member |
| **MetricCollectorReport** | Automatically created by metric collector | Member → Hub |