Commit 018cacb

Author: Arvind Thirumurugan
Message: add birdeye view section
Signed-off-by: Arvind Thirumurugan <arvindth@microsoft.com>
1 parent 23ac827 commit 018cacb

approval-controller-metric-collector/README.md

Lines changed: 131 additions & 1 deletion
@@ -91,6 +91,129 @@ This solution introduces three new CRDs that work together with KubeFleet's nati
- Helm 3.x
- KubeFleet installed on hub and member clusters

## Setup Overview

Before diving into the setup steps, here's a bird's-eye view of what you'll be building:

### Architecture Components

**Hub Cluster** - The control plane, where you'll register or deploy:
1. **3 Member Clusters** (kind-cluster-1, kind-cluster-2, kind-cluster-3)
   - Labeled with `environment=staging` or `environment=prod`
   - These labels determine which stage each cluster belongs to during rollouts

2. **Prometheus** (propagated to all clusters)
   - Monitors workload health via `/metrics` endpoints
   - Scrapes pods that carry the `prometheus.io/scrape: "true"` annotation
   - Provides the `workload_health` metric (1.0 = healthy, 0.0 = unhealthy)

3. **Approval Request Controller**
   - Watches `ClusterApprovalRequest` and `ApprovalRequest` objects
   - Deploys a MetricCollector to the stage's clusters via a ClusterResourcePlacement
   - Evaluates workload health from MetricCollectorReports
   - Auto-approves a stage when all tracked workloads are healthy

4. **Sample Metric App** (will be rolled out to clusters)
   - Simple Go application exposing a `/metrics` endpoint
   - Reports `workload_health=1.0` by default
   - Used to demonstrate health-based approvals

**Member Clusters** - Where workloads run:
1. **Metric Collector**
   - Queries the local Prometheus every 30 seconds
   - Reports workload health back to the hub cluster
   - Creates/updates a MetricCollectorReport in the hub's `fleet-member-<cluster-name>` namespace

2. **Prometheus** (received from hub)
   - Runs on each member cluster
   - Scrapes local workload metrics

3. **Sample Metric App** (received from hub)
   - Deployed via staged rollout
   - Monitored for health during updates

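To make the `workload_health` metric concrete, here is a minimal sketch of how an app could expose it in the Prometheus text format. It is not the sample app's actual source: the HELP text, the absence of labels, and the function names `renderMetrics` and `metricsHandler` are assumptions made for illustration. Only the metric name and the `/metrics` path come from the description above.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// renderMetrics returns a Prometheus text-format payload with one gauge.
// The metric name comes from the README; the HELP text is an assumption.
func renderMetrics(healthy bool) string {
	value := 0.0
	if healthy {
		value = 1.0
	}
	return fmt.Sprintf("# HELP workload_health 1 if the workload is healthy, 0 otherwise.\n"+
		"# TYPE workload_health gauge\n"+
		"workload_health %g\n", value)
}

// metricsHandler serves the payload; a real app would register it on /metrics.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, renderMetrics(true))
}

func main() {
	// Exercise the handler in-process instead of binding a port.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
	metricsHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```

In a running app the handler would be registered with `http.HandleFunc("/metrics", metricsHandler)` and the pod annotated with `prometheus.io/scrape: "true"` so that Prometheus picks it up.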
### WorkloadTracker - The Decision Maker

The **WorkloadTracker** is a critical resource that tells the approval controller which workloads must be healthy before approving a stage. Without it, the controller doesn't know what to monitor.

**Two Types:**

1. **ClusterStagedWorkloadTracker** (for ClusterStagedUpdateRun)
   - Cluster-scoped resource on the hub
   - Name must exactly match the ClusterStagedUpdateRun name
   - Example: If your UpdateRun is named `example-cluster-staged-run`, the tracker must also be named `example-cluster-staged-run`
   - Contains a list of workloads (name + namespace) to monitor across all clusters in each stage

2. **StagedWorkloadTracker** (for StagedUpdateRun)
   - Namespace-scoped resource on the hub
   - Name and namespace must exactly match the StagedUpdateRun
   - Example: If your UpdateRun is `example-staged-run` in namespace `test-ns`, the tracker must be `example-staged-run` in `test-ns`
   - Contains a list of workloads to monitor

**How It Works:**
```yaml
# ClusterStagedWorkloadTracker example
workloads:
  - name: sample-metric-app   # Deployment name
    namespace: test-ns        # Namespace where it runs
```

When the approval controller evaluates a stage:
1. It fetches the WorkloadTracker that matches the UpdateRun name (and, for a StagedUpdateRun, its namespace)
2. For each cluster in the stage, it reads the MetricCollectorReport
3. It verifies that every workload listed in the tracker appears in the report with `health=1.0`
4. Only when ALL workloads in ALL clusters of the stage are healthy does it approve the stage

**Critical Rule:** The WorkloadTracker must be created BEFORE starting the UpdateRun. If the controller can't find a matching tracker, it won't approve any stages.

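The evaluation steps above can be sketched as a small decision function. This is an illustration under stated assumptions, not the controller's code: `Workload`, `ClusterReport`, and `stageHealthy` are hypothetical names, and the report is flattened to a map of workload to health value. A missing tracker is modelled as an empty workload list, which never approves.

```go
package main

import "fmt"

// Workload mirrors the name + namespace pair listed in a WorkloadTracker.
type Workload struct {
	Name      string
	Namespace string
}

// ClusterReport is a hypothetical, flattened view of one MetricCollectorReport:
// the last reported health value per workload on that cluster.
type ClusterReport map[Workload]float64

// stageHealthy returns true only if every tracked workload appears with
// health 1.0 in the report of every cluster in the stage. A missing report
// or a missing workload counts as not healthy.
func stageHealthy(tracked []Workload, clusters []string, reports map[string]ClusterReport) (bool, string) {
	if len(tracked) == 0 {
		return false, "no matching WorkloadTracker, or it lists no workloads"
	}
	for _, cluster := range clusters {
		report, ok := reports[cluster]
		if !ok {
			return false, "no report from " + cluster
		}
		for _, w := range tracked {
			health, found := report[w]
			if !found {
				return false, fmt.Sprintf("%s/%s missing from %s report", w.Namespace, w.Name, cluster)
			}
			if health != 1.0 {
				return false, fmt.Sprintf("%s/%s unhealthy on %s", w.Namespace, w.Name, cluster)
			}
		}
	}
	return true, "all tracked workloads healthy"
}

func main() {
	app := Workload{Name: "sample-metric-app", Namespace: "test-ns"}
	reports := map[string]ClusterReport{
		"kind-cluster-2": {app: 1.0},
		"kind-cluster-3": {app: 0.0},
	}
	ok, reason := stageHealthy([]Workload{app}, []string{"kind-cluster-2", "kind-cluster-3"}, reports)
	fmt.Println(ok, reason)
}
```

Returning a reason string alongside the boolean is a design choice for this sketch; it makes it easier to see why a stage is still waiting for approval.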
### The Staged Rollout Flow

When you create a **ClusterStagedUpdateRun** or **StagedUpdateRun**, here's what happens:

1. **Stage 1 (staging)**: Rollout starts with `kind-cluster-1`
   - KubeFleet creates an ApprovalRequest for the staging stage
   - The approval controller deploys a MetricCollector to `kind-cluster-1`
   - The metric collector reports health metrics back to the hub
   - When `sample-metric-app` is healthy, the approval controller auto-approves
   - KubeFleet proceeds with the rollout to `kind-cluster-1`

2. **Stage 2 (prod)**: After staging succeeds
   - KubeFleet creates an ApprovalRequest for the prod stage
   - The approval controller deploys a MetricCollector to `kind-cluster-2` and `kind-cluster-3`
   - The metric collectors report health from both clusters
   - When ALL workloads across BOTH prod clusters are healthy, the approval controller auto-approves
   - KubeFleet completes the rollout to the production clusters

### Key Resources You'll Create

| Resource | Purpose | Where |
|----------|---------|-------|
| **MemberCluster** | Register member clusters with hub, apply stage labels | Hub |
| **ClusterResourcePlacement** | Define what resources to propagate (Prometheus, sample-app) | Hub |
| **StagedUpdateStrategy** | Define stages with label selectors and approval requirements | Hub |
| **WorkloadTracker** | Specify which workloads to monitor for health | Hub |
| **UpdateRun** | Start the staged rollout process | Hub |
| **MetricCollector** | Automatically created by approval controller per stage | Hub → Member |
| **MetricCollectorReport** | Automatically created by metric collector | Member → Hub |

### What the Installation Scripts Do

**`install-on-hub.sh`** (Approval Request Controller):
- Builds controller Docker image with multi-arch support
- Loads image into kind hub cluster
- Verifies KubeFleet CRDs are installed
- Installs controller via Helm with custom CRDs (MetricCollector, MetricCollectorReport, WorkloadTracker)
- Sets up RBAC for managing placements, overrides, and approval requests

**`install-on-member.sh`** (Metric Collector):
- Builds metric-collector and metric-app Docker images
- Loads both images into each kind member cluster
- Creates service account with hub cluster access token
- Installs metric-collector via Helm on each member cluster
- Configures connection to hub API server and local Prometheus

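Once the metric collector is connected to the local Prometheus, its periodic health check amounts to an instant query for `workload_health`. The following is a rough sketch of that step, assuming the standard Prometheus HTTP API (`/api/v1/query`) and its instant-vector response shape. `queryURL`, `parseInstantVector`, and `Sample` are hypothetical names, and the label names in the sample response are assumptions rather than what the real collector reads.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// Sample is one series from an instant query: its labels and current value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// promResponse models the subset of the Prometheus query response used here.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string  `json:"metric"`
			Value  [2]json.RawMessage `json:"value"` // [timestamp, "value as string"]
		} `json:"result"`
	} `json:"data"`
}

// queryURL builds an instant-query URL against a Prometheus base address.
func queryURL(base, promQL string) string {
	return base + "/api/v1/query?" + url.Values{"query": {promQL}}.Encode()
}

// parseInstantVector decodes a successful instant-vector response into samples.
func parseInstantVector(body []byte) ([]Sample, error) {
	var resp promResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" || resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected response: status %q, type %q", resp.Status, resp.Data.ResultType)
	}
	samples := make([]Sample, 0, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		var raw string
		if err := json.Unmarshal(r.Value[1], &raw); err != nil {
			return nil, err
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			return nil, err
		}
		samples = append(samples, Sample{Labels: r.Metric, Value: v})
	}
	return samples, nil
}

func main() {
	// The base address is a placeholder; the real one depends on the install.
	fmt.Println(queryURL("http://localhost:9090", "workload_health"))
	body := []byte(`{"status":"success","data":{"resultType":"vector","result":[` +
		`{"metric":{"__name__":"workload_health","namespace":"test-ns"},"value":[1700000000,"1"]}]}}`)
	samples, err := parseInstantVector(body)
	fmt.Println(samples, err)
}
```

A collector could call these from a `time.Ticker` loop with a 30-second period and translate each sample into an entry of the MetricCollectorReport.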
With this understanding, you're ready to start the setup!

## Setup

### 1. Setup KubeFleet Clusters
@@ -128,7 +251,14 @@ kubectl config use-context kind-hub
 
 # Register member clusters with the hub
 # This creates MemberCluster resources for kind-cluster-1, kind-cluster-2, and kind-cluster-3
-# Each MemberCluster resource contains the API endpoint and credentials for the member cluster
+# Each MemberCluster resource contains:
+#   - API endpoint and credentials for the member cluster
+#   - Labels for organizing clusters into stages:
+#       * kind-cluster-1: environment=staging (Stage 1)
+#       * kind-cluster-2: environment=prod (Stage 2)
+#       * kind-cluster-3: environment=prod (Stage 2)
+# These labels are used by the StagedUpdateStrategy's labelSelector to determine
+# which clusters are part of each stage during the UpdateRun
 kubectl apply -f ./examples/membercluster/
 
 # Verify clusters are registered
