ClusterResourcePlacementScheduled condition gets reset in CRP spec updates #275

@ahmetb

Description

Summary

We just had an incident in which CRP objects' ClusterResourcePlacementScheduled condition flipped to Unknown for no good reason; the CRP controller then requeued all of the objects, clogging its workqueue.

We were propagating ~10,000 namespaces with the Hub agent, and it took the controller, running 40 workers, 3 hours to finish reconciling these objects.

Background

Step 1

We changed the CRP.spec.strategy.rollingUpdate.maxUnavailable field of ~10,000 CRPs to a different value (we manage the CRP objects with another in-house controller).

Step 2

This change caused the ClusterResourcePlacementScheduled condition to change:

     conditions:
     - type: ClusterResourcePlacementScheduled
-      lastTransitionTime: "2025-10-08T17:24:04Z"
-      message: found all the clusters needed as specified by the scheduling policy
-      observedGeneration: 6
-      reason: SchedulingPolicyFulfilled
-      status: "True"
+      lastTransitionTime: "2025-10-08T17:55:42Z"
+      message: Scheduling has not completed
+      observedGeneration: 7
+      reason: SchedulePending
+      status: Unknown

This happens in the following check (excerpted; the first clause holds right after any spec update, until the scheduler records the new generation in the policy snapshot):

if latestSchedulingPolicySnapshot.GetPolicySnapshotStatus().ObservedCRPGeneration < placementObj.GetGeneration() ||
	scheduledCondition.Status == metav1.ConditionUnknown {
	return metav1.Condition{
		Status:             metav1.ConditionUnknown,
		Type:               getPlacementScheduledConditionType(placementObj),
		Reason:             condition.SchedulingUnknownReason,
		Message:            "Scheduling has not completed",
		ObservedGeneration: placementObj.GetGeneration(),
	}
}

Mind you, we only have a very basic placement policy, so we don't even know why these are being set to Unknown:

spec:
  policy:
    clusterNames:
    - lit-lca1-1-k8s-1
    placementType: PickFixed

Step 3

Hub agent's controller sees the changed items and adds them to the queue, and the workqueue starts building up:

(screenshot: workqueue depth)

Step 4

Hub agent's cluster-resource-placement-controller-v1beta1 logs are flooded with the following message, saying the scheduling condition is Unknown:

clusterresourceplacement/controller.go:280] "Scheduler has not scheduled any cluster yet and requeue the request as a backup"
        clusterResourcePlacement="proxima-grpc-pod-26526"     scheduledCondition="&Condition{Type:ClusterResourcePlacementScheduled,Status:Unknown,ObservedGeneration:4,LastTransitionTime:2025-10-07 23:16:32 +0000 UTC,Reason:SchedulePending,Message:Scheduling has not completed,}"
        generation=4
...

and proceeds to requeue everything:

Requeue outcomes: (screenshot)

tl;dr

Resetting the ClusterResourcePlacementScheduled condition to Unknown on every CRP.spec change effectively makes the Hub controller unable to keep up with 10,000 objects, even with 40 workers (because it uses only 1/10 of them; see #273).

cc: @mikehelmick @ArchanaAnand0212
