
[bug] Too many pods are updated at the same time. #310

@runkecheng

Description

Describe the problem

When the configuration is updated, two nodes of a 3-node cluster are deleted and restarted at the same time. This makes the cluster temporarily unavailable. The correct behavior is to update only one node at a time.

To Reproduce

The default PodDisruptionBudget is 50%, so the minimum number of available nodes in a 3-node cluster is 2. However, when the configuration is updated, two nodes are updated at once, which violates the PDB.
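
For reference, minAvailable: 50% on a 3-replica cluster rounds up to 2 pods that must stay available, so only one pod may be disrupted at a time. A minimal sketch of such a budget, assuming the policy/v1 API and a hypothetical newPDB helper and selector (the operator's actual default may be defined differently):

	import (
		policyv1 "k8s.io/api/policy/v1"
		metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
		"k8s.io/apimachinery/pkg/util/intstr"
	)

	// newPDB is a hypothetical helper showing what a 50% minAvailable budget
	// means: with 3 replicas, ceil(3 * 0.5) = 2 pods must stay up, so only
	// one pod may be disrupted voluntarily at any moment.
	func newPDB(name string, selector map[string]string) *policyv1.PodDisruptionBudget {
		minAvailable := intstr.FromString("50%")
		return &policyv1.PodDisruptionBudget{
			ObjectMeta: metav1.ObjectMeta{Name: name},
			Spec: policyv1.PodDisruptionBudgetSpec{
				MinAvailable: &minAvailable,
				Selector:     &metav1.LabelSelector{MatchLabels: selector},
			},
		}
	}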

Root cause:

The StatefulSetUpdateStrategy is OnDelete, so the operator deletes pods itself with the following logic:

	if pod.ObjectMeta.Labels["controller-revision-hash"] == s.sfs.Status.UpdateRevision {
		log.Info("pod is already updated", "pod name", pod.Name)
	} else {
		...
		if pod.DeletionTimestamp != nil {
			log.Info("pod is being deleted", "pod", pod.Name, "key", s.Unwrap())
		} else {
			if err := s.cli.Delete(ctx, pod); err != nil {
				return err
			}
		}
	}

After a pod is deleted, the retry loop exits early because the "healthy" label of the node being deleted is still "yes". The correct logic is to wait for the deleted pod to become ready again before updating the next pod (a sketch of this follows the snippet below).

	if pod.ObjectMeta.Labels["healthy"] == "yes" &&
		pod.ObjectMeta.Labels["controller-revision-hash"] != s.sfs.Status.UpdateRevision {
		return false, fmt.Errorf("pod %s is ready, wait next schedule", pod.Name)
	}
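
A minimal sketch of the intended one-pod-at-a-time flow, assuming the same s.cli client, s.sfs status, and pod labels shown above, plus a hypothetical pods []*corev1.Pod slice and a (bool, error) return as in the snippet above; this is illustrative, not the operator's actual code:

	// Sketch only: walk the pods in order and stop at the first pod that is
	// not both on the new revision and healthy, so at most one pod is ever
	// down at a time.
	for _, pod := range pods {
		updated := pod.ObjectMeta.Labels["controller-revision-hash"] == s.sfs.Status.UpdateRevision
		healthy := pod.ObjectMeta.Labels["healthy"] == "yes"

		if updated && healthy {
			continue // this pod has finished updating; check the next one
		}
		// Delete the pod only if it is outdated and not already terminating.
		if !updated && pod.DeletionTimestamp == nil {
			if err := s.cli.Delete(ctx, pod); err != nil {
				return false, err
			}
		}
		// Whether the pod was just deleted or is still restarting, requeue
		// and wait for it to become ready before touching the next pod.
		return false, fmt.Errorf("pod %s is not ready, wait next schedule", pod.Name)
	}
	return true, nil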

Expected behavior

Only one pod is updated at a time, so the cluster remains available (and the PDB is respected) during a configuration update.

Environment:

  • RadonDB MySQL version:

Labels

bug (Something isn't working)
