Merged

31 commits
cb55d11
feat(498): Add ownerReferences to managed entities
mbegenau Dec 4, 2022
7bd12ec
Merge branch 'master' into feat-498-ownerReferences
FxKu Jul 1, 2024
99b56cf
Update pkg/cluster/k8sres.go
FxKu Jul 1, 2024
6dc6cfe
Merge branch 'master' into mbegenau-feat-498-ownerReferences
FxKu Jul 8, 2024
a2c332e
Merge branch 'mbegenau-feat-498-ownerReferences' of github.com:zaland…
FxKu Jul 8, 2024
52e42c4
fix unit test for pdb
FxKu Jul 8, 2024
b7b6174
empty owner reference for cross namespace secret and more tests
FxKu Jul 8, 2024
731a4cd
update ownerReferences of existing resources
FxKu Jul 9, 2024
aa8473e
include diff also for PDB
FxKu Jul 10, 2024
47fc255
CR ownerReference on PVC blocks pvc retention policy of statefulset
FxKu Jul 12, 2024
3f3b8fe
make ownerreferences optional and disabled by default
FxKu Jul 15, 2024
279c775
update unit test to check len ownerReferences
FxKu Jul 16, 2024
eeacdd9
update codegen
FxKu Jul 16, 2024
1e19a6b
add owner references e2e test
FxKu Jul 17, 2024
6809cb5
update unit test and add debug print
FxKu Jul 17, 2024
fc1fb2c
add block_owner_deletion field to test owner reference
FxKu Jul 17, 2024
f832a4b
fix test owner reference
FxKu Jul 17, 2024
77aa634
why are owner refs different
FxKu Jul 17, 2024
4f1abb8
stop comparing dicts
FxKu Jul 17, 2024
27699e8
fix unit test
FxKu Jul 17, 2024
324677e
fix assertion in unit test
FxKu Jul 18, 2024
b9424e3
improve e2e test
FxKu Jul 18, 2024
3c794f3
minor fix in e2e test
FxKu Jul 18, 2024
787f37d
another minor fix in e2e test
FxKu Jul 18, 2024
fb197fa
removing ownerReference requires Update API call
FxKu Jul 18, 2024
8d175d9
debug e2e tests
FxKu Jul 18, 2024
b56cffd
owner reference toggle only via update call
FxKu Jul 19, 2024
3240c01
e2e now working - change names to re-order
FxKu Jul 19, 2024
2999dff
fix around delete protection
FxKu Jul 19, 2024
3abfba9
fix typos and update docs once more
FxKu Jul 22, 2024
b188c6d
reflect code feedback
FxKu Aug 1, 2024
7 changes: 5 additions & 2 deletions charts/postgres-operator/crds/operatorconfigurations.yaml
@@ -211,9 +211,9 @@ spec:
enable_init_containers:
type: boolean
default: true
enable_secrets_deletion:
enable_owner_references:
type: boolean
default: true
default: false
enable_persistent_volume_claim_deletion:
type: boolean
default: true
@@ -226,6 +226,9 @@ spec:
enable_readiness_probe:
type: boolean
default: false
enable_secrets_deletion:
type: boolean
default: true
enable_sidecars:
type: boolean
default: true
2 changes: 2 additions & 0 deletions charts/postgres-operator/templates/clusterrole.yaml
@@ -120,6 +120,7 @@ rules:
- create
- delete
- get
- patch
- update
# to check nodes for node readiness label
- apiGroups:
@@ -196,6 +197,7 @@ rules:
- get
- list
- patch
- update
# to CRUD cron jobs for logical backups
- apiGroups:
- batch
6 changes: 4 additions & 2 deletions charts/postgres-operator/values.yaml
@@ -129,8 +129,8 @@ configKubernetes:
enable_finalizers: false
# enables initContainers to run actions before Spilo is started
enable_init_containers: true
# toggles if operator should delete secrets on cluster deletion
enable_secrets_deletion: true
# toggles if child resources should have an owner reference to the postgresql CR
enable_owner_references: false
# toggles if operator should delete PVCs on cluster deletion
enable_persistent_volume_claim_deletion: true
# toggles pod anti affinity on the Postgres pods
@@ -139,6 +139,8 @@ configKubernetes:
enable_pod_disruption_budget: true
# toggles readiness probe for database pods
enable_readiness_probe: false
# toggles if operator should delete secrets on cluster deletion
enable_secrets_deletion: true
# enables sidecar containers to run alongside Spilo in the same pod
enable_sidecars: true

70 changes: 62 additions & 8 deletions docs/administrator.md
@@ -223,9 +223,9 @@ configuration:

Now, every cluster manifest must contain the configured annotation keys to
trigger the delete process when running `kubectl delete pg`. Note, that the
`Postgresql` resource would still get deleted as K8s' API server does not
block it. Only the operator logs will tell, that the delete criteria wasn't
met.
`Postgresql` resource would still get deleted because the operator does not
instruct K8s' API server to block it. Only the operator logs will tell that
the delete criteria were not met.

**cluster manifest**

@@ -243,11 +243,65 @@ spec:

In case, the resource has been deleted accidentally or the annotations were
simply forgotten, it's safe to recreate the cluster with `kubectl create`.
Existing Postgres cluster are not replaced by the operator. But, as the
original cluster still exists the status will show `CreateFailed` at first.
On the next sync event it should change to `Running`. However, as it is in
fact a new resource for K8s, the UID will differ which can trigger a rolling
update of the pods because the UID is used as part of backup path to S3.
Existing Postgres clusters are not replaced by the operator. But, while the
original cluster still exists, the status will show `CreateFailed` at first. On
the next sync event it should change to `Running`. However, because it is in
fact a new resource for K8s, the UID, and therefore the backup path to S3,
will differ and trigger a rolling update of the pods.

## Owner References and Finalizers

The Postgres Operator can set [owner references](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/) on most of a cluster's child resources to improve
monitoring with GitOps tools and enable cascading deletes. There are three
exceptions:

* Persistent Volume Claims, because they are handled by the [PV Reclaim Policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/) of the StatefulSet
* The config endpoint + headless service resource because it is managed by Patroni
* Cross-namespace secrets, because owner references are not allowed across namespaces by design

The operator would clean these resources up with its regular delete loop,
provided they got synced correctly. If for some reason the initial cluster sync
fails, e.g. after a cluster creation or operator restart, a deletion of the
cluster manifest would leave orphaned resources behind which the user has to
clean up manually.
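
For illustration, a child resource owned by a cluster named
`acid-minimal-cluster` would carry metadata roughly like the sketch below. The
`uid` is cluster-specific, and `blockOwnerDeletion` is an assumption based on
this PR's commit history rather than a documented guarantee:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: acid-minimal-cluster
  ownerReferences:
    - apiVersion: acid.zalan.do/v1
      kind: postgresql
      name: acid-minimal-cluster
      uid: 1234abcd-…            # UID of the owning postgresql custom resource
      controller: true
      blockOwnerDeletion: true   # assumed; the e2e test in this PR checks kind and controller
```

With such a reference in place, deleting the `postgresql` custom resource lets
the Kubernetes garbage collector remove the owned objects without any further
operator action.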

Another option is to enable finalizers which first ensures the deletion of all
child resources before the cluster manifest gets removed. There is a trade-off
though: The deletion is only performed after the next two operator SYNC cycles
with the first one setting a `deletionTimestamp` and the latter reacting to it.
The final removal of the custom resource will add a DELETE event to the worker
queue but the child resources are already gone at this point. If you do not
desire this behavior, consider enabling owner references instead.

**postgres-operator ConfigMap**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-operator
data:
enable_finalizers: "false"
enable_owner_references: "true"
```

**OperatorConfiguration**

```yaml
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
name: postgresql-operator-configuration
configuration:
kubernetes:
enable_finalizers: false
enable_owner_references: true
```

:warning: Please note that both options are disabled by default. When enabling
owner references, the operator cannot block cascading deletes, even when the
[delete protection annotations](administrator.md#delete-protection-via-annotations)
are in place. You would need a K8s admission controller that blocks the actual
`kubectl delete` API call, e.g. based on existing annotations.
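
A minimal sketch of such a guard, using the Kubernetes
`ValidatingAdmissionPolicy` API (stable since v1.30). The annotation key
`delete-clustername` and the policy names are illustrative assumptions and not
something the operator ships:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: postgresql-delete-protection
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["acid.zalan.do"]
        apiVersions: ["v1"]
        operations: ["DELETE"]
        resources: ["postgresqls"]
  validations:
    # on DELETE only oldObject is populated; require the protection annotation
    # to exist and to match the cluster name
    - expression: >-
        has(oldObject.metadata.annotations) &&
        'delete-clustername' in oldObject.metadata.annotations &&
        oldObject.metadata.annotations['delete-clustername'] == oldObject.metadata.name
      message: "deletion blocked: delete protection annotation missing or mismatched"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: postgresql-delete-protection-binding
spec:
  policyName: postgresql-delete-protection
  validationActions: ["Deny"]
```

With the binding in place, a `kubectl delete postgresql` call for a cluster
without the matching annotation is rejected by the API server before the
operator or the garbage collector ever see the request.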

## Role-based access control for the operator

49 changes: 25 additions & 24 deletions docs/reference/operator_parameters.md
@@ -263,6 +263,31 @@ Parameters to configure cluster-related Kubernetes objects created by the
operator, as well as some timeouts associated with them. In a CRD-based
configuration they are grouped under the `kubernetes` key.

* **enable_finalizers**
By default, a deletion of the Postgresql resource will trigger an event
that leads to a cleanup of all child resources. However, if the database
cluster is in a broken state (e.g. failed initialization) and the operator
cannot fully sync it, there can be leftovers. By enabling finalizers the
operator will ensure all managed resources are deleted prior to the
Postgresql resource. See also [admin docs](../administrator.md#owner-references-and-finalizers)
for more information. The default is `false`.

* **enable_owner_references**
The operator can set owner references on its child resources (except PVCs,
Patroni config service/endpoint, cross-namespace secrets) to improve cluster
monitoring and enable cascading deletion. The default is `false`. Warning:
enabling this option disables configured delete protection checks (see below).

* **delete_annotation_date_key**
key name for annotation that compares manifest value with current date in the
YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
The default is empty which also disables this delete protection check.

* **delete_annotation_name_key**
key name for annotation that compares manifest value with Postgres cluster name.
Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
empty which also disables this delete protection check.

* **pod_service_account_name**
service account used by Patroni running on individual Pods to communicate
with the operator. Required even if native Kubernetes support in Patroni is
@@ -293,16 +318,6 @@ configuration they are grouped under the `kubernetes` key.
of a database created by the operator. If the annotation key is also provided
by the database definition, the database definition value is used.

* **delete_annotation_date_key**
key name for annotation that compares manifest value with current date in the
YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
The default is empty which also disables this delete protection check.

* **delete_annotation_name_key**
key name for annotation that compares manifest value with Postgres cluster name.
Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
empty which also disables this delete protection check.

* **downscaler_annotations**
An array of annotations that should be passed from Postgres CRD on to the
statefulset and, if exists, to the connection pooler deployment as well.
@@ -332,20 +347,6 @@ configuration they are grouped under the `kubernetes` key.
drained if the node_readiness_label is not used. If this option is set to
`false` the `spilo-role=master` selector will not be added to the PDB.

* **enable_finalizers**
By default, a deletion of the Postgresql resource will trigger an event
that leads to a cleanup of all child resources. However, if the database
cluster is in a broken state (e.g. failed initialization) and the operator
cannot fully sync it, there can be leftovers. By enabling finalizers the
operator will ensure all managed resources are deleted prior to the
Postgresql resource. There is a trade-off though: The deletion is only
performed after the next two SYNC cycles with the first one updating the
internal spec and the latter reacting on the `deletionTimestamp` while
processing the SYNC event. The final removal of the custom resource will
add a DELETE event to the worker queue but the child resources are already
gone at this point.
The default is `false`.

* **persistent_volume_claim_retention_policy**
The operator tries to protect volumes as much as possible. If somebody
accidentally deletes the statefulset or scales in the `numberOfInstances` the
109 changes: 100 additions & 9 deletions e2e/tests/test_e2e.py
@@ -95,7 +95,7 @@ def setUpClass(cls):
print("Failed to delete the 'standard' storage class: {0}".format(e))

# operator deploys pod service account there on start up
# needed for test_multi_namespace_support()
# needed for test_multi_namespace_support and test_owner_references
cls.test_namespace = "test"
try:
v1_namespace = client.V1Namespace(metadata=client.V1ObjectMeta(name=cls.test_namespace))
@@ -1352,17 +1352,11 @@ def test_multi_namespace_support(self):
k8s.wait_for_pod_start("spilo-role=master", self.test_namespace)
k8s.wait_for_pod_start("spilo-role=replica", self.test_namespace)
self.assert_master_is_unique(self.test_namespace, "acid-test-cluster")
# acid-test-cluster will be deleted in test_owner_references test

except timeout_decorator.TimeoutError:
print('Operator log: {}'.format(k8s.get_operator_log()))
raise
finally:
# delete the new cluster so that the k8s_api.get_operator_state works correctly in subsequent tests
# ideally we should delete the 'test' namespace here but
# the pods inside the namespace stuck in the Terminating state making the test time out
k8s.api.custom_objects_api.delete_namespaced_custom_object(
"acid.zalan.do", "v1", self.test_namespace, "postgresqls", "acid-test-cluster")
time.sleep(5)

@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
@unittest.skip("Skipping this test until fixed")
@@ -1573,6 +1567,71 @@ def test_overwrite_pooler_deployment(self):
self.eventuallyEqual(lambda: k8s.count_running_pods("connection-pooler="+pooler_name),
0, "Pooler pods not scaled down")

@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
def test_owner_references(self):
'''
Enable owner references, test if resources get updated and test cascade deletion of test cluster.
'''
k8s = self.k8s
cluster_name = 'acid-test-cluster'
cluster_label = 'application=spilo,cluster-name={}'.format(cluster_name)
default_test_cluster = 'acid-minimal-cluster'

try:
# enable owner references in config
enable_owner_refs = {
"data": {
"enable_owner_references": "true"
}
}
k8s.update_config(enable_owner_refs)
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")

time.sleep(5) # wait for the operator to sync the cluster and update resources

# check if child resources were updated with owner references
self.assertTrue(self.check_cluster_child_resources_owner_references(cluster_name, self.test_namespace), "Owner references not set on all child resources of {}".format(cluster_name))
self.assertTrue(self.check_cluster_child_resources_owner_references(default_test_cluster), "Owner references not set on all child resources of {}".format(default_test_cluster))

# delete the new cluster to test owner references
# and also to make k8s_api.get_operator_state work better in subsequent tests
# ideally we should delete the 'test' namespace here but the pods
# inside the namespace stuck in the Terminating state making the test time out
k8s.api.custom_objects_api.delete_namespaced_custom_object(
"acid.zalan.do", "v1", self.test_namespace, "postgresqls", cluster_name)

# statefulset, pod disruption budget and secrets should be deleted via owner reference
self.eventuallyEqual(lambda: k8s.count_pods_with_label(cluster_label), 0, "Pods not deleted")
self.eventuallyEqual(lambda: k8s.count_statefulsets_with_label(cluster_label), 0, "Statefulset not deleted")
self.eventuallyEqual(lambda: k8s.count_pdbs_with_label(cluster_label), 0, "Pod disruption budget not deleted")
self.eventuallyEqual(lambda: k8s.count_secrets_with_label(cluster_label), 0, "Secrets were not deleted")

time.sleep(5) # wait for the operator to also delete the leftovers

# pvcs and Patroni config service/endpoint should not be affected by owner reference
# but deleted by the operator almost immediately
self.eventuallyEqual(lambda: k8s.count_pvcs_with_label(cluster_label), 0, "PVCs not deleted")
self.eventuallyEqual(lambda: k8s.count_services_with_label(cluster_label), 0, "Patroni config service not deleted")
self.eventuallyEqual(lambda: k8s.count_endpoints_with_label(cluster_label), 0, "Patroni config endpoint not deleted")

# disable owner references in config
disable_owner_refs = {
"data": {
"enable_owner_references": "false"
}
}
k8s.update_config(disable_owner_refs)
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")

time.sleep(5) # wait for the operator to remove owner references

# check if child resources were updated without Postgresql owner references
self.assertTrue(self.check_cluster_child_resources_owner_references(default_test_cluster, "default", True), "Owner references still present on some child resources of {}".format(default_test_cluster))

except timeout_decorator.TimeoutError:
print('Operator log: {}'.format(k8s.get_operator_log()))
raise

@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
def test_password_rotation(self):
'''
@@ -1771,7 +1830,6 @@ def test_rolling_update_flag(self):
replica = k8s.get_cluster_replica_pod()
self.assertTrue(replica.metadata.creation_timestamp > old_creation_timestamp, "Old master pod was not recreated")


except timeout_decorator.TimeoutError:
print('Operator log: {}'.format(k8s.get_operator_log()))
raise
@@ -2202,6 +2260,39 @@ def assert_distributed_pods(self, target_nodes, cluster_labels='cluster-name=aci

return True

def check_cluster_child_resources_owner_references(self, cluster_name, cluster_namespace='default', inverse=False):
k8s = self.k8s

# check if child resources were updated with owner references
sset = k8s.api.apps_v1.read_namespaced_stateful_set(cluster_name, cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(sset.metadata.owner_references, inverse), "statefulset owner reference check failed")

svc = k8s.api.core_v1.read_namespaced_service(cluster_name, cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(svc.metadata.owner_references, inverse), "primary service owner reference check failed")
replica_svc = k8s.api.core_v1.read_namespaced_service(cluster_name + "-repl", cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(replica_svc.metadata.owner_references, inverse), "replica service owner reference check failed")

ep = k8s.api.core_v1.read_namespaced_endpoints(cluster_name, cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(ep.metadata.owner_references, inverse), "primary endpoint owner reference check failed")
replica_ep = k8s.api.core_v1.read_namespaced_endpoints(cluster_name + "-repl", cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(replica_ep.metadata.owner_references, inverse), "replica owner reference check failed")

pdb = k8s.api.policy_v1.read_namespaced_pod_disruption_budget("postgres-{}-pdb".format(cluster_name), cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(pdb.metadata.owner_references, inverse), "pod disruption owner reference check failed")

pg_secret = k8s.api.core_v1.read_namespaced_secret("postgres.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(pg_secret.metadata.owner_references, inverse), "postgres secret owner reference check failed")
standby_secret = k8s.api.core_v1.read_namespaced_secret("standby.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(standby_secret.metadata.owner_references, inverse), "standby secret owner reference check failed")

return True

def has_postgresql_owner_reference(self, owner_references, inverse):
if inverse:
return owner_references is None or owner_references[0].kind != 'postgresql'

return owner_references is not None and owner_references[0].kind == 'postgresql' and owner_references[0].controller

def list_databases(self, pod_name):
'''
Get list of databases we might want to iterate over
3 changes: 2 additions & 1 deletion manifests/configmap.yaml
@@ -49,7 +49,7 @@ data:
enable_master_pooler_load_balancer: "false"
enable_password_rotation: "false"
enable_patroni_failsafe_mode: "false"
enable_secrets_deletion: "true"
enable_owner_references: "false"
enable_persistent_volume_claim_deletion: "true"
enable_pgversion_env_var: "true"
# enable_pod_antiaffinity: "false"
@@ -59,6 +59,7 @@ data:
enable_readiness_probe: "false"
enable_replica_load_balancer: "false"
enable_replica_pooler_load_balancer: "false"
enable_secrets_deletion: "true"
# enable_shm_volume: "true"
# enable_sidecars: "true"
enable_spilo_wal_path_compat: "true"
2 changes: 2 additions & 0 deletions manifests/operator-service-account-rbac-openshift.yaml
@@ -94,6 +94,7 @@ rules:
- create
- delete
- get
- patch
- update
# to check nodes for node readiness label
- apiGroups:
Expand Down Expand Up @@ -166,6 +167,7 @@ rules:
- get
- list
- patch
- update
# to CRUD cron jobs for logical backups
- apiGroups:
- batch