Fix for PGO-2380: #4163
New file (`@@ -0,0 +1,6 @@`):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/12--create-cluster.yaml
assert:
- files/12-cluster-created.yaml
```
New file (`@@ -0,0 +1,55 @@`):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
commands:
# First, check that all containers in the instance pod are ready.
# Then, grab the collector metrics output and check that a postgres
# metric is present, as well as a patroni metric.
# Then, check the collector logs for patroni and postgres log records.
# Finally, ensure the monitoring user exists and is configured.
- script: |
    retry() { bash -ceu 'printf "$1\nSleeping...\n" && sleep 5' - "$@"; }
    check_containers_ready() { bash -ceu 'echo "$1" | jq -e ".[] | select(.type==\"ContainersReady\") | .status==\"True\""' - "$@"; }
    contains() { bash -ceu '[[ "$1" == *"$2"* ]]' - "$@"; }

    pod=$(kubectl get pods -o name -n "${NAMESPACE}" \
      -l postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups,postgres-operator.crunchydata.com/data=postgres)
    [ "$pod" = "" ] && retry "Pod not found" && exit 1

    condition_json=$(kubectl get "${pod}" -n "${NAMESPACE}" -o jsonpath="{.status.conditions}")
    [ "$condition_json" = "" ] && retry "conditions not found" && exit 1
    { check_containers_ready "$condition_json"; } || {
      retry "containers not ready"
      exit 1
    }

    scrape_metrics=$(kubectl exec "${pod}" -c collector -n "${NAMESPACE}" -- \
      curl --insecure --silent http://localhost:9187/metrics)
    { contains "${scrape_metrics}" 'ccp_connection_stats_active'; } || {
      retry "5 second metric not found"
      exit 1
    }
    { contains "${scrape_metrics}" 'patroni_postgres_running'; } || {
      retry "patroni metric not found"
      exit 1
    }

    logs=$(kubectl logs "${pod}" --namespace "${NAMESPACE}" -c collector | grep InstrumentationScope)
    { contains "${logs}" 'InstrumentationScope patroni'; } || {
      retry "patroni logs not found"
      exit 1
    }
    { contains "${logs}" 'InstrumentationScope postgres'; } || {
      retry "postgres logs not found"
      exit 1
    }

    kubectl exec --stdin "${pod}" --namespace "${NAMESPACE}" -c database \
      -- psql -qb --set ON_ERROR_STOP=1 --file=- <<'SQL'
    DO $$
    DECLARE
      result record;
    BEGIN
      SELECT * INTO result FROM pg_catalog.pg_roles WHERE rolname = 'ccp_monitoring';
      ASSERT FOUND, 'user not found';
    END $$
    SQL
```
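For reference, a manual spot-check of the same collector metrics endpoint the assert scrapes (illustrative, not part of the test; assumes the same `pod` and `NAMESPACE` values the script above resolves):

```shell
# Scrape the collector's metrics endpoint and look for the two
# metrics the assert requires (one postgres metric, one patroni metric).
kubectl exec "${pod}" -c collector -n "${NAMESPACE}" -- \
  curl --insecure --silent http://localhost:9187/metrics \
  | grep -E 'ccp_connection_stats_active|patroni_postgres_running'
```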
New file (`@@ -0,0 +1,6 @@`):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
apply:
- files/14--add-backups.yaml
assert:
- files/14-backups-added.yaml
```
New file (`@@ -0,0 +1,6 @@`):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- command: |-
    kubectl patch postgrescluster otel-cluster-no-backups --type 'merge' -p '{"spec":{"backups": null}}'
  namespaced: true
```
New file (`@@ -0,0 +1,7 @@`):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- command: kubectl annotate postgrescluster otel-cluster-no-backups postgres-operator.crunchydata.com/authorizeBackupRemoval="true"
  namespaced: true
assert:
- files/16-backups-removed.yaml
```
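A quick way to verify the end state by hand (illustrative, not part of the test; the labels are the ones the assert files below match on):

```shell
# Expect no repo-host pod once backup removal has been authorized.
kubectl get pods -n "${NAMESPACE}" \
  -l postgres-operator.crunchydata.com/data=pgbackrest,postgres-operator.crunchydata.com/cluster=otel-cluster-no-backups
```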
Changes to the test README (`@@ -23,6 +23,12 @@`):

4. Add an `otlp` exporter to both PostgresCluster and PGAdmin `instrumentation` specs and create a standalone OTel collector to receive data from our sidecar collectors.
   1. Ensure that the ConfigMap, Service, and Deployment for the standalone OTel collector come up and that the collector container is running and ready.
   2. Assert that the standalone collector is receiving logs from all of our components (i.e. the standalone collector is getting logs for postgres, patroni, pgbackrest, pgbouncer, pgadmin, and gunicorn).
5. Create a new cluster with an `instrumentation` spec in place but no `backups` spec, to test the OTel features with optional backups.
   1. Ensure that the cluster comes up and that the database and collector containers are running and ready.
   2. Add a `backups` spec to the new cluster and ensure that pgbackrest is added to the instance pod, a repo-host pod is created, and the collector runs on both pods.
   3. Remove the `backups` spec from the new cluster (see the sketch after this list).
   4. Annotate the cluster to allow backups to be removed.
   5. Ensure that the repo-host pod is destroyed, pgbackrest is removed from the instance pod, and the collector continues to run on the instance pod.
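A minimal sketch of steps 3 and 4 from item 5 above, mirroring the commands in the test steps (assumes `kubectl` access to the test namespace via `${NAMESPACE}`):

```shell
# Step 3: drop the backups spec. Per the discussion below, PGO does not
# act on this alone; pgbackrest keeps running until removal is authorized.
kubectl patch postgrescluster otel-cluster-no-backups -n "${NAMESPACE}" \
  --type 'merge' -p '{"spec":{"backups": null}}'

# Step 4: the annotation authorizes PGO to actually remove backups
# (repo-host pod destroyed, pgbackrest removed from the instance pod).
kubectl annotate postgrescluster otel-cluster-no-backups -n "${NAMESPACE}" \
  postgres-operator.crunchydata.com/authorizeBackupRemoval="true"
```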
Contributor: I am curious -- if you skip steps 3 and 4, will 5 fail?

Collaborator (Author): Not sure I follow... If you don't remove the backups spec and annotate the cluster, pgbackrest will just continue to run (in the instance pod and repo-host)...

Contributor: Right -- I'm wondering if step 5 tests what we want it to test.

Collaborator (Author): So, first we create a brand new cluster that does not have backups enabled. And we make sure that the collector runs in that scenario. Then we add backups and ensure that everything works properly with that transition. Then we remove backups and ensure that everything works properly with that transition... What scenario do you think we aren't testing? Or what is it about the test that is insufficient?

Contributor: It's been a while since I did a KUTTL test, but I remember running into issues where I asserted that X existed, but it actually ignored what wasn't X. So here, we assert there's a few containers, but not the collector container -- but will it fail if that collector container does exist? Or will it just skip that?

Collaborator (Author): Ahh, I think I understand now... In the assert step we tell it to check the instance pod for specific containers, and it will fail if the list is not exactly correct (if we ask about 3 containers, but there are actually 5 containers that include the 3 we ask about, it will fail).

Contributor: Ah, it checks that the list (of containers, etc.) is still complete? I think we (and others) were pushing to get it to check only for the presence of the items in the list.
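To illustrate the behavior described in this thread (per the collaborator's account; kuttl's exact array-matching semantics may vary by version), the `16-backups-removed` assert below effectively pins the full container list:

```yaml
# Excerpt from this PR's 16-backups-removed assert. Per the discussion
# above, kuttl compares the whole containerStatuses list, so a leftover
# pgbackrest container would make this assert fail rather than be skipped.
status:
  containerStatuses:
  - name: collector
    ready: true
  - name: database
    ready: true
  - name: replication-cert-copy
    ready: true
```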
New file (`@@ -0,0 +1,16 @@`):

```yaml
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: otel-cluster-no-backups
spec:
  postgresVersion: ${KUTTL_PG_VERSION}
  instances:
  - name: instance1
    dataVolumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
  instrumentation: {}
```
New file (`@@ -0,0 +1,36 @@`):

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: otel-cluster-no-backups
status:
  instances:
  - name: instance1
    readyReplicas: 1
    replicas: 1
    updatedReplicas: 1
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    postgres-operator.crunchydata.com/data: postgres
    postgres-operator.crunchydata.com/role: master
    postgres-operator.crunchydata.com/cluster: otel-cluster-no-backups
    postgres-operator.crunchydata.com/crunchy-otel-collector: "true"
status:
  containerStatuses:
  - name: collector
    ready: true
    started: true
  - name: database
    ready: true
    started: true
  - name: replication-cert-copy
    ready: true
    started: true
  phase: Running
---
apiVersion: v1
kind: Service
metadata:
  name: otel-cluster-no-backups-primary
```
New file (`@@ -0,0 +1,31 @@`):

```yaml
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: otel-cluster-no-backups
spec:
  postgresVersion: ${KUTTL_PG_VERSION}
  instances:
  - name: instance1
    dataVolumeClaimSpec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 1Gi
  backups:
    pgbackrest:
      manual:
        repoName: repo1
        options:
        - --type=diff
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
  instrumentation: {}
```
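The `manual` section above configures what a manual backup would do (a differential backup to repo1). For context, PGO triggers such a backup when the cluster is annotated, along these lines (illustrative, not part of this test):

```shell
# Request a manual pgbackrest backup; PGO runs a backup Job when this
# annotation's value changes, so a timestamp makes a convenient value.
kubectl annotate postgrescluster otel-cluster-no-backups -n "${NAMESPACE}" \
  postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
```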
New file (`@@ -0,0 +1,71 @@`):

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: otel-cluster-no-backups
status:
  instances:
  - name: instance1
    readyReplicas: 1
    replicas: 1
    updatedReplicas: 1
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    postgres-operator.crunchydata.com/data: postgres
    postgres-operator.crunchydata.com/role: master
    postgres-operator.crunchydata.com/cluster: otel-cluster-no-backups
    postgres-operator.crunchydata.com/crunchy-otel-collector: "true"
status:
  containerStatuses:
  - name: collector
    ready: true
    started: true
  - name: database
    ready: true
    started: true
  - name: pgbackrest
    ready: true
    started: true
  - name: pgbackrest-config
    ready: true
    started: true
  - name: replication-cert-copy
    ready: true
    started: true
  phase: Running
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    postgres-operator.crunchydata.com/data: pgbackrest
    postgres-operator.crunchydata.com/cluster: otel-cluster-no-backups
    postgres-operator.crunchydata.com/crunchy-otel-collector: "true"
status:
  containerStatuses:
  - name: collector
    ready: true
    started: true
  - name: pgbackrest
    ready: true
    started: true
  - name: pgbackrest-config
    ready: true
    started: true
  phase: Running
---
apiVersion: batch/v1
kind: Job
metadata:
  labels:
    postgres-operator.crunchydata.com/cluster: otel-cluster-no-backups
    postgres-operator.crunchydata.com/pgbackrest-backup: replica-create
status:
  succeeded: 1
---
apiVersion: v1
kind: Service
metadata:
  name: otel-cluster-no-backups-primary
```
New file (`@@ -0,0 +1,36 @@`):

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: otel-cluster-no-backups
status:
  instances:
  - name: instance1
    readyReplicas: 1
    replicas: 1
    updatedReplicas: 1
---
apiVersion: v1
kind: Pod
metadata:
  labels:
    postgres-operator.crunchydata.com/data: postgres
    postgres-operator.crunchydata.com/role: master
    postgres-operator.crunchydata.com/cluster: otel-cluster-no-backups
    postgres-operator.crunchydata.com/crunchy-otel-collector: "true"
status:
  containerStatuses:
  - name: collector
    ready: true
    started: true
  - name: database
    ready: true
    started: true
  - name: replication-cert-copy
    ready: true
    started: true
  phase: Running
---
apiVersion: v1
kind: Service
metadata:
  name: otel-cluster-no-backups-primary
```
❓ What is the retry mechanism? It looks like the retry function just prints a message and sleeps.
Yeah, "retry" is a bit of a misnomer -- Joseph and I talked about this when he introduced the mechanism. As you point out, the `retry` function just prints and sleeps; the actual retrying happens because (a) this exits 1 and (b) it's a TestAssert. In KUTTL, failed TestAsserts automatically retry a number of times.
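For context, a minimal sketch of how that plays out (the `timeout` value is an assumed example; kuttl re-runs a failing TestAssert until its timeout elapses):

```yaml
apiVersion: kuttl.dev/v1beta1
kind: TestAssert
timeout: 120  # assumed example value; kuttl keeps re-checking until this many seconds pass
commands:
- script: |
    # some_check is a placeholder. Each failed attempt prints a message,
    # sleeps briefly, and exits 1; the harness-level "retry" comes from
    # kuttl re-running the whole assert, not from the script itself.
    some_check || { printf "not ready\nSleeping...\n"; sleep 5; exit 1; }
```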
Gotcha. Thanks!