Commit b426a72

ddhodge and aishwarya24 authored
[doc][yba][DOC-503][2024.2] Updates to YBA xCluster (yugabyte#24847)
* updates to YBA xCluster
* misc edits
* format
* review comments
* Apply suggestions from code review

Co-authored-by: Aishwarya Chakravarthy <achakravarthy@yugabyte.com>

* review comments
* review comment
* copy to preview
* fix stable

---------

Co-authored-by: Aishwarya Chakravarthy <achakravarthy@yugabyte.com>
1 parent 39b28c2 commit b426a72

13 files changed, +112 -67 lines changed

docs/content/preview/yugabyte-platform/back-up-restore-universes/disaster-recovery/_index.md

Lines changed: 1 addition & 5 deletions
@@ -66,11 +66,7 @@ Blog: [Using YugabyteDB xCluster DR for PostgreSQL Disaster Recovery in Azure](h

 - Currently, replication of DDL (SQL-level changes such as creating or dropping tables or indexes) is not supported. To make these changes requires first performing the DDL operation (for example, creating a table), and then adding the new object to replication in YugabyteDB Anywhere. In addition, xCluster does not support truncate operations. Refer to [Manage tables and indexes](./disaster-recovery-tables/).

-- DR setup (and other operations that require making a full copy from DR primary to DR replica, such as adding tables with data to replication, resuming replication after an extended network outage, and so on) may fail with the error `database "<database_name>" is being accessed by other users`.
-
-This happens because the operation relies on a backup and restore of the database, and the restore will fail if there are any open connections to the DR replica.
-
-To fix this, close any open SQL connections to the DR replica, delete the DR configuration, and perform the operation again.
+- If a database operation requires a full copy, any application sessions on the database on the DR target will be interrupted while the database is dropped and recreated. Your application should either retry connections or redirect reads to the DR primary.

 - Setting up DR between a universe upgraded to v2.20.x and a new v2.20.x universe is not supported. This is due to a limitation of xCluster deployments and packed rows. See [Packed row limitations](../../../architecture/docdb/packed-rows/#limitations).
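
Note on the added limitation above: it asks applications to retry connections while the database on the DR target is dropped and recreated during a full copy. A minimal sketch of such a retry loop follows, assuming a hypothetical zero-argument `connect` callable that returns a session or raises `ConnectionError`. The function name, attempt count, and delays are illustrative and do not come from YugabyteDB or any driver.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, backing off exponentially.

    `connect` is a hypothetical callable (an assumption of this sketch);
    wrap your driver's connect call so that it raises ConnectionError.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

In practice `connect` would wrap your SQL driver's connect call and translate its connection error. Redirecting reads to the DR primary is the alternative the added text mentions.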

docs/content/preview/yugabyte-platform/back-up-restore-universes/disaster-recovery/disaster-recovery-failover.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ If the DR primary is terminated for some reason, do the following:

 1. Stop the application traffic to ensure no more updates are attempted.

-1. Navigate to your DR primary universe and select **xCluster Disaster Recovery**.
+1. Navigate to your DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.

 1. Note the **Potential data loss on failover** to understand the extent of possible data loss as a result of the outage, and determine if the extent of data loss is acceptable for your situation.

@@ -54,7 +54,7 @@ In both cases, repairing DR involves making a full copy of the databases through

 To repair DR, do the following:

-1. Navigate to your (new) DR primary universe and select **xCluster Disaster Recovery**.
+1. Navigate to your (new) DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.

 1. Click **Repair DR** to display the **Repair DR** dialog.

docs/content/preview/yugabyte-platform/back-up-restore-universes/disaster-recovery/disaster-recovery-switchover.md

Lines changed: 4 additions & 2 deletions
@@ -20,7 +20,9 @@ Switchover can be used by enterprises when performing regular business continuit

 First, confirm there is no excessive lag between the DR primary and replica. You can [monitor lag](../disaster-recovery-setup/#monitor-replication) on the **xCluster Disaster Recovery** tab.

-If the DR config has any tables that don't have a replication status of Operational, switchover will be unsuccessful. In that case, you can do one of the following:
+While the switchover task is in progress, both universes are in read-only mode and reject write operations.
+
+If the DR configuration has any tables that don't have a replication status of Operational, switchover will be unsuccessful. In that case, you can do one of the following:

 - Perform a full copy from the DR primary to the DR replica.
 - [Unplanned Failover](../disaster-recovery-failover/).
@@ -33,7 +35,7 @@ Use the following steps to perform a planned switchover:

 1. Stop the application traffic on the DR primary.

-1. Navigate to your DR primary universe and select **xCluster Disaster Recovery**.
+1. Navigate to your DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.

 1. Click **Actions** and choose **Switchover**.
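
Note on the Operational requirement in the first hunk of this file: it can be checked before starting a switchover. The sketch below assumes the per-table replication statuses have already been collected into a plain dictionary (table name to status string). That structure, the function name, the table names, and the non-Operational status string are hypothetical; only the `Operational` status value comes from the docs.

```python
from typing import Dict, List


def tables_blocking_switchover(statuses: Dict[str, str]) -> List[str]:
    """Return, in sorted order, the tables whose status is not 'Operational'."""
    return sorted(
        name for name, status in statuses.items() if status != "Operational"
    )


# Hypothetical input: "NotOperational" stands in for any other status.
example = {"orders": "Operational", "payments": "NotOperational"}
blocking = tables_blocking_switchover(example)
if blocking:
    print("Switchover would be unsuccessful; check:", ", ".join(blocking))
```

An empty result means every table in the input reports Operational, which is the precondition the docs state for a successful switchover.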

docs/content/preview/yugabyte-platform/back-up-restore-universes/disaster-recovery/disaster-recovery-tables.md

Lines changed: 3 additions & 3 deletions
@@ -52,7 +52,7 @@ Add tables to DR in the following sequence:

 1. Create the table on the DR primary (if it doesn't already exist).
 1. Create the table on the DR replica.
-1. Navigate to your DR primary and select **xCluster Disaster Recovery**.
+1. Navigate to your DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.
 1. Click **Actions** and choose **Select Databases and Tables**.
 1. Select the tables and click **Validate Selection**.
 1. If data needs to be copied, click **Next: Confirm Full Copy**.
@@ -74,7 +74,7 @@ When dropping a table, remove the table from DR before dropping the table in the

 Remove tables from DR in the following sequence:

-1. Navigate to your DR primary and select **xCluster Disaster Recovery**.
+1. Navigate to your DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.
 1. Click **Actions** and choose **Select Databases and Tables**.
 1. Deselect the tables and click **Validate Selection**.
 1. Click **Apply Changes**.
@@ -157,5 +157,5 @@ To remove a table partition from DR, follow the same steps as [Remove a table fr

 To ensure changes made outside of YugabyteDB Anywhere are reflected in YugabyteDB Anywhere, you need to reconcile the configuration as follows:

-1. In YugabyteDB Anywhere, navigate to your DR primary and select **xCluster Disaster Recovery**.
+1. Navigate to your DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.
 1. Click **Actions > Advanced** and choose **Reconcile Config with Database**.

docs/content/preview/yugabyte-platform/manage-deployments/xcluster-replication/xcluster-replication-setup.md

Lines changed: 10 additions & 1 deletion
@@ -14,6 +14,8 @@ type: docs

 ## Prerequisites

+To set up or configure xCluster replication, you must be a Super Admin or Admin, or have a role with the Manage xCluster permission. For information on roles, refer to [Manage users](../../../administer-yugabyte-platform/anywhere-rbac/).
+
 Create the source and target universes for replication.

 Ensure the universes have the following characteristics:
@@ -215,7 +217,14 @@ To create an alert:

 1. Click **Save** when you are done.

-When replication is set up, YugabyteDB automatically creates an alert for _YSQL Tables in DR/xCluster Config Inconsistent With Primary/Source_. This alert fires when tables are added or dropped from source's databases under replication, but are not yet added or dropped from the YugabyteDB Anywhere replication configuration.
+When replication is set up, YugabyteDB automatically creates the alert _XCluster Config Tables are in bad state_. This alert fires when:
+
+- there is a table schema mismatch between DR primary and replica
+- tables are added or dropped from either DR primary or replica, but have not been added or dropped from the other.
+
+When you receive an alert, navigate to the replication configuration [Tables tab](#tables) to see the table status.
+
+YugabyteDB Anywhere collects these metrics every 2 minutes, and fires the alert within 10 minutes of the error.

 For more information on alerting in YugabyteDB Anywhere, refer to [Alerts](../../../alerts-monitoring/alert/).

docs/content/stable/yugabyte-platform/back-up-restore-universes/disaster-recovery/disaster-recovery-switchover.md

Lines changed: 3 additions & 1 deletion
@@ -20,7 +20,9 @@ Switchover can be used by enterprises when performing regular business continuit

 First, confirm there is no excessive lag between the DR primary and replica. You can [monitor lag](../disaster-recovery-setup/#monitor-replication) on the **xCluster Disaster Recovery** tab.

-If the DR config has any tables that don't have a replication status of Operational, switchover will be unsuccessful. In that case, you can do one of the following:
+While the switchover task is in progress, both universes are in read-only mode and reject write operations.
+
+If the DR configuration has any tables that don't have a replication status of Operational, switchover will be unsuccessful. In that case, you can do one of the following:

 - Perform a full copy from the DR primary to the DR replica.
 - [Unplanned Failover](../disaster-recovery-failover/).

docs/content/v2024.2/yugabyte-platform/alerts-monitoring/alert-policy-templates.md

Lines changed: 9 additions & 8 deletions
@@ -83,6 +83,15 @@ Last snapshot task for universe `'$universe_name'` failed. To retry, check PITR
 min(ybp_pitr_config_status{universe_uuid = "__universeUuid__"}) {{ query_condition }} 1
 ```

+#### xCluster config tables are in bad state
+
+There are issues with table replication. For example, there is a table schema mismatch between primary and replica universes in xCluster replication or xCluster DR; or tables were added or dropped from either primary or replica, but have not been added or dropped from the other; or there was an extended network partition.
+
+```promql
+min(last_over_time(ybp_xcluster_table_status{source_universe_uuid
+= "__universeUuid__"}[5m])) {{ query_condition }} {{ query_threshold }}
+```
+
 ### DB templates

 #### DB compaction overload
@@ -659,14 +668,6 @@ YSQLSH connection failure detected for universe `'$universe_name'` on `$value` T
 count by (universe_uuid) (yb_node_ysql_connect{universe_uuid="__universeUuid__"} < 1) {{ query_condition }} {{ query_threshold }}
 ```

-#### New YSQL tables added
-
-New YSQL tables are added to the source universe `'$universe_name'` in the database with an existing xCluster configuration, but not added to the xCluster replication.
-
-```promql
-((count by (namespace_name, universe_uuid)(count by(namespace_name, table_id, universe_uuid)(rocksdb_current_version_sst_files_size{universe_uuid="__universeUuid__",table_type="PGSQL_TABLE_TYPE"}))) - count by(namespace_name, universe_uuid)(count by(namespace_name, universe_uuid, table_id)(async_replication_sent_lag_micros{universe_uuid="__universeUuid__",table_type="PGSQL_TABLE_TYPE"}))) {{ query_condition }} {{ query_threshold }}
-```
-
 #### Number of YSQL connections is high

 Number of YSQL connections for universe `'$universe_name'` is above `$threshold`. Current value is `$value`.
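
Note on the `xCluster config tables are in bad state` template added in this file: it uses the same placeholders as the other alert policy templates, namely `__universeUuid__`, `{{ query_condition }}`, and `{{ query_threshold }}`. As a rough illustration of what the expression looks like once filled in, the sketch below substitutes them with plain string replacement. How YugabyteDB Anywhere actually performs the substitution is not shown in this commit, and the UUID, the `<` condition, and the threshold `1` are placeholder values, not product defaults. The two source lines of the expression are joined with a space.

```python
def render_alert_query(
    template: str, universe_uuid: str, condition: str, threshold: float
) -> str:
    """Fill the three placeholders that appear in the alert templates.

    Plain string replacement is an assumption made for illustration only.
    """
    return (
        template.replace("__universeUuid__", universe_uuid)
        .replace("{{ query_condition }}", condition)
        .replace("{{ query_threshold }}", str(threshold))
    )


# Expression copied from the added template, joined onto one line.
TEMPLATE = (
    "min(last_over_time(ybp_xcluster_table_status{source_universe_uuid"
    ' = "__universeUuid__"}[5m])) {{ query_condition }} {{ query_threshold }}'
)

# Placeholder UUID, condition, and threshold (all hypothetical values).
query = render_alert_query(
    TEMPLATE, "00000000-0000-0000-0000-000000000000", "<", 1
)
print(query)
```

Reading the expression: `last_over_time(...[5m])` takes the most recent sample of each table status series within five minutes, and `min` reduces those to a single value that is compared against the threshold.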

docs/content/v2024.2/yugabyte-platform/back-up-restore-universes/disaster-recovery/_index.md

Lines changed: 1 addition & 5 deletions
@@ -66,11 +66,7 @@ Blog: [Using YugabyteDB xCluster DR for PostgreSQL Disaster Recovery in Azure](h

 - Currently, replication of DDL (SQL-level changes such as creating or dropping tables or indexes) is not supported. To make these changes requires first performing the DDL operation (for example, creating a table), and then adding the new object to replication in YugabyteDB Anywhere. In addition, xCluster does not support truncate operations. Refer to [Manage tables and indexes](./disaster-recovery-tables/).

-- DR setup (and other operations that require making a full copy from DR primary to DR replica, such as adding tables with data to replication, resuming replication after an extended network outage, and so on) may fail with the error `database "<database_name>" is being accessed by other users`.
-
-This happens because the operation relies on a backup and restore of the database, and the restore will fail if there are any open connections to the DR replica.
-
-To fix this, close any open SQL connections to the DR replica, delete the DR configuration, and perform the operation again.
+- If a database operation requires a full copy, any application sessions on the database on the DR target will be interrupted while the database is dropped and recreated. Your application should either retry connections or redirect reads to the DR primary.

 - Setting up DR between a universe upgraded to v2.20.x and a new v2.20.x universe is not supported. This is due to a limitation of xCluster deployments and packed rows. See [Packed row limitations](../../../architecture/docdb/packed-rows/#limitations).

docs/content/v2024.2/yugabyte-platform/back-up-restore-universes/disaster-recovery/disaster-recovery-failover.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ If the DR primary is terminated for some reason, do the following:

 1. Stop the application traffic to ensure no more updates are attempted.

-1. Navigate to your DR primary universe and select **xCluster Disaster Recovery**.
+1. Navigate to your DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.

 1. Note the **Potential data loss on failover** to understand the extent of possible data loss as a result of the outage, and determine if the extent of data loss is acceptable for your situation.

@@ -54,7 +54,7 @@ In both cases, repairing DR involves making a full copy of the databases through

 To repair DR, do the following:

-1. Navigate to your (new) DR primary universe and select **xCluster Disaster Recovery**.
+1. Navigate to your (new) DR primary universe **xCluster Disaster Recovery** tab and select the replication configuration.

 1. Click **Repair DR** to display the **Repair DR** dialog.
