Commit 04e6c6b

Merge pull request #380 from tinybirdco/readme-updates-377
Readme updates for backfill #377
2 parents 57a30a9 + df55318

1 file changed: +17 -7

change_column_type_materialized_view/README.md
@@ -6,9 +6,9 @@ To change a column type in a Materialized View Data Source is a process that nee
 
 This change needs to re-create the Materialized View and populate it again with all the data without stopping our ingestion.
 
-For that the steps will be:
+For that, the steps will be:
 
-1. Create a new Materialized View (Pipe and Data Source) to change the type to the colum.
+1. Create a new Materialized View (Pipe and Data Source) to change the type of the column.
 2. Run CI.
 3. Backfill the new Materialized View with the data ingested prior to its creation.
 4. Run CD and run the backfill in the main Workspace.
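
For context, here is a minimal sketch of what the re-created Data Source behind the new Materialized View could look like. Only the name `analytics_pages_mv_1` and the `date`/`device`/`pathname` columns are visible in this diff; every other column, the aggregate states, and the engine settings are assumptions for illustration:

```
# analytics_pages_mv_1.datasource -- hypothetical sketch; only the Data Source
# name and the date/device/pathname columns appear in the diff below.
SCHEMA >
    `date` Date,
    `device` String,
    `pathname` String,
    `visits` AggregateFunction(uniq, String),
    `hits` AggregateFunction(count)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, device, pathname"
```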
@@ -48,7 +48,7 @@ Create a Copy Pipe `analytics_pages_backfill.pipe` for backfilling purposes:
 NODE analytics_pages_backfill_node
 
 SQL >
-
+    %
     SELECT
         toDate(timestamp) AS date,
         device,
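
The diff only shows the edges of the Copy Pipe. Assuming the elided middle follows the usual web-analytics shape and filters on the two template parameters used by `tb pipe copy run` later in this README, the full file might look roughly like this (the `browser`, `location`, and `session_id` columns, the `analytics_hits` source, and the exact WHERE clause are guesses):

```
NODE analytics_pages_backfill_node

SQL >
    %
    SELECT
        toDate(timestamp) AS date,
        device,
        -- the middle of the SELECT is truncated in the diff; browser, location,
        -- session_id, and the analytics_hits source below are assumptions
        browser,
        location,
        pathname,
        uniqState(session_id) AS visits,
        countState() AS hits
    FROM analytics_hits
    -- the two template parameters are confirmed by the `tb pipe copy run`
    -- command shown later; the exact filter is an assumption
    WHERE timestamp >= {{DateTime(start_backfill_timestamp)}}
      AND timestamp < {{DateTime(end_backfill_timestamp)}}
    GROUP BY date, device, browser, location, pathname

TYPE COPY
TARGET_DATASOURCE analytics_pages_mv_1
```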
@@ -67,14 +67,24 @@ SQL >
     pathname
 
 TYPE COPY
-DATASOURCE analytics_pages_mv_1
+TARGET_DATASOURCE analytics_pages_mv_1
 ```
 
 ## 2: Run CI
 
 Make sure the changes are deployed correctly in the CI Tinybird Branch. Optionally, you can add automated tests or verify it from the `tmp_ci_*` Branch created as part of the CI pipeline.
 
-## 3: Backfilling
+## 3: (For large datasets) Splitting the Data into Chunks for Backfilling
+
+If your Data Source is large, you may run into a memory error like this:
+
+```
+error: "There was a problem while copying data: [Error] Memory limit (for query) exceeded. Make sure the query just process the required data. Contact us at support@tinybird.co for help or read this SQL tip: https://tinybird.co/docs/guides/best-practices-for-faster-sql.html#memory-limit-reached-title"
+```
+
+To avoid memory issues, you will need to break the backfill operation into smaller, manageable chunks. This approach reduces the memory load per query by processing only a subset of the data at a time. You can use the ***Data Source's sorting key*** to define each chunk.
+Refer to [this guide](https://www.tinybird.co/docs/work-with-data/strategies/backfill-strategies#scenario-3-streaming-ingestion-with-incremental-timestamp-column) for more details.
+
+## 4: Backfilling
 
 Wait for the first event to be ingested into `analytics_pages_mv_1` and then proceed with the backfilling.
 
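
Since the Copy Pipe is parameterized by timestamps, one way to chunk the backfill is to invoke it repeatedly over consecutive ranges. A minimal sketch, assuming month-sized chunks (the boundaries are arbitrary and should match your data volume and sorting key); the `tb pipe copy run` flags are the ones shown in the diff below:

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run the copy in month-sized chunks so each query
# processes only a slice of the data. Chunk boundaries are arbitrary; pick
# ranges that align with the Data Source's sorting key.
set -euo pipefail

CHUNKS=(
  "2024-01-01 00:00:00|2024-02-01 00:00:00"
  "2024-02-01 00:00:00|2024-03-01 00:00:00"
  "2024-03-01 00:00:00|2024-04-01 00:00:00"
)

for chunk in "${CHUNKS[@]}"; do
  start="${chunk%%|*}"   # text before the | separator
  end="${chunk##*|}"     # text after the | separator
  tb pipe copy run analytics_pages_backfill \
    --node analytics_pages_backfill_node \
    --param start_backfill_timestamp="$start" \
    --param end_backfill_timestamp="$end" \
    --wait --yes
done
```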
@@ -93,10 +103,10 @@ tb sql "select timestamp from tinybird.datasources_ops_log where event_type = 'c
 tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='$CREATED_AT' --wait --yes
 ```
 
-## 4: Run CD
+## 5: Run CD
 
 Merge the PR and make sure to run the backfilling operation over the main Workspace.
 
-## 5: Connect the downstream dependencies
+## 6: Connect the downstream dependencies
 
 Once the new Materialized View is created and synchronized, you can create another Pull Request to start using it in your endpoints.
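
The hunk header above truncates the `tb sql` lookup that populates `$CREATED_AT`. A hedged sketch of the full sequence; the `event_type = 'create'` filter, the `datasource_name` condition, and the `--format csv` flag are assumptions about the elided query:

```bash
# Hypothetical sketch of the flow hinted at by the truncated hunk header:
# look up when analytics_pages_mv_1 was created, then backfill everything
# ingested before that moment.
CREATED_AT=$(tb sql \
  "select timestamp from tinybird.datasources_ops_log \
   where event_type = 'create' and datasource_name = 'analytics_pages_mv_1' \
   order by timestamp desc limit 1" \
  --format csv | tail -n 1)

tb pipe copy run analytics_pages_backfill \
  --node analytics_pages_backfill_node \
  --param start_backfill_timestamp='2024-01-01 00:00:00' \
  --param end_backfill_timestamp="$CREATED_AT" \
  --wait --yes
```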

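For the downstream switch-over, a hypothetical sketch of an endpoint Pipe reading from the new Data Source; the aggregate columns mirror the Data Source sketch in step 1 and, like it, are assumptions (`-Merge` functions finalize the `-State` aggregates stored in the Materialized View):

```
NODE endpoint

SQL >
    %
    SELECT
        date,
        device,
        pathname,
        uniqMerge(visits) AS visits,
        countMerge(hits) AS hits
    FROM analytics_pages_mv_1
    WHERE date >= {{Date(date_from, '2024-01-01')}}
    GROUP BY date, device, pathname
    ORDER BY date
```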