Skip to content

Conversation

@torcolvin
Copy link
Collaborator

@torcolvin torcolvin commented Nov 19, 2025

CBG-3690 don't re-read document for resync

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

Copilot AI review requested due to automatic review settings November 19, 2025 22:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request optimizes the document resync operation by eliminating unnecessary document re-reads from the bucket. The key changes improve performance by passing the document data directly from DCP feed events to the resync function, and add support for creating HLV (Hybrid Logical Vector) metadata during resync operations.

Key Changes

  • Modified resyncDocument to accept a *sgbucket.BucketDocument parameter instead of re-reading from the bucket
  • Added HLV creation logic to the resync path for documents that don't have HLV
  • Removed the non-xattr code pathway from resync operations

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
db/database.go Updated resyncDocument signature to accept bucket document directly; added HLV update logic during resync
db/background_mgr_resync_dcp.go Modified DCP callback to convert feed events to bucket documents and pass to resync
db/crud.go Extended HLV update logic to handle Resync case alongside Import
db/document.go Added bucketDocumentFromFeed and getBucketDocument helper functions; removed unused UnmarshalDocumentFromFeed
db/document_test.go Added test for getBucketDocument function
db/database_test.go Enhanced resync tests to verify HLV creation and metadata-only updates
base/constants.go Added VirtualExpiry constant for fetching document expiry

db/database.go Outdated
}
// Update MetadataOnlyUpdate based on previous Cas, MetadataOnlyUpdate
doc.MetadataOnlyUpdate = computeMetadataOnlyUpdate(doc.Cas, doc.RevSeqNo, doc.MetadataOnlyUpdate)
mouMatch := false // after writing this document, mou.cas != doc.cas since otherwise shouldUpdate would be false
Copy link

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment explains that mouMatch should be false, but doesn't clearly explain why shouldUpdate being false would make mou.cas == doc.cas. Consider clarifying that when shouldUpdate is false, no resync occurs and the function returns early via ErrUpdateCancel, so at this point shouldUpdate must be true.

Suggested change
mouMatch := false // after writing this document, mou.cas != doc.cas since otherwise shouldUpdate would be false
// At this point, shouldUpdate must be true, because if it were false, we would have already returned early via ErrUpdateCancel.
// Therefore, mou.cas != doc.cas, and mouMatch should be false.
mouMatch := false

Copilot uses AI. Check for mistakes.
@torcolvin
Copy link
Collaborator Author

When looking at this PR, evaluate this for safety to put into 4.0.2 to address CBG-5017 and obviate the need for #7890

}
return updatedDoc, err
_, rawSyncXattr, rawVvXattr, rawMouXattr, rawGlobalXattr, err := updatedDoc.MarshalWithXattrs()
updatedDoc := sgbucket.UpdatedDoc{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: This declaration here shadows the declaration on line 1827

db/database.go Outdated
// Update MetadataOnlyUpdate based on previous Cas, MetadataOnlyUpdate
doc.MetadataOnlyUpdate = computeMetadataOnlyUpdate(doc.Cas, doc.RevSeqNo, doc.MetadataOnlyUpdate)
mouMatch := false // after writing this document, mou.cas != doc.cas since otherwise shouldUpdate would be false
doc, err = db.updateHLV(ctx, doc, Resync, mouMatch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to just pass in false here given mouMatch is always false?

db/database.go Outdated
}
// Update MetadataOnlyUpdate based on previous Cas, MetadataOnlyUpdate
doc.MetadataOnlyUpdate = computeMetadataOnlyUpdate(doc.Cas, doc.RevSeqNo, doc.MetadataOnlyUpdate)
mouMatch := false // after writing this document, mou.cas != doc.cas since otherwise shouldUpdate would be false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this comment confusing as after this update has been completed mou.cas should be == to doc cas right?

@torcolvin torcolvin assigned torcolvin and unassigned gregns1 Nov 24, 2025
@torcolvin torcolvin assigned adamcfraser and unassigned torcolvin Nov 26, 2025
@torcolvin
Copy link
Collaborator Author

To consider:

Is the latest set of changes better than https://github.com/couchbase/sync_gateway/pull/7892/files/eb809bf36ded144a008de3741c629a11a999d827 ? Does it add more risk to the ticket to make it not worthwhile? The idea behind the latest set of changes is to avoid having to set mouMatch and moving the update to a single point in code. This means that we move

The refactor exposes something that is tricky which is that Document.RevSeqNo is present on the Document struct but only populated when an import is occurring, since we do not want to fetch an extra xattr if needed. This means the value is often 0, so I added code to check if Document.RevSeqNo = 0 and not write the mou in this case. An error here would be a developer problem.

I thought about if the right thing to do is to add RevSeqNo to sgbucket.BucketDocument which almost fixes the problem, except for on of the on demand import fields on write. Because documentUpdateFunc takes db.Document object, we need to be able to pass rev seq no in that function. That refactor became even bigger than the existing refactor that I did to consolidate mou handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants