Skip to content

Conversation

@bbrks
Copy link
Member

@bbrks bbrks commented Nov 19, 2025

CBG-5006

Have locks for each part of ServerContext instead of a shared mutex - avoids R/W deadlocks between HTTP Server list and Databases and reduces scope of each lock.

Integration Tests

Copilot AI review requested due to automatic review settings November 19, 2025 16:06
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the ServerContext struct to use separate mutexes for different functional areas instead of a single shared mutex. The changes split the previous lock into _databasesLock (for database-related maps) and _httpServersLock (for HTTP server management), reducing lock contention and preventing potential read/write deadlocks.

Key changes:

  • Replaced single lock with _databasesLock and _httpServersLock
  • Renamed all database-related maps to use underscore prefix (_databases, _dbConfigs, _dbRegistry, _collectionRegistry)
  • Updated all lock acquisitions throughout the codebase to use the appropriate specialized lock

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
rest/server_context.go Core changes to ServerContext struct definition, splitting the single lock into two specialized locks and renaming internal maps with underscore prefixes
rest/utilities_testing.go Updated test helper functions to use _databasesLock instead of generic lock
rest/serverless_test.go Updated test assertions to reference renamed database maps with underscore prefixes
rest/handler_config_database.go Updated database configuration mutation handler to use _databasesLock
rest/config_test.go Updated test assertions to use renamed _dbConfigs map
rest/config.go Updated all database configuration operations to use _databasesLock and renamed map references
rest/api_test.go Updated direct test access to renamed _dbConfigs map
rest/admin_api.go Updated admin API handlers to use _databasesLock and renamed database maps

@bbrks bbrks requested a review from Copilot November 19, 2025 18:43
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Copy link
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a lot of comments about paranoia about grabbing the lock but possibly panicking or not letting go of the lock because I believe the issue that spawned this ticket had a ServerContext.lock.Lock() that was not unlocked but no longer existed in the stack trace that we had.

I think that fixing these issues can be deferred to a separate ticket and some cases were really extra paranoid and probably don't need to be addressed.

@@ -2990,7 +2989,7 @@
resp = rt.CreateDatabase(dbName, dbConfig)
RequireStatus(t, resp, http.StatusCreated)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line should be a lock but actually i'd just change the test so it makes a different REST api call

I don't want to accidentally have the race detector pick up this error because we are doing something naughty in the tests and config polling runs simultaneously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test case isn't quite a regular /db/_config operation - it's effectively doing a persistent config load of an old configuration - bypassing the upfront validation that /db/_config would've run.

Either way, I can move it to a function to wrap in a read mutex (the read lock is required to read the dbName entry, but nothing more to update the underlying configuration)

rest/api_test.go Outdated

// Attempt to set the `AllowConflicts` property to true in the database configuration.
rt.RestTesterServerContext.dbConfigs[dbName].DatabaseConfig.DbConfig.AllowConflicts = base.Ptr(true)
rt.RestTesterServerContext._dbConfigs[dbName].DatabaseConfig.DbConfig.AllowConflicts = base.Ptr(true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above about potential race and just use /db/_config API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, as above - this test is ensuring coverage of loading an older pre 4.0 config, where AllowConflicts was allowed to be set. I cannot set this value through the REST API anymore (intentionally so)

I have created a named function to do this to make it more obvious why it's being done this way.

rest/api_test.go Outdated

// Attempting to disable `enable_star_channel`
rt.ServerContext().dbConfigs[dbName].DatabaseConfig.CacheConfig.ChannelCacheConfig.EnableStarChannel = base.Ptr(false)
rt.ServerContext()._dbConfigs[dbName].DatabaseConfig.CacheConfig.ChannelCacheConfig.EnableStarChannel = base.Ptr(false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, use REST API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

}
}
sc.lock.RUnlock()
sc._databasesLock.RUnlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to file a separate ticket for this, if there was a panic between this line and and the readLock, then I want this to unlock.

If the panic happens in config polling, I believe that this will take down Sync Gateway. However, this can get called from any db scoped REST API handler:

GetInactiveDatabase (if serverless) -> FetchAndLoadDatabaseSince -> fetchAndLoadConfigs

Since this is only serverless, this is low priority.

For the issue that spawned this ticket (see https://jira.issues.couchbase.com/browse/CBG-5010 for the full logs), something locked the serverContext.lock mutex but didn't unlock it. I don't think it's this line but this is one of the only cases where this is called outside a defer.

I think it would be fine to do this as part of a separate defensive ticket.

Copy link
Member Author

@bbrks bbrks Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already a ticket to review usage of the lock. This ticket is simply splitting out the http server lock.

I tested this by running /_ping and /db/_config in a loop to ensure that the db config update does not interfere with the ping response (which it now does not)

Will review with the other larger sc._databasesLock changes in CBG-5012

sc.lock.RUnlock()
sc._databasesLock.RLock()
dbc, ok := sc._databases[dbName]
sc._databasesLock.RUnlock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is so unlikely that this would panic, but due to excessive paranoia I am almost tempted to wrap this in a function in case sc._databases was nil when this function was called in a case where panics are handled.

Feel free to disregard.

Copy link
Member Author

@bbrks bbrks Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review with the other larger sc._databasesLock changes in CBG-5012

}

return sc.dbConfigs, nil
return sc._dbConfigs, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is weird because it returns a value that would require you to hold a lock. This behavior should be noted in the docstring and considered.

I didn't look through the implications of this.

Copy link
Member Author

@bbrks bbrks Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree this is weird and seems wrong - will review with the other larger sc._databasesLock changes in CBG-5012

return sc.Serve(ctx, config, publicServer, config.API.PublicInterface, CreatePublicHandler(sc))
}

func sharedBucketDatabaseCheck(ctx context.Context, sc *ServerContext) (errors error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we lock this function or rename to _sharedBucketDatabaseCheck and note which lock you are required to hold?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review with the other larger sc._databasesLock changes in CBG-5012

}

// Returns a list of buckets that are being shared by multiple databases.
func sharedBuckets(dbContextMap map[string][]*db.DatabaseContext) (sharedBuckets []sharedBucket) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we lock this function or rename to _sharedBuckets and note which lock you are required to hold?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review with the other larger sc._databasesLock changes in CBG-5012

dbc := sc.databases_[name]
sc.lock.RUnlock()
sc._databasesLock.RLock()
dbc := sc._databases[name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could panic if sc._databases is nil, which I don't think would happen but might be worth protecting against?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review with the other larger sc._databasesLock changes in CBG-5012

dbc := sc.databases_[name]
sc._databasesLock.RLock()
defer sc._databasesLock.RUnlock()
dbc := sc._databases[name]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope, do we want to unlock inside this loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will review with the other larger sc._databasesLock changes in CBG-5012

…ig and loading it to simulate upgrade/old valid but now invalid configs
@bbrks bbrks merged commit 5bb2ae6 into main Nov 26, 2025
42 checks passed
@bbrks bbrks deleted the CBG-5006 branch November 26, 2025 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants