-
Notifications
You must be signed in to change notification settings - Fork 140
CBG-5006: Split ServerContext mutexes #7889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the ServerContext struct to use separate mutexes for different functional areas instead of a single shared mutex. The changes split the previous lock into _databasesLock (for database-related maps) and _httpServersLock (for HTTP server management), reducing lock contention and preventing potential read/write deadlocks.
Key changes:
- Replaced single
lockwith_databasesLockand_httpServersLock - Renamed all database-related maps to use underscore prefix (
_databases,_dbConfigs,_dbRegistry,_collectionRegistry) - Updated all lock acquisitions throughout the codebase to use the appropriate specialized lock
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| rest/server_context.go | Core changes to ServerContext struct definition, splitting the single lock into two specialized locks and renaming internal maps with underscore prefixes |
| rest/utilities_testing.go | Updated test helper functions to use _databasesLock instead of generic lock |
| rest/serverless_test.go | Updated test assertions to reference renamed database maps with underscore prefixes |
| rest/handler_config_database.go | Updated database configuration mutation handler to use _databasesLock |
| rest/config_test.go | Updated test assertions to use renamed _dbConfigs map |
| rest/config.go | Updated all database configuration operations to use _databasesLock and renamed map references |
| rest/api_test.go | Updated direct test access to renamed _dbConfigs map |
| rest/admin_api.go | Updated admin API handlers to use _databasesLock and renamed database maps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
torcolvin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote a lot of comments about paranoia about grabbing the lock but possibly panicking or not letting go of the lock because I believe the issue that spawned this ticket had a ServerContext.lock.Lock() that was not unlocked but no longer existed in the stack trace that we had.
I think that fixing these issues can be deferred to a separate ticket and some cases were really extra paranoid and probably don't need to be addressed.
| @@ -2990,7 +2989,7 @@ | |||
| resp = rt.CreateDatabase(dbName, dbConfig) | |||
| RequireStatus(t, resp, http.StatusCreated) | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line should be a lock but actually i'd just change the test so it makes a different REST api call
I don't want to accidentally have the race detector pick up this error because we are doing something naughty in the tests and config polling runs simultaneously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case isn't quite a regular /db/_config operation - it's effectively doing a persistent config load of an old configuration - bypassing the upfront validation that /db/_config would've run.
Either way, I can move it to a function to wrap in a read mutex (the read lock is required to read the dbName entry, but nothing more to update the underlying configuration)
rest/api_test.go
Outdated
|
|
||
| // Attempt to set the `AllowConflicts` property to true in the database configuration. | ||
| rt.RestTesterServerContext.dbConfigs[dbName].DatabaseConfig.DbConfig.AllowConflicts = base.Ptr(true) | ||
| rt.RestTesterServerContext._dbConfigs[dbName].DatabaseConfig.DbConfig.AllowConflicts = base.Ptr(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above about potential race and just use /db/_config API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, as above - this test is ensuring coverage of loading an older pre 4.0 config, where AllowConflicts was allowed to be set. I cannot set this value through the REST API anymore (intentionally so)
I have created a named function to do this to make it more obvious why it's being done this way.
rest/api_test.go
Outdated
|
|
||
| // Attempting to disable `enable_star_channel` | ||
| rt.ServerContext().dbConfigs[dbName].DatabaseConfig.CacheConfig.ChannelCacheConfig.EnableStarChannel = base.Ptr(false) | ||
| rt.ServerContext()._dbConfigs[dbName].DatabaseConfig.CacheConfig.ChannelCacheConfig.EnableStarChannel = base.Ptr(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, use REST API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
| } | ||
| } | ||
| sc.lock.RUnlock() | ||
| sc._databasesLock.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to file a separate ticket for this, if there was a panic between this line and and the readLock, then I want this to unlock.
If the panic happens in config polling, I believe that this will take down Sync Gateway. However, this can get called from any db scoped REST API handler:
GetInactiveDatabase (if serverless) -> FetchAndLoadDatabaseSince -> fetchAndLoadConfigs
Since this is only serverless, this is low priority.
For the issue that spawned this ticket (see https://jira.issues.couchbase.com/browse/CBG-5010 for the full logs), something locked the serverContext.lock mutex but didn't unlock it. I don't think it's this line but this is one of the only cases where this is called outside a defer.
I think it would be fine to do this as part of a separate defensive ticket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already a ticket to review usage of the lock. This ticket is simply splitting out the http server lock.
I tested this by running /_ping and /db/_config in a loop to ensure that the db config update does not interfere with the ping response (which it now does not)
Will review with the other larger sc._databasesLock changes in CBG-5012
| sc.lock.RUnlock() | ||
| sc._databasesLock.RLock() | ||
| dbc, ok := sc._databases[dbName] | ||
| sc._databasesLock.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is so unlikely that this would panic, but due to excessive paranoia I am almost tempted to wrap this in a function in case sc._databases was nil when this function was called in a case where panics are handled.
Feel free to disregard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review with the other larger sc._databasesLock changes in CBG-5012
| } | ||
|
|
||
| return sc.dbConfigs, nil | ||
| return sc._dbConfigs, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is weird because it returns a value that would require you to hold a lock. This behavior should be noted in the docstring and considered.
I didn't look through the implications of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree this is weird and seems wrong - will review with the other larger sc._databasesLock changes in CBG-5012
| return sc.Serve(ctx, config, publicServer, config.API.PublicInterface, CreatePublicHandler(sc)) | ||
| } | ||
|
|
||
| func sharedBucketDatabaseCheck(ctx context.Context, sc *ServerContext) (errors error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we lock this function or rename to _sharedBucketDatabaseCheck and note which lock you are required to hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review with the other larger sc._databasesLock changes in CBG-5012
| } | ||
|
|
||
| // Returns a list of buckets that are being shared by multiple databases. | ||
| func sharedBuckets(dbContextMap map[string][]*db.DatabaseContext) (sharedBuckets []sharedBucket) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we lock this function or rename to _sharedBuckets and note which lock you are required to hold?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review with the other larger sc._databasesLock changes in CBG-5012
| dbc := sc.databases_[name] | ||
| sc.lock.RUnlock() | ||
| sc._databasesLock.RLock() | ||
| dbc := sc._databases[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could panic if sc._databases is nil, which I don't think would happen but might be worth protecting against?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review with the other larger sc._databasesLock changes in CBG-5012
| dbc := sc.databases_[name] | ||
| sc._databasesLock.RLock() | ||
| defer sc._databasesLock.RUnlock() | ||
| dbc := sc._databases[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope, do we want to unlock inside this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will review with the other larger sc._databasesLock changes in CBG-5012
…ig and loading it to simulate upgrade/old valid but now invalid configs
CBG-5006
Have locks for each part of
ServerContextinstead of a shared mutex - avoids R/W deadlocks between HTTP Server list and Databases and reduces scope of each lock.Integration Tests
GSI=true,xattrs=truehttps://jenkins.sgwdev.com/job/SyncGatewayIntegration/170/