-
Notifications
You must be signed in to change notification settings - Fork 15
TNTP-2109: Switch to "safe" mode on vshard rebalance #462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
38f6747 to
90e1fa2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if, instead of threading bucket_id through all the call chains just to support safe_mode, we add:
bucket_ref(tuple.bucket_id)
bucket_unref(tuple.bucket_id)directly in all write methods?
- In “fast” mode
bucket_ref/bucket_unrefwould be no-ops. - In “slow/safe” mode they would delegate to
vshard.storage.bucket_ref/vshard.storage.bucket_unref.
This way:
- the
call_on_storagesignature doesn’t have to change (no need to passbucket_idaround); ref/unrefcalls live exactly where they logically belong (next to writes), instead of being scattered across different places only forsafe_mode.
To me this looks more readable and less error-prone than propagating bucket_id solely for safe mode handling.
Also we need to add document about rebalance in README or another document in doc dir
| self.next_index, self.next_replicaset = next(self.replicasets, self.next_index) | ||
|
|
||
| return self.func_args, replicaset, replicaset_id | ||
| return self.func_args, replicaset, replicaset_id, {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please write 4th return name in doc for method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| self.next_index, self.next_batch = next(self.batches_by_replicasets, self.next_index) | ||
|
|
||
| return func_args, replicaset, replicaset_id | ||
| return func_args, replicaset, replicaset_id, bucket_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please write 4th return name in doc for method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
| self.next_index, self.next_batch = next(self.batches_by_replicasets, self.next_index) | ||
|
|
||
| return func_args, replicaset, replicaset_id | ||
| return func_args, replicaset, replicaset_id, bucket_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please write 4th return name in doc for method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
crud/common/rebalance.lua
Outdated
| local SETTINGS_SPACE_NAME = '_crud_settings' | ||
|
|
||
|
|
||
| local M = rawget(_G, MODULE_INTERNALS) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no rawset for MODULE_INTERNALS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be just
local M = {
safe_mode = false,
safe_mode_enable_hooks = {},
safe_mode_disable_hooks = {},
_router_cache_last_clear_ts = fiber.time()
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
crud/common/rebalance.lua
Outdated
| end | ||
| if (op == 'INSERT' and new.status == vshard_consts.BUCKET.RECEIVING) or | ||
| (op == 'REPLACE' and new.status == vshard_consts.BUCKET.SENDING) then | ||
| box.broadcast('_crud.safe_mode_enable', true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be add constant for _crud.safe_mode_enable too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a constant
crud/common/rebalance.lua
Outdated
| local stored_safe_mode | ||
| if box.space[SETTINGS_SPACE_NAME] == nil then | ||
| create_space() | ||
| box.space[SETTINGS_SPACE_NAME]:insert{ 'safe_mode', false } | ||
| else | ||
| stored_safe_mode = box.space[SETTINGS_SPACE_NAME]:get{ 'safe_mode' } | ||
| end | ||
| M.safe_mode = stored_safe_mode and stored_safe_mode.value or false | ||
|
|
||
| if M.safe_mode then | ||
| for hook, _ in pairs(M.safe_mode_enable_hooks) do | ||
| hook() | ||
| end | ||
| else | ||
| box.space._bucket:on_replace(safe_mode_trigger) | ||
| for hook, _ in pairs(M.safe_mode_disable_hooks) do | ||
| hook() | ||
| end | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be move all this logic to box.watch(box.status)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're probably right. Anyway the node will not get any requests before box.status event.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
crud/common/call.lua
Outdated
| err.name == 'BUCKET_IS_LOCKED' or | ||
| err.name == 'TRANSFER_IS_IN_PROGRESS' then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried to beautify it)
| vshard_router:_bucket_reset(err.bucket_id) | ||
|
|
||
| -- Substitute replicaset only for single bucket_id calls. | ||
| if err.destination and vshard_router.replicasets[err.destination] and #bucket_ids == 1 then | ||
| replicaset = vshard_router.replicasets[err.destination] | ||
| else | ||
| return nil, err | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but very very simplified. Mostly because bucket_set() can't be accessed from outside vshard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment about vshard above
| local record_by_replicaset = batches[replicaset_id] or { | ||
| replicaset = replicaset, | ||
| tuples = {}, | ||
| bucket_ids = {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on write requests crud calculates bucket_Ids , so it can be retrieved from data
crud/common/call.lua
Outdated
| end | ||
|
|
||
| local function call_on_storage_safe(run_as_user, bucket_ids, mode, func_name, ...) | ||
| fiber.name(CRUD_CALL_FIBER_NAME .. func_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we mark safe functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is definitely not necessary but may be useful for debugging
|
also switches between safe/fast mode must be logged |
22a0fcc to
5f3f46f
Compare
What has been done? Why? What problem is being solved?
I didn't forget about
Closes #???