Skip to content

Conversation

@ita-sammann
Copy link

What has been done? Why? What problem is being solved?

I didn't forget about

  • Tests
  • Changelog
  • Documentation

Closes #???

@ita-sammann ita-sammann requested a review from Satbek November 19, 2025 08:19
@ita-sammann ita-sammann force-pushed the TNTP-2109-safe-mode-switch branch from 38f6747 to 90e1fa2 Compare November 19, 2025 10:37
Copy link
Contributor

@Satbek Satbek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if, instead of threading bucket_id through all the call chains just to support safe_mode, we add:

bucket_ref(tuple.bucket_id)
bucket_unref(tuple.bucket_id)

directly in all write methods?

  • In “fast” mode bucket_ref / bucket_unref would be no-ops.
  • In “slow/safe” mode they would delegate to vshard.storage.bucket_ref / vshard.storage.bucket_unref.

This way:

  • the call_on_storage signature doesn’t have to change (no need to pass bucket_id around);
  • ref/unref calls live exactly where they logically belong (next to writes), instead of being scattered across different places only for safe_mode.

To me this looks more readable and less error-prone than propagating bucket_id solely for safe mode handling.

Also we need to add document about rebalance in README or another document in doc dir

self.next_index, self.next_replicaset = next(self.replicasets, self.next_index)

return self.func_args, replicaset, replicaset_id
return self.func_args, replicaset, replicaset_id, {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please write 4th return name in doc for method

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

self.next_index, self.next_batch = next(self.batches_by_replicasets, self.next_index)

return func_args, replicaset, replicaset_id
return func_args, replicaset, replicaset_id, bucket_ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please write 4th return name in doc for method

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

self.next_index, self.next_batch = next(self.batches_by_replicasets, self.next_index)

return func_args, replicaset, replicaset_id
return func_args, replicaset, replicaset_id, bucket_ids
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please write 4th return name in doc for method

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

local SETTINGS_SPACE_NAME = '_crud_settings'


local M = rawget(_G, MODULE_INTERNALS)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no rawset for MODULE_INTERNALS

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be just

local M = {
    safe_mode = false,
    safe_mode_enable_hooks = {},
    safe_mode_disable_hooks = {},
    _router_cache_last_clear_ts = fiber.time()
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

end
if (op == 'INSERT' and new.status == vshard_consts.BUCKET.RECEIVING) or
(op == 'REPLACE' and new.status == vshard_consts.BUCKET.SENDING) then
box.broadcast('_crud.safe_mode_enable', true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be add constant for _crud.safe_mode_enable too?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a constant

Comment on lines 119 to 137
local stored_safe_mode
if box.space[SETTINGS_SPACE_NAME] == nil then
create_space()
box.space[SETTINGS_SPACE_NAME]:insert{ 'safe_mode', false }
else
stored_safe_mode = box.space[SETTINGS_SPACE_NAME]:get{ 'safe_mode' }
end
M.safe_mode = stored_safe_mode and stored_safe_mode.value or false

if M.safe_mode then
for hook, _ in pairs(M.safe_mode_enable_hooks) do
hook()
end
else
box.space._bucket:on_replace(safe_mode_trigger)
for hook, _ in pairs(M.safe_mode_disable_hooks) do
hook()
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be move all this logic to box.watch(box.status)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're probably right. Anyway the node will not get any requests before box.status event.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 166 to 167
err.name == 'BUCKET_IS_LOCKED' or
err.name == 'TRANSFER_IS_IN_PROGRESS' then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentations

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to beautify it)

Comment on lines +168 to +175
vshard_router:_bucket_reset(err.bucket_id)

-- Substitute replicaset only for single bucket_id calls.
if err.destination and vshard_router.replicasets[err.destination] and #bucket_ids == 1 then
replicaset = vshard_router.replicasets[err.destination]
else
return nil, err
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but very very simplified. Mostly because bucket_set() can't be accessed from outside vshard.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment about vshard above

local record_by_replicaset = batches[replicaset_id] or {
replicaset = replicaset,
tuples = {},
bucket_ids = {},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on write requests crud calculates bucket_Ids , so it can be retrieved from data

end

local function call_on_storage_safe(run_as_user, bucket_ids, mode, func_name, ...)
fiber.name(CRUD_CALL_FIBER_NAME .. func_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we mark safe functions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is definitely not necessary but may be useful for debugging

@Satbek
Copy link
Contributor

Satbek commented Nov 20, 2025

also switches between safe/fast mode must be logged

@ita-sammann ita-sammann force-pushed the TNTP-2109-safe-mode-switch branch from 22a0fcc to 5f3f46f Compare November 20, 2025 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants