Skip to content

Conversation

@yaythomas
Copy link
Member

@yaythomas yaythomas commented Dec 5, 2025

Description of changes:

Implement double-check pattern across all operation types to handle synchronous checkpoint responses, preventing invalid state transitions and unnecessary suspensions.

Bug fix: Callback operations now defer errors to Callback.result() instead of raising immediately in create_callback(), ensuring deterministic replay when code executes between callback creation and result retrieval.

Changes:

  • Add OperationExecutor base class with CheckResult for status checking
  • Implement double-check pattern: check status before and after checkpoint
  • Use is_sync parameter to control checkpoint synchronization behavior
  • Refactor all operations to use executor pattern:
    • StepOperationExecutor: sync for AT_MOST_ONCE, async for AT_LEAST_ONCE
    • InvokeOperationExecutor: sync checkpoint, always suspends
    • WaitOperationExecutor: sync checkpoint, suspends if not complete
    • CallbackOperationExecutor: sync checkpoint, defers errors to result()
    • WaitForConditionOperationExecutor: async checkpoint, no second check
    • ChildOperationExecutor: async checkpoint, handles large payloads
  • Update all tests to expect double checkpoint checks with side_effect mocks

Affected modules:

  • operation/base.py: New OperationExecutor and CheckResult classes
  • operation/step.py: StepOperationExecutor implementation
  • operation/invoke.py: InvokeOperationExecutor implementation
  • operation/wait.py: WaitOperationExecutor implementation
  • operation/callback.py: CallbackOperationExecutor with deferred errors
  • operation/wait_for_condition.py: WaitForConditionOperationExecutor
  • operation/child.py: ChildOperationExecutor with ReplayChildren support
  • All operation tests: Updated mocks for double-check pattern

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yaythomas yaythomas requested a review from wangyb-A as a code owner December 5, 2025 09:45
@yaythomas yaythomas changed the title feat: implement checkpoint response handling for all operations feat: add checkpoint response handling Dec 5, 2025
@yaythomas yaythomas force-pushed the inline-result-checks branch 5 times, most recently from 98e2bca to 3928672 Compare December 8, 2025 08:10
Implement double-check pattern across all operation types to handle
synchronous checkpoint responses, preventing invalid state transitions
and unnecessary suspensions.

Bug fix: Callback operations now defer errors to Callback.result()
instead of raising immediately in create_callback(), ensuring deterministic
replay when code executes between callback creation and result retrieval.

Changes:
- Add OperationExecutor base class with CheckResult for status checking
- Implement double-check pattern: check status before and after checkpoint
- Use is_sync parameter to control checkpoint synchronization behavior
- Refactor all operations to use executor pattern:
  * StepOperationExecutor: sync for AT_MOST_ONCE, async for AT_LEAST_ONCE
  * InvokeOperationExecutor: sync checkpoint, always suspends
  * WaitOperationExecutor: sync checkpoint, suspends if not complete
  * CallbackOperationExecutor: sync checkpoint, defers errors to result()
  * WaitForConditionOperationExecutor: async checkpoint, no second check
  * ChildOperationExecutor: async checkpoint, handles large payloads
- Remove inline while loops, centralize logic in base class
- Update all tests to expect double checkpoint checks with side_effect mocks

Affected modules:
- operation/base.py: New OperationExecutor and CheckResult classes
- operation/step.py: StepOperationExecutor implementation
- operation/invoke.py: InvokeOperationExecutor implementation
- operation/wait.py: WaitOperationExecutor implementation
- operation/callback.py: CallbackOperationExecutor with deferred errors
- operation/wait_for_condition.py: WaitForConditionOperationExecutor
- operation/child.py: ChildOperationExecutor with ReplayChildren support
- All operation tests: Updated mocks for double-check pattern
@yaythomas yaythomas force-pushed the inline-result-checks branch from 3928672 to 8dd58c8 Compare December 8, 2025 22:35
Copy link
Contributor

@wangyb-A wangyb-A left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved.
nit: The expression of using state.create_checkpoint in sync mode is different among different operations. Should we use create_checkpoint_sync to keep it consistent and easier to read?

e.g. we are using
self.state.create_checkpoint(operation_update=create_callback_operation) for callback. And using self.state.create_checkpoint(operation_update=start_operation, is_sync=True) for invoke.
They have same semantic though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants