fix(sdk): correct FAILURE_TOLERANCE_EXCEEDED behavior in concurrent operations #359
base: main
Conversation
…perations
Fix critical bug where tolerance checks occurred after ALL_COMPLETED check,
causing incorrect completion reasons when failure thresholds were exceeded.
Changes:
- Fix logic ordering in getCompletionReason to check tolerance before completion
- Optimize performance by passing failureCount parameter to eliminate redundant calculations
- Add failure-threshold-exceeded examples for map and parallel operations
- Create comprehensive tests validating FAILURE_TOLERANCE_EXCEEDED behavior
- Add context.wait({seconds:1}) after all map/parallel operations in examples
The fix ensures operations correctly return FAILURE_TOLERANCE_EXCEEDED when
thresholds are exceeded, even if all items have completed execution.
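For illustration, here is a minimal sketch of the corrected ordering. The CompletionReason values and config field names come from this PR, but the function signature, the ToleranceConfig shape, and the fail-fast default are simplified stand-ins for the handler's actual types.

```typescript
// Simplified sketch of the corrected check ordering; not the SDK's actual code.
enum CompletionReason {
  ALL_COMPLETED = "ALL_COMPLETED",
  FAILURE_TOLERANCE_EXCEEDED = "FAILURE_TOLERANCE_EXCEEDED",
}

interface ToleranceConfig {
  toleratedFailureCount?: number;
  toleratedFailurePercentage?: number;
}

function getCompletionReason(
  totalCount: number,
  completedCount: number,
  failureCount: number, // passed in by the caller so it is not recomputed for every check
  config: ToleranceConfig,
): CompletionReason | undefined {
  const { toleratedFailureCount, toleratedFailurePercentage } = config;
  const noToleranceConfigured =
    toleratedFailureCount === undefined && toleratedFailurePercentage === undefined;

  // Tolerance checks run BEFORE the ALL_COMPLETED check (the ordering this PR fixes),
  // so an exceeded threshold is reported even when every item has finished.
  if (noToleranceConfigured && failureCount > 0) {
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED; // fail-fast when no tolerance is set
  }
  if (toleratedFailureCount !== undefined && failureCount > toleratedFailureCount) {
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED;
  }
  if (
    toleratedFailurePercentage !== undefined &&
    (failureCount / totalCount) * 100 > toleratedFailurePercentage
  ) {
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED;
  }
  if (completedCount === totalCount) {
    return CompletionReason.ALL_COMPLETED;
  }
  return undefined; // still running and within tolerance
}
```

Checking the thresholds first is what makes the scenario in the last sentence come out right: all items completed, but too many of them failed.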
Force-pushed from e04787b to ed7e29d
…rcentage test
Add debug logging to capture execution status, history events, and result payload to help diagnose the integration test failure in the cloud environment.
- Add maxAttempts: 1 to failing steps in parallel/map tolerated failure examples
- Prevents infinite retries that cause execution timeouts in cloud environment
- Allows proper completion evaluation when failure tolerance is met
- Use retryPresets.noRetry instead of invalid retry config object
- Import retryPresets from SDK to disable step retries properly
- Resolves TypeScript compilation errors
- Change '(count)' to 'count' and '(percentage)' to 'percentage'
- Fixes AWS Lambda function name validation error
- Parentheses are not allowed in Lambda function names
- Add retryPresets.noRetry to long-running step (20s sleep)
- Prevents test timeout when step fails and retries with default strategy
- Test was timing out at 30s due to retry delays on top of 20s sleep
- Change from 20s to 15s sleep to prevent cloud timeout
- Test was still timing out at 30s due to cloud environment overhead
- 15s + overhead should complete within 30s timeout
- Add retryPresets.noRetry to all 3 invoke operations in branch 2
- Prevents timeout when invokes fail and retry with default strategy
- Keep 20s sleep in branch 1 since invoke retries were the real issue
…pointing-invoke
- Increase test timeout from 30s to 60s to accommodate cloud environment overhead
- Update duration expectation from 15s to 25s to match 20s sleep + invoke time
- Local test now passes in 23s, should work in cloud with 60s timeout
- Added 4 comprehensive test cases for getCompletionReason function in replayItems method
- Tests cover toleratedFailureCount, toleratedFailurePercentage, and fail-fast scenarios
- Ensures complete coverage of FAILURE_TOLERANCE_EXCEEDED logic for both execution modes
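As a rough illustration of the scenarios those four tests cover, here they are exercised against the simplified getCompletionReason sketch shown earlier (not the SDK's actual private method).

```typescript
import { describe, expect, it } from "@jest/globals";

// Uses getCompletionReason and CompletionReason from the simplified sketch above.
describe("getCompletionReason (simplified sketch)", () => {
  it("reports FAILURE_TOLERANCE_EXCEEDED when toleratedFailureCount is exceeded", () => {
    expect(getCompletionReason(10, 10, 3, { toleratedFailureCount: 2 }))
      .toBe(CompletionReason.FAILURE_TOLERANCE_EXCEEDED);
  });

  it("reports FAILURE_TOLERANCE_EXCEEDED when toleratedFailurePercentage is exceeded", () => {
    expect(getCompletionReason(10, 10, 3, { toleratedFailurePercentage: 20 }))
      .toBe(CompletionReason.FAILURE_TOLERANCE_EXCEEDED);
  });

  it("fails fast on the first failure when no tolerance is configured", () => {
    expect(getCompletionReason(10, 4, 1, {}))
      .toBe(CompletionReason.FAILURE_TOLERANCE_EXCEEDED);
  });

  it("reports ALL_COMPLETED when every item finishes within tolerance", () => {
    expect(getCompletionReason(10, 10, 1, { toleratedFailureCount: 2 }))
      .toBe(CompletionReason.ALL_COMPLETED);
  });
});
```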
anthonyting left a comment
Can we have some more assertions in the integ tests on what is returned from the result? Currently it just asserts the counts, but it's possible that the results aren't returned correctly.
...e-execution-sdk-js/src/handlers/concurrent-execution-handler/concurrent-execution-handler.ts
…thod
- Eliminated code duplication by creating a single private getCompletionReason method
- Both replayItems and executeItemsConcurrently now use the same logic
- Reduced code size by ~80 lines while maintaining identical functionality
- All tests pass with improved coverage
…sertions
- Added detailed assertions for each branch result in parallel tolerated failure tests
- Added detailed assertions for each item result in map tolerated failure tests
- Verify specific success/failure patterns match expected behavior
- Ensure proper status, result values, and error handling for each iteration
- All tests pass with comprehensive validation of individual results
- Added detailed individual branch/item assertions for tolerated failure tests
- Enhanced min-successful tests with comprehensive validation of started vs completed items
- Verified all items are tracked (including STARTED state for incomplete items)
- All tests now pass with thorough validation of individual results
- Failure threshold tests kept simple due to complex error payload structure
Added more assertions to tests
…rtions
- Added comprehensive individual result validation for map failure threshold exceeded count test
- Added comprehensive individual result validation for map failure threshold exceeded percentage test
- Verified proper status (SUCCEEDED/FAILED), index mapping, error types, and result values
- Fixed TypeScript type assertions for accessing nested result payload structure
- Tests now validate both overall completion reason and detailed individual item outcomes
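A sketch of the kind of per-item assertions described here; the result payload shape, the status strings, and the error name below are assumptions for illustration, not the SDK's actual structure.

```typescript
import { expect } from "@jest/globals";

// Hypothetical per-item result shape, standing in for the SDK's nested payload.
interface MapItemResult {
  index: number;
  status: "SUCCEEDED" | "FAILED";
  result?: unknown;
  error?: { name: string; message: string };
}

// `itemResults` stands in for the parsed result of the map operation under test.
declare const itemResults: MapItemResult[];

// Beyond the aggregate counts, check index mapping, per-item status, and error types
// so incorrectly returned results are caught.
itemResults.forEach((item, i) => expect(item.index).toBe(i));
expect(itemResults.filter((item) => item.status === "FAILED")).toHaveLength(3);
for (const failed of itemResults.filter((item) => item.status === "FAILED")) {
  expect(failed.error?.name).toBe("SimulatedFailure"); // illustrative error name
}
```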
…ld exceeded tests
- Replaced incorrect 'retry' property with proper 'retryStrategy' function
- Limited retry attempts to 2 to prevent test timeouts (was previously using default 5 attempts)
- Added 1-second delay between retry attempts for consistent timing
- Tests now complete within 60-second timeout while maintaining correct behavior validation
- Both map failure threshold exceeded count and percentage tests now pass reliably
const historyEvents = execution.getHistoryEvents();
const mapContext = historyEvents.find(
It would be better to use the testing library's getOperation().getContextDetails() instead of parsing the history results (same for the other tests).
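A hypothetical sketch of that suggestion: getOperation and getContextDetails are the helpers the reviewer names, but how the operation is looked up here (by name off the execution handle) and the shape of the returned details are assumptions.

```typescript
// `execution` stands in for the test execution handle used elsewhere in these tests.
declare const execution: {
  getOperation(name: string): { getContextDetails(): unknown };
};

// Look up the map operation directly instead of scanning raw history events.
const mapOperation = execution.getOperation("map-tolerated-failure"); // operation name is illustrative
const contextDetails = mapOperation.getContextDetails();
// Assertions would then run against `contextDetails` rather than parsed history events.
```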
…reshold exceeded tests
- Reduced retry attempts from 2 to 1 (total attempts from 3 to 2) for faster test execution
- Tests still demonstrate failure threshold exceeded behavior but run significantly faster
- Note: tests currently fail because the map operation stops early when the threshold is exceeded, preventing successful items from completing; this is expected behavior that needs test expectation adjustment
- Replace custom retry strategies with retryPresets.noRetry in map failure threshold exceeded tests
- Eliminates unnecessary retry delays that were causing cloud integration test timeouts
- Maintains test behavior while significantly reducing execution time
- Replace retryPresets.noRetry with createRetryStrategy({ maxAttempts: 1 })
- Ensures proper retry strategy configuration for immediate failure
- Maintains fast execution while using the correct retry helper function
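A sketch of how those retry settings might attach to a step: createRetryStrategy, retryPresets.noRetry, and the retryStrategy option name come from this PR, while the declarations and the options shape below are assumptions.

```typescript
// Stand-in declarations for the SDK exports named in this PR.
declare function createRetryStrategy(options: { maxAttempts: number }): unknown;
declare const retryPresets: { noRetry: unknown };

// Failing steps in the threshold-exceeded tests: a single attempt, so a failure
// surfaces immediately instead of burning the test timeout on default retries.
const failingStepOptions = {
  retryStrategy: createRetryStrategy({ maxAttempts: 1 }),
};

// Long-running steps in the tolerated-failure examples: never retry on failure.
const longRunningStepOptions = {
  retryStrategy: retryPresets.noRetry,
};
```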
- Add wait operation after map in count test to match percentage test structure
- Increase test timeout to 15 seconds to accommodate retry delays with skipTime: false
- Both tests now pass successfully with proper FAILURE_TOLERANCE_EXCEEDED behavior
Fix bug where tolerance checks occurred after ALL_COMPLETED check,
causing incorrect completion reasons when failure thresholds were exceeded.