fix(sdk): correct FAILURE_TOLERANCE_EXCEEDED behavior in concurrent operations #359
base: main
Conversation
…perations
Fix critical bug where tolerance checks occurred after ALL_COMPLETED check,
causing incorrect completion reasons when failure thresholds were exceeded.
Changes:
- Fix logic ordering in getCompletionReason to check tolerance before completion
- Optimize performance by passing failureCount parameter to eliminate redundant calculations
- Add failure-threshold-exceeded examples for map and parallel operations
- Create comprehensive tests validating FAILURE_TOLERANCE_EXCEEDED behavior
- Add context.wait({seconds:1}) after all map/parallel operations in examples
The fix ensures operations correctly return FAILURE_TOLERANCE_EXCEEDED when
thresholds are exceeded, even if all items have completed execution.
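For illustration, here is a minimal sketch of the corrected ordering. The CompletionReason values and config field names come from this PR, but the function signature, the ToleranceConfig shape, and the fail-fast default are simplified stand-ins for the handler's actual types.

```typescript
// Simplified sketch of the corrected check ordering; not the SDK's actual code.
enum CompletionReason {
  ALL_COMPLETED = "ALL_COMPLETED",
  FAILURE_TOLERANCE_EXCEEDED = "FAILURE_TOLERANCE_EXCEEDED",
}

interface ToleranceConfig {
  toleratedFailureCount?: number;
  toleratedFailurePercentage?: number;
}

function getCompletionReason(
  totalCount: number,
  completedCount: number,
  failureCount: number, // passed in by the caller so it is not recomputed for every check
  config: ToleranceConfig,
): CompletionReason | undefined {
  const { toleratedFailureCount, toleratedFailurePercentage } = config;
  const noToleranceConfigured =
    toleratedFailureCount === undefined && toleratedFailurePercentage === undefined;

  // Tolerance checks run BEFORE the ALL_COMPLETED check (the ordering this PR fixes),
  // so an exceeded threshold is reported even when every item has finished.
  if (noToleranceConfigured && failureCount > 0) {
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED; // fail-fast when no tolerance is set
  }
  if (toleratedFailureCount !== undefined && failureCount > toleratedFailureCount) {
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED;
  }
  if (
    toleratedFailurePercentage !== undefined &&
    (failureCount / totalCount) * 100 > toleratedFailurePercentage
  ) {
    return CompletionReason.FAILURE_TOLERANCE_EXCEEDED;
  }
  if (completedCount === totalCount) {
    return CompletionReason.ALL_COMPLETED;
  }
  return undefined; // still running and within tolerance
}
```

Checking the thresholds first is what makes the scenario in the last sentence come out right: all items completed, but too many of them failed.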
Force-pushed from e04787b to ed7e29d
…rcentage test
Add debug logging to capture execution status, history events, and result payload to help diagnose the integration test failure in the cloud environment.
- Add maxAttempts: 1 to failing steps in parallel/map tolerated failure examples
- Prevents infinite retries that cause execution timeouts in cloud environment
- Allows proper completion evaluation when failure tolerance is met
- Use retryPresets.noRetry instead of invalid retry config object
- Import retryPresets from SDK to disable step retries properly
- Resolves TypeScript compilation errors
- Change '(count)' to 'count' and '(percentage)' to 'percentage'
- Fixes AWS Lambda function name validation error
- Parentheses are not allowed in Lambda function names
- Add retryPresets.noRetry to long-running step (20s sleep)
- Prevents test timeout when step fails and retries with default strategy
- Test was timing out at 30s due to retry delays on top of 20s sleep
- Change from 20s to 15s sleep to prevent cloud timeout
- Test was still timing out at 30s due to cloud environment overhead
- 15s + overhead should complete within 30s timeout
- Add retryPresets.noRetry to all 3 invoke operations in branch 2
- Prevents timeout when invokes fail and retry with default strategy
- Keep 20s sleep in branch 1 since invoke retries were the real issue
…pointing-invoke
- Increase test timeout from 30s to 60s to accommodate cloud environment overhead
- Update duration expectation from 15s to 25s to match 20s sleep + invoke time
- Local test now passes in 23s, should work in cloud with 60s timeout
- Added 4 comprehensive test cases for getCompletionReason function in replayItems method
- Tests cover toleratedFailureCount, toleratedFailurePercentage, and fail-fast scenarios
- Ensures complete coverage of FAILURE_TOLERANCE_EXCEEDED logic for both execution modes
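As a rough illustration of the scenarios those four tests cover, here they are exercised against the simplified getCompletionReason sketch shown earlier (not the SDK's actual private method).

```typescript
import { describe, expect, it } from "@jest/globals";

// Uses getCompletionReason and CompletionReason from the simplified sketch above.
describe("getCompletionReason (simplified sketch)", () => {
  it("reports FAILURE_TOLERANCE_EXCEEDED when toleratedFailureCount is exceeded", () => {
    expect(getCompletionReason(10, 10, 3, { toleratedFailureCount: 2 }))
      .toBe(CompletionReason.FAILURE_TOLERANCE_EXCEEDED);
  });

  it("reports FAILURE_TOLERANCE_EXCEEDED when toleratedFailurePercentage is exceeded", () => {
    expect(getCompletionReason(10, 10, 3, { toleratedFailurePercentage: 20 }))
      .toBe(CompletionReason.FAILURE_TOLERANCE_EXCEEDED);
  });

  it("fails fast on the first failure when no tolerance is configured", () => {
    expect(getCompletionReason(10, 4, 1, {}))
      .toBe(CompletionReason.FAILURE_TOLERANCE_EXCEEDED);
  });

  it("reports ALL_COMPLETED when every item finishes within tolerance", () => {
    expect(getCompletionReason(10, 10, 1, { toleratedFailureCount: 2 }))
      .toBe(CompletionReason.ALL_COMPLETED);
  });
});
```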
anthonyting left a comment
Can we have some more assertions in the integ tests on what is returned from the result? Currently it just asserts the counts, but it's possible that the results aren't returned correctly.
...e-execution-sdk-js/src/handlers/concurrent-execution-handler/concurrent-execution-handler.ts
…thod
- Eliminated code duplication by creating a single private getCompletionReason method
- Both replayItems and executeItemsConcurrently now use the same logic
- Reduced code size by ~80 lines while maintaining identical functionality
- All tests pass with improved coverage
…sertions
- Added detailed assertions for each branch result in parallel tolerated failure tests
- Added detailed assertions for each item result in map tolerated failure tests
- Verify specific success/failure patterns match expected behavior
- Ensure proper status, result values, and error handling for each iteration
- All tests pass with comprehensive validation of individual results
- Added detailed individual branch/item assertions for tolerated failure tests
- Enhanced min-successful tests with comprehensive validation of started vs completed items
- Verified all items are tracked (including STARTED state for incomplete items)
- All tests now pass with thorough validation of individual results
- Failure threshold tests kept simple due to complex error payload structure
Added more assertions to tests
…rtions
- Added comprehensive individual result validation for map failure threshold exceeded count test
- Added comprehensive individual result validation for map failure threshold exceeded percentage test
- Verified proper status (SUCCEEDED/FAILED), index mapping, error types, and result values
- Fixed TypeScript type assertions for accessing nested result payload structure
- Tests now validate both overall completion reason and detailed individual item outcomes
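A sketch of the kind of per-item assertions described here; the result payload shape, the status strings, and the error name below are assumptions for illustration, not the SDK's actual structure.

```typescript
import { expect } from "@jest/globals";

// Hypothetical per-item result shape, standing in for the SDK's nested payload.
interface MapItemResult {
  index: number;
  status: "SUCCEEDED" | "FAILED";
  result?: unknown;
  error?: { name: string; message: string };
}

// `itemResults` stands in for the parsed result of the map operation under test.
declare const itemResults: MapItemResult[];

// Beyond the aggregate counts, check index mapping, per-item status, and error types
// so incorrectly returned results are caught.
itemResults.forEach((item, i) => expect(item.index).toBe(i));
expect(itemResults.filter((item) => item.status === "FAILED")).toHaveLength(3);
for (const failed of itemResults.filter((item) => item.status === "FAILED")) {
  expect(failed.error?.name).toBe("SimulatedFailure"); // illustrative error name
}
```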
…ld exceeded tests
- Replaced incorrect 'retry' property with proper 'retryStrategy' function
- Limited retry attempts to 2 to prevent test timeouts (was previously using default 5 attempts)
- Added 1-second delay between retry attempts for consistent timing
- Tests now complete within 60-second timeout while maintaining correct behavior validation
- Both map failure threshold exceeded count and percentage tests now pass reliably
const historyEvents = execution.getHistoryEvents();
const mapContext = historyEvents.find(
It would be better to use the testing library's getOperation().getContextDetails() instead of parsing the history results (same for the other tests).
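A hypothetical sketch of that suggestion: getOperation and getContextDetails are the helpers the reviewer names, but how the operation is looked up here (by name off the execution handle) and the shape of the returned details are assumptions.

```typescript
// `execution` stands in for the test execution handle used elsewhere in these tests.
declare const execution: {
  getOperation(name: string): { getContextDetails(): unknown };
};

// Look up the map operation directly instead of scanning raw history events.
const mapOperation = execution.getOperation("map-tolerated-failure"); // operation name is illustrative
const contextDetails = mapOperation.getContextDetails();
// Assertions would then run against `contextDetails` rather than parsed history events.
```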
…reshold exceeded tests
- Reduced retry attempts from 2 to 1 (total attempts from 3 to 2) for faster test execution
- Tests still demonstrate failure threshold exceeded behavior but run significantly faster
- Note: tests currently fail because the map operation stops early when the threshold is exceeded, preventing successful items from completing; this is expected behavior that needs test expectation adjustment
- Replace custom retry strategies with retryPresets.noRetry in map failure threshold exceeded tests
- Eliminates unnecessary retry delays that were causing cloud integration test timeouts
- Maintains test behavior while significantly reducing execution time
- Replace retryPresets.noRetry with createRetryStrategy({ maxAttempts: 1 })
- Ensures proper retry strategy configuration for immediate failure
- Maintains fast execution while using the correct retry helper function
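A sketch of how those retry settings might attach to a step: createRetryStrategy, retryPresets.noRetry, and the retryStrategy option name come from this PR, while the declarations and the options shape below are assumptions.

```typescript
// Stand-in declarations for the SDK exports named in this PR.
declare function createRetryStrategy(options: { maxAttempts: number }): unknown;
declare const retryPresets: { noRetry: unknown };

// Failing steps in the threshold-exceeded tests: a single attempt, so a failure
// surfaces immediately instead of burning the test timeout on default retries.
const failingStepOptions = {
  retryStrategy: createRetryStrategy({ maxAttempts: 1 }),
};

// Long-running steps in the tolerated-failure examples: never retry on failure.
const longRunningStepOptions = {
  retryStrategy: retryPresets.noRetry,
};
```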
- Add wait operation after map in count test to match percentage test structure
- Increase test timeout to 15 seconds to accommodate retry delays with skipTime: false
- Both tests now pass successfully with proper FAILURE_TOLERANCE_EXCEEDED behavior
Fix bug where tolerance checks occurred after ALL_COMPLETED check,
causing incorrect completion reasons when failure thresholds were exceeded.