@LennartKleymann
Contributor

closes #813

Feature: Configurable Exponential Backoff for Job Worker

This PR adds a configurable exponential backoff strategy for job polling retries in the C# client. This new feature helps prevent overwhelming the gateway and aligns the C# client with the behavior of the Java client.

Motivation

When the gateway is under heavy load and returns a RESOURCE_EXHAUSTED gRPC error, the current worker's aggressive retry behavior can create a "thundering herd" problem. By adding an exponential backoff, the worker can space out its retries, giving the gateway time to recover and improving overall system resilience.

Changes & Usage

  • Adds IBackoffSupplier and IExponentialBackoffBuilder: These new APIs provide a fluent builder to configure the backoff policy.
  • New Worker Builder API: Use the new BackoffSupplier() method on the job worker builder to supply a custom backoff strategy.
    • Example:
      var backoffSupplier = new ExponentialBackoffBuilder()
          .MinDelay(TimeSpan.FromMilliseconds(50))
          .MaxDelay(TimeSpan.FromSeconds(5))
          .BackoffFactor(1.6)
          .JitterFactor(0.1)
          .Build();
      
      client.NewWorker()
            .JobType("payment")
            .Handler(handler)
            .BackoffSupplier(backoffSupplier)
            .Open();
  • Backoff Logic: The backoff is applied only on RESOURCE_EXHAUSTED errors and is reset to the initial polling interval after a successful job activation.
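To illustrate how the four builder settings interact, here is a minimal sketch of one common way to compute the next delay: multiply the previous delay by the backoff factor, clamp it between the minimum and maximum, then add a random jitter proportional to the result. This is an assumption about the general scheme, not the code in this PR; the helper name `NextRetryDelay` and its parameter list are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration only (not the PR's implementation).
// Returns the next retry delay in milliseconds.
static long NextRetryDelay(long currentDelayMs, long minDelayMs, long maxDelayMs,
                           double backoffFactor, double jitterFactor, Random rng)
{
    // Grow the previous delay, then clamp it into [minDelayMs, maxDelayMs].
    double delay = Math.Max(Math.Min(maxDelayMs, currentDelayMs * backoffFactor), minDelayMs);

    // Draw a jitter uniformly from [-jitterFactor * delay, +jitterFactor * delay).
    double jitter = (rng.NextDouble() * 2.0 - 1.0) * jitterFactor * delay;

    return (long)Math.Round(delay + jitter);
}

// Simulate eight consecutive failed polls with the values from the example above.
var seededRandom = new Random(42); // fixed seed keeps the demo repeatable
long delayMs = 50;                 // start from the configured minimum delay
for (var attempt = 1; attempt <= 8; attempt++)
{
    delayMs = NextRetryDelay(delayMs, 50, 5000, 1.6, 0.1, seededRandom);
    Console.WriteLine($"attempt {attempt}: wait {delayMs} ms");
}
```

Because the jitter is applied after clamping, a single delay can exceed the maximum by up to the jitter fraction (here 10%). Whether that matters depends on how strictly the upper bound must hold.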

Backwards Compatibility

Existing worker configurations continue to work without changes, so the API remains backward compatible. The default behavior does change, however: workers that do not supply a backoff now retry with an exponential backoff using standard values, which differs from the previous fixed retry mechanism.

Testing

Unit tests have been added to verify the backoff behavior, including the handling of jitter and the monotonic increase of the delay.

@LennartKleymann LennartKleymann changed the title from "Added backoff for job worker" to "feat: added backoff for job worker" on Sep 23, 2025
@nloding nloding self-assigned this Sep 26, 2025
@nloding nloding added the "feature" and "waiting-for-camunda" (Waiting for a review or feedback from a Camunda team member) labels on Sep 26, 2025
@ChrisKujawa ChrisKujawa requested a review from nloding September 29, 2025 14:08
Collaborator

@nloding nloding left a comment

cloned and tested locally, everything works as advertised! great work!

Collaborator

@ChrisKujawa ChrisKujawa left a comment

Thanks @LennartKleymann, I like it 🙇🏼 Can you please resolve the merge conflicts and also apply the backoff supplier to the job streaming? I think we can then also remove the constants we had before (added in #812), wdyt?

After that we should be fine to merge

/// </summary>
/// <param name="backoffSupplier">The supplier used to compute the next retry delay in ms.</param>
/// <returns>The builder for this worker.</returns>
IJobWorkerBuilderStep3 BackoffSupplier(IBackoffSupplier backoffSupplier);
Collaborator

👍🏼

this.random = random ?? new Random();
}

public long SupplyRetryDelay(long currentRetryDelay)
Collaborator

🔧 Could make a note that this implementation is based on the Java client version

Contributor Author

I added summary and remarks to the ExponentialBackoffSupplier.cs

catch (Exception ex)
{
logger?.LogWarning(ex, "Backoff supplier failed; falling back to default backoff.");
var defaultSupplier = new ExponentialBackoffBuilderImpl().Build();
Collaborator

🔧 we could build this once as constant to always have it available wdyt?

Contributor Author

I moved the initialization to the constructor so that there is one default supplier per JobWorker.
If you would prefer a static instance, can you give me a hint where you would put it?

{
LogRpcException(rpcException);
await Task.Delay(pollInterval, cancellationToken);
await Backoff(cancellationToken);
Contributor Author

In my latest commit I changed this so that the worker now backs off on every RpcException, both when streaming and when polling jobs.

@LennartKleymann
Contributor Author

@ChrisKujawa can you review again?

Development

Successfully merging this pull request may close these issues.

I can configure an exponential backoff supplier for the Job Worker

3 participants