Skip to content

Conversation

@PlugaruT
Copy link
Contributor

What Does This Do

Enables Kafka producers to extract and continue existing traces from record headers.

Motivation

Fix for #5092

Currently, when a Kafka producer sends a record that already has trace context in its headers (e.g., when forwarding messages between services), the tracer creates a new root span instead of continuing the existing trace. This breaks distributed tracing continuity

This change extracts the trace context from record headers using extractContextAndGetSpanContext() and uses it as the parent for the produce span, maintaining trace continuity across service boundaries. It's also making dd-trace behave the same way opentelemetry does, see here.

Additional Notes

Contributor Checklist

Jira ticket: [PROJ-IDENT]

  Allow Kafka producers to continue existing traces by extracting trace context from record headers and using it as parent for the produce span. This enables distributed tracing when messages are forwarded between services with pre-existing context.
@PlugaruT PlugaruT marked this pull request as ready for review November 23, 2025 15:44
@PlugaruT PlugaruT requested a review from a team as a code owner November 23, 2025 15:44
Copy link
Contributor

@PerfectSlayer PerfectSlayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @PlugaruT 👋 Thanks for taking time to contribute.

Quick feedback about the change but I think this might break the context tracking later as KafkaProducerCallback won't use the extracted context.

What I would do is:

  • Declare an AgentSpanContext
  • Try to extract from record headers
  • If null, set it with active span
  • Reuse this span context for both startSpan and KafkaProducerCallback.

I will send the current PR into CI to get more feedback (see #10022) and ping @DataDog/apm-idm-java to get a proper review

@PlugaruT
Copy link
Contributor Author

Hi @PlugaruT 👋 Thanks for taking time to contribute.

Quick feedback about the change but I think this might break the context tracking later as KafkaProducerCallback won't use the extracted context.

What I would do is:

* Declare an AgentSpanContext

* Try to extract from record headers

* If null, set it with active span

* Reuse this span context for both `startSpan` and `KafkaProducerCallback`.

I will send the current PR into CI to get more feedback (see #10022) and ping @DataDog/apm-idm-java to get a proper review

Hi @PerfectSlayer, are you referring to how parent is passed down to the KafkaProducerCallback? Seems that indeed it's a flaw in my logic, for some reason I was under the impression that it's automatically passed with the span. My miss. Will get it fixed.

@PerfectSlayer
Copy link
Contributor

are you referring to how parent is passed down to the KafkaProducerCallback?

Yes

Seems that indeed it's a flaw in my logic, for some reason I was under the impression that it's automatically passed with the span. My miss. Will get it fixed.

The active span content is implicitly used for startSpan() if you don’t set one, but the KakfaProducerCallback need an explicit one. So better declare one explicit parent context and use it for both.

@PlugaruT
Copy link
Contributor Author

PlugaruT commented Dec 2, 2025

Hey @DataDog/apm-idm-java, can I get a review on this one 🙏

@PerfectSlayer
Copy link
Contributor

PerfectSlayer commented Dec 3, 2025

I pinged the team in their internal Slack channel

final AgentSpan span;
final AgentSpan callbackParentSpan;

if (extractedContext != null) {
Copy link
Contributor

@ygree ygree Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a breaking change if users expect activeSpan to take precedence. Adding a configuration option, would allow users to configure this behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually was expecting the other way around, especially coming from opentelemetry, I expected to have the context reuse the one from headers and attach new spans to the same trace and not break it.
I can do it, no problem there, but I'd say this is a bug fix rather than a feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's done. Let me know if the name works

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll exclude the commit that adds the setting, then. Let's see if all the CI checks pass in the mirroring PR: #10022

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking into the failing test. Just found how to run the system tests locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ygree I force pushed without the commit that added the config and also fixed the test, at least it passes on my computer 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll pull your changes to the mirroring PR and run the CI checks.

Copy link
Contributor

@ygree ygree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same change should be made in datadog.trace.instrumentation.kafka_clients38.ProducerAdvice to cover Kafka 3.8+ clients.

@PlugaruT PlugaruT requested a review from a team as a code owner December 4, 2025 14:00
@PlugaruT PlugaruT requested a review from ygree December 4, 2025 17:39
@PlugaruT PlugaruT force-pushed the fix/propagate-kafka-producer-trace-context-if-exists branch from 79f8032 to 973db4e Compare December 5, 2025 15:18
@ygree ygree closed this in #10022 Dec 5, 2025
@ygree
Copy link
Contributor

ygree commented Dec 5, 2025

@PlugaruT Thank you for your contribution! Mirror PR #10022 has been merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

inst: kafka Kafka instrumentation tag: community Community contribution type: bug Bug report and fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants