-
Notifications
You must be signed in to change notification settings - Fork 226
feat: sidekiq scheduled jobs improvements - scheduled operation and add links
#1717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
bfb7c63
dc00407
6a3983f
4ed7a70
37deaa1
730ca36
8055c89
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -24,14 +24,25 @@ def call(_worker_class, job, _queue, _redis_pool) | |
| } | ||
| attributes[SemanticConventions::Trace::PEER_SERVICE] = instrumentation_config[:peer_service] if instrumentation_config[:peer_service] | ||
|
|
||
| scheduled_at = job['at'] | ||
| op = scheduled_at.nil? ? 'publish' : 'scheduled' | ||
|
|
||
| span_name = case instrumentation_config[:span_naming] | ||
| when :job_class then "#{job['wrapped']&.to_s || job['class']} publish" | ||
| else "#{job['queue']} publish" | ||
| when :job_class then "#{job['wrapped']&.to_s || job['class']} #{op}" | ||
| else "#{job['queue']} #{op}" | ||
| end | ||
|
|
||
| tracer.in_span(span_name, attributes: attributes, kind: :producer) do |span| | ||
| # In case this is Scheduled job, there is already context injected, so link to that context | ||
| # NOTE: :propagation_style = :child is not supported as it is quite tricky when :trace_poller_enqueue = true | ||
| extracted_context = OpenTelemetry.propagation.extract(job, context: OpenTelemetry::Context::ROOT) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is quite interesting and do not see a case covered in the spec https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/messaging/messaging-spans.md#span-name If I understand this correctly, this is disconnecting the parent/child context and treats a Sidekiq scheduled job from the place where it was enqueued? This seems unexpected to do from the client all cases of links because it is going to disconnect the clients, publishers and consumers from each other generating 3 traces ids instead of 2. By default, even in a link situation, I would expect the publisher span to use parent/child propagation and the consumer to generate links. E.g. you have a web request that uses def checkout
# ... schedule housekeeping clean up
GuestsCleanupJob.set(wait_until: Date.tomorrow.noon).perform_later(guest)
endWhen the
With these changes, it will end up changing this expected behavior and that is undesirable in my opinion. My recommendation here would be for you to add a utility method to your code or identify something more specific in
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here is how Sidekiq is working, to what I have found:
When it comes to OpenTelemetry instrumentation of that: Currently when when using
Caveats:
My goal is to have meaningful correlation from P.1 to P.4, regardless same Trace or Links, whatever is easy and feasible.
Resulting in this more meaningful list and correlation with links or traces
I have been thinking to do some monkey patching or whatever, but it is way harder, moreover this PR has value for others. So @arielvalentin how about, the following?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not a sidekiq user so I'm thinking about this from the perspective of an active job user who may use the sidekiq adapter. I see your point though of using non standard semconv in the cases where the naming pattern is Job Name instead of semantic conventions. I think in those cases using I'd need to review your comments more carefully since again, I'm not a sidekiq user so I don't quite understand the nuances of the cause of disconnect between the producer and consumer spans in the cases you outlined above.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I also am not a sidekiq user but am coming from the sem conv side. I am wondering if the messaging conventions are the correct conventions to even be following. When I do some searching I get the impression that it is a task/job system and the official website seems to back this up.
Based on that I suspect it would be highly unlikely to be able to get sidekiq listed in sem conv as a messaging system. So with that in mind as well as the sidekiq docs I find myself asking myself how would I define this and what I come up with is:
I know this would be a significant change and additional feedback should be sought from others but i feel moving away from using messaging spans better sets up given the metric definitions available in the Workflow pr. I am more than willing to refine the attribute usage in those spans based on findings when attempting to implement it. Also in coming up with span name convention.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't mind the conventions as long as things are well observable and correlated, however this is a huge change that I am not keen on doing as part of this PR.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree it would be a big change but my concern is introducing the schedule messaging operation as a way to try and force the scenario of scheduling a task into the messaging specs. As a middle ground as if I understand correctly one of the issues is differentiating between the scheduling and publishing event. In that case why don't we if we are scheduling a task follow a definition which is not messaging. That way we aren't making the situation worse and I wouldn't have an issue if a couple of attributes were shared to assist with interoperability ie message.message.id |
||
| links = [] | ||
| span_context = OpenTelemetry::Trace.current_span(extracted_context).context | ||
| links << OpenTelemetry::Trace::Link.new(span_context) if instrumentation_config[:propagation_style] == :link && span_context.valid? | ||
|
|
||
| tracer.in_span(span_name, attributes: attributes, links: links, kind: :producer) do |span| | ||
| OpenTelemetry.propagation.inject(job) | ||
| span.add_event('created_at', timestamp: time_from_timestamp(job['created_at'])) | ||
| span.add_event('scheduled_at', timestamp: time_from_timestamp(scheduled_at)) unless scheduled_at.nil? | ||
| yield | ||
| end | ||
| end | ||
|
|
||


There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 This naming does not match any of the semantic convention naming for messaging.
https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/messaging/messaging-spans.md#span-name
Is there something special about scheduled spans that help you disambiguate them from any other publisher?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will answer in the next comment ...