
Worker is trying to rescue jobs from queues that are not assigned to it #1105

Description

@NanoBjorn

A bit of context

We have two kinds of workers at the moment, sharing the same database: let's say worker-1 runs jobs from queue-1 and worker-2 runs jobs from queue-2. However, both workers are built from the same binary, and we recently tried registering jobs conditionally, i.e. registering only those jobs that a given worker is supposed to execute based on its queue selection.

Another thing to mention is that our RescueStuckJobsAfter is configured to be pretty small (30s), whereas the timeouts on the jobs themselves are in the minutes.
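
For concreteness, here is a minimal sketch of roughly what this setup looks like. The job kinds (EmailArgs, ReportArgs), the queue names, and the JobTimeout: -1 choice are hypothetical stand-ins rather than our real code; the point is the conditional registration plus the short RescueStuckJobsAfter:

    package main

    import (
        "context"
        "time"

        "github.com/jackc/pgx/v5"
        "github.com/jackc/pgx/v5/pgxpool"
        "github.com/riverqueue/river"
        "github.com/riverqueue/river/riverdriver/riverpgxv5"
    )

    // EmailArgs and ReportArgs are hypothetical job kinds standing in for our real ones.
    type EmailArgs struct{}

    func (EmailArgs) Kind() string { return "email" }

    type ReportArgs struct{}

    func (ReportArgs) Kind() string { return "report" }

    type EmailWorker struct{ river.WorkerDefaults[EmailArgs] }

    func (w *EmailWorker) Work(ctx context.Context, job *river.Job[EmailArgs]) error { return nil }

    type ReportWorker struct{ river.WorkerDefaults[ReportArgs] }

    func (w *ReportWorker) Work(ctx context.Context, job *river.Job[ReportArgs]) error { return nil }

    // newClient builds a client that works a single queue and registers only
    // the job kinds that this queue is supposed to execute.
    func newClient(pool *pgxpool.Pool, queue string) (*river.Client[pgx.Tx], error) {
        workers := river.NewWorkers()

        // Conditional registration based on queue selection.
        switch queue {
        case "queue-1":
            river.AddWorker(workers, &EmailWorker{})
        case "queue-2":
            river.AddWorker(workers, &ReportWorker{})
        }

        return river.NewClient(riverpgxv5.New(pool), &river.Config{
            Queues: map[string]river.QueueConfig{
                queue: {MaxWorkers: 10},
            },
            // Assumed: no global job timeout; per-worker Timeout() overrides
            // (in the minutes) apply instead.
            JobTimeout: -1,
            // Deliberately small relative to how long our jobs can run.
            RescueStuckJobsAfter: 30 * time.Second,
            Workers:              workers,
        })
    }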

An issue

After introducing this conditional registration, we observed the following behaviour:

  • worker-1 is fetching all jobs (for both queues) to rescue:
    -- name: JobGetStuck :many
    SELECT *
    FROM /* TEMPLATE: schema */river_job
    WHERE state = 'running'
    AND attempted_at < @stuck_horizon::timestamptz
    ORDER BY id
    LIMIT @max;
  • among others, it can fetch jobs that belong to queue-2; their kinds are not registered in worker-1, so makeRetryDecision discards them:
    workUnitFactory := s.Config.WorkUnitFactoryFunc(job.Kind)
    if workUnitFactory == nil {
        s.Logger.ErrorContext(ctx, s.Name+": Attempted to rescue unhandled job kind, discarding",
            slog.String("job_kind", job.Kind), slog.Int64("job_id", job.ID))
        return jobRetryDecisionDiscard, time.Time{}
    }

We checked the documentation in case we were missing something, but neither of the pages that could mention this behaviour says anything about our case:

  • Inserting and working jobs does not mention whether all job kinds used in a single database have to be registered in every worker
  • Multiple queues only says that "workers will only select jobs to work for queues that they're configured to handle", but it is still unclear whether we should register all job kinds or not

A very bad side effect of this for us was that the discarded job continued executing (its context wasn't cancelled) while another job with the same unique_key was scheduled, violating a uniqueness constraint that we rely on for correctness. I'm not blaming River for this; it's a consequence of our own ignorance of the rescue mechanics' nuances.
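
To illustrate the uniqueness we rely on (reusing the hypothetical types from the sketch above, not our actual code), the affected jobs are inserted with River's unique job options, roughly like this:

    // insertReport enqueues the hypothetical report job uniquely by its args,
    // so at most one such job should exist at a time. This is the guarantee
    // that broke when a discarded-but-still-running job overlapped with a
    // freshly inserted one.
    func insertReport(ctx context.Context, client *river.Client[pgx.Tx]) error {
        _, err := client.Insert(ctx, ReportArgs{}, &river.InsertOpts{
            UniqueOpts: river.UniqueOpts{ByArgs: true},
        })
        return err
    }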


Short-term, we fixed our issue by once again registering all jobs in the worker binary regardless of its queue configuration, but it would be valuable to know whether this is designed behaviour or a bug.
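
In terms of the sketch above, the workaround simply drops the conditional:

    func newWorkers() *river.Workers {
        // Workaround: register every job kind regardless of which queue this
        // binary is configured to run, so the rescuer always finds a work
        // unit factory for any stuck job it fetches.
        workers := river.NewWorkers()
        river.AddWorker(workers, &EmailWorker{})
        river.AddWorker(workers, &ReportWorker{})
        return workers
    }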
