Conversation

@brandur brandur (Contributor) commented May 16, 2025

This one follows up #870 to add an optimization for job completion: we separate out the most common case of setting jobs to `completed` with no metadata updates required, update all of those in a simplified batch query, then do the rest of the completions afterwards.

In any non-degenerate queue, most completions will be setting success states, so this should help with real-world use, and it also significantly improves SQLite's benchmarking numbers.
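
As a rough sketch of the shape of this change (the names below are illustrative, not River's actual internals), the completer splits each batch, pushes the common case through one grouped update, and falls back to per-job handling for everything else:

    import "context"

    // completionParams describes one job's completion; illustrative only.
    type completionParams struct {
        ID              int64
        State           string         // e.g. "completed", "retryable", "discarded"
        MetadataUpdates map[string]any // non-empty when metadata must be merged in
    }

    // completeMany handles the common case (completed, no metadata) with one
    // simplified batch update, then finishes the remainder on the existing
    // general-purpose path.
    func completeMany(
        ctx context.Context,
        params []completionParams,
        setManyCompleted func(ctx context.Context, ids []int64) error, // simplified batch query
        completeOne func(ctx context.Context, p completionParams) error, // everything else
    ) error {
        var (
            simpleIDs []int64
            remaining []completionParams
        )
        for _, p := range params {
            if p.State == "completed" && len(p.MetadataUpdates) == 0 {
                simpleIDs = append(simpleIDs, p.ID)
            } else {
                remaining = append(remaining, p)
            }
        }
        if len(simpleIDs) > 0 {
            if err := setManyCompleted(ctx, simpleIDs); err != nil {
                return err
            }
        }
        for _, p := range remaining {
            if err := completeOne(ctx, p); err != nil {
                return err
            }
        }
        return nil
    }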

Here's a new benchmark run where throughput is ~4x what it was doing
before and roughly on par with Postgres:

    $ go run ./cmd/river bench --database-url "sqlite://:memory:" --num-total-jobs 1_000_000
    bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
    bench: jobs worked [      88218 ], inserted [          0 ], job/sec [    44109.0 ] [2s]
    bench: jobs worked [      91217 ], inserted [          0 ], job/sec [    45608.5 ] [2s]
    bench: jobs worked [      88858 ], inserted [          0 ], job/sec [    44429.0 ] [2s]
    bench: jobs worked [      77219 ], inserted [          0 ], job/sec [    38609.5 ] [2s]
    bench: jobs worked [      82045 ], inserted [          0 ], job/sec [    41022.5 ] [2s]
    bench: jobs worked [      84052 ], inserted [          0 ], job/sec [    42026.0 ] [2s]
    bench: jobs worked [      72028 ], inserted [          0 ], job/sec [    36014.0 ] [2s]
    bench: jobs worked [      90047 ], inserted [          0 ], job/sec [    45023.5 ] [2s]
    bench: jobs worked [      88875 ], inserted [          0 ], job/sec [    44437.5 ] [2s]
    bench: jobs worked [      89240 ], inserted [          0 ], job/sec [    44620.0 ] [2s]
    bench: jobs worked [      88842 ], inserted [          0 ], job/sec [    44421.0 ] [2s]
    bench: jobs worked [      59359 ], inserted [          0 ], job/sec [    29679.5 ] [2s]
    bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    42822.8 ], running 23.35203575s

Here's a normal non-memory file-based database:

    $ go run ./cmd/river bench --database-url "sqlite://./sqlite/bench.sqlite3" --num-total-jobs 1_000_000
    bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
    bench: jobs worked [      83657 ], inserted [          0 ], job/sec [    41828.5 ] [2s]
    bench: jobs worked [      76648 ], inserted [          0 ], job/sec [    38324.0 ] [2s]
    bench: jobs worked [      88036 ], inserted [          0 ], job/sec [    44018.0 ] [2s]
    bench: jobs worked [      75473 ], inserted [          0 ], job/sec [    37736.5 ] [2s]
    bench: jobs worked [      82604 ], inserted [          0 ], job/sec [    41302.0 ] [2s]
    bench: jobs worked [      84048 ], inserted [          0 ], job/sec [    42024.0 ] [2s]
    bench: jobs worked [      85508 ], inserted [          0 ], job/sec [    42754.0 ] [2s]
    bench: jobs worked [      90580 ], inserted [          0 ], job/sec [    45290.0 ] [2s]
    bench: jobs worked [      83568 ], inserted [          0 ], job/sec [    41784.0 ] [2s]
    bench: jobs worked [      86062 ], inserted [          0 ], job/sec [    43031.0 ] [2s]
    bench: jobs worked [      88508 ], inserted [          0 ], job/sec [    44254.0 ] [2s]
    bench: jobs worked [      75308 ], inserted [          0 ], job/sec [    37654.0 ] [2s]
    bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    42331.9 ], running 23.622860125s

The improved benchmarks only apply in fixed job burndown mode (with the `--num-total-jobs` option) because inserting jobs is still pretty slow, since it's still done one by one.

Once again, I'm pretty sure I'll be able to land some SQLite fixes that'll make batch operations possible using `json_each`, and then we should be able to make all normal operations batch-wise. That'll take some time though, so in the meantime we can get this optimization out for the initial SQLite release.
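
As a rough illustration of what `json_each` enables (this isn't River's actual query; the table and column names are assumptions), a whole set of job IDs can be passed as a single JSON array parameter and expanded so that one UPDATE covers them all:

    import (
        "context"
        "database/sql"
        "encoding/json"
        "time"
    )

    // batchComplete marks many jobs completed in one statement by passing the
    // IDs as a JSON array and expanding them with SQLite's json_each. Assumes a
    // *sql.DB opened against SQLite; "river_job" and its columns are illustrative.
    func batchComplete(ctx context.Context, db *sql.DB, ids []int64, finalizedAt time.Time) error {
        encoded, err := json.Marshal(ids)
        if err != nil {
            return err
        }
        _, err = db.ExecContext(ctx, `
            UPDATE river_job
            SET state = 'completed', finalized_at = ?2
            WHERE id IN (SELECT value FROM json_each(?1))`,
            string(encoded), finalizedAt.Format(time.RFC3339Nano))
        return err
    }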

@brandur brandur force-pushed the brandur-optimize-complete-happy-path branch from 2bf276b to 2c01702 on May 16, 2025 03:44

    jobAfter := jobsAfter[0]
    require.Equal(t, rivertype.JobStateCompleted, jobAfter.State)
    - require.WithinDuration(t, now, *jobAfter.FinalizedAt, time.Microsecond)
    + require.WithinDuration(t, now, *jobAfter.FinalizedAt, bundle.driver.TimePrecision())

@brandur brandur (Contributor Author) commented:

Honestly ... not 100% sure how this was passing before. Should be constrained to DB-specific precision.

@brandur brandur force-pushed the brandur-optimize-complete-happy-path branch from 2c01702 to dda5dda on May 16, 2025 03:47
@brandur brandur marked this pull request as draft May 16, 2025 04:43
@brandur brandur (Contributor Author) commented May 16, 2025

Ah damn, I forgot that the completer tries to set specific finalized_ats per completed job. Hard to do given current sqlc limitations.
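
For reference, here's a rough sketch of what setting a distinct `finalized_at` per job in one statement could look like (raw SQL rather than sqlc, and the table/column names are assumptions), pairing each job ID with its timestamp via `json_each`:

    import (
        "context"
        "database/sql"
        "encoding/json"
    )

    // jobCompletion pairs a job ID with its specific finalized_at; illustrative only.
    type jobCompletion struct {
        ID          int64  `json:"id"`
        FinalizedAt string `json:"finalized_at"` // e.g. an RFC 3339 timestamp string
    }

    // batchCompleteWithTimes sets a per-job finalized_at in a single statement
    // by joining against a JSON array of {id, finalized_at} pairs. Requires
    // SQLite 3.33+ for UPDATE ... FROM and 3.38+ for the ->> operator.
    func batchCompleteWithTimes(ctx context.Context, db *sql.DB, completions []jobCompletion) error {
        encoded, err := json.Marshal(completions)
        if err != nil {
            return err
        }
        _, err = db.ExecContext(ctx, `
            UPDATE river_job
            SET state = 'completed', finalized_at = pairs.value ->> 'finalized_at'
            FROM json_each(?1) AS pairs
            WHERE river_job.id = pairs.value ->> 'id'`,
            string(encoded))
        return err
    }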

@bgentry bgentry (Contributor) commented May 23, 2025

Dang. Making sure you don't actually want me to review this or anything since you've hit a blocker.

@brandur brandur (Contributor Author) commented May 24, 2025

Nah, no review required for now. Thx.
