-
Notifications
You must be signed in to change notification settings - Fork 30
Description
Take this sync stream query as an example:
SELECT * FROM tasks WHERE project_id = subscription.parameter('project_id') AND project_id IN auth.parameter('project_ids')This is a fairly typical requirement: Sync streams are created for specific projects, while the JWT contains an array of all projects that the user may access.
The issue is that the internal bucket structure is quite inefficient here: This implicitly creates two bucket parameters for the two clauses, even though they reference the same field. So if you subscribe to {project_id: 1} and have a token with {project_ids: [1,2,3]}, we try to sync buckets for tasks[1,1], tasks[1,2], tasks[1,3], even though data can only exist for tasks[1,1]. In some cases, this can result in exceeding the bucket limit of 1000.
A current workaround is to refactor the clauses into this:
SELECT * FROM tasks WHERE project_id = subscription.parameter('project_id') AND subscription.parameter('project_id') IN auth.parameter('project_ids')This avoids the two bucket parameters, resolving the issue. However, we should be able to do this automatically.
Proposal
When the same field is referenced in multiple clauses, merge the bucket parameter handling for them. We can do this in different ways:
- Only have a single bucket parameter for this case. This is effectively a change in storage format, meaning we need to take care of not breaking instances with existing data (see "replication logic version" in [Postgres] Use primary key for REPLICA IDENTITY FULL tables #397).
- Keep the two parameters, but pre-filter out any combinations that cannot exist.
We'll need to check if it's feasible to apply the same optimization to more cases, for example:
-- 1. Using functions / operators on the field
SELECT * FROM tasks WHERE lower(project_id) = subscription.parameter('project_id') AND lower(project_id) IN auth.parameter('project_ids')
-- 2. Combining with AND/OR operators (should work after converting to a normalized representation)
SELECT * FROM tasks WHERE always_sync OR (project_id = subscription.parameter('project_id') AND project_id IN auth.parameter('project_ids') AND auth.parameter('is_admin'))