@LsomeYeah LsomeYeah commented Dec 19, 2025

Purpose

Linked issue: close #xxx

By default, Paimon's bucketed append tables preserve data ordering within each bucket. However, this ordering requirement can be relaxed to enable additional optimizations.

This PR adds the ability to disable the ordering requirement for bucketed append tables. When strict ordering is not required, the data in each bucket can be clustered incrementally, which significantly improves query performance for queries that filter on both the bucket key and the clustering columns.
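To make the query-performance claim concrete, the following sketch counts how many files a point lookup must open inside one bucket, with and without clustering. It assumes the clustering column is a single `long`, that each data file keeps min/max statistics on it, and that a plain sort stands in for whatever clustering strategy the table actually uses. All class and method names are hypothetical, not Paimon APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of min/max-based file skipping inside one bucket.
// None of these names are Paimon APIs.
class BucketFileSkipping {

    // Per-file statistics on the clustering column (assumed to be a long).
    record FileStats(long min, long max) {
        boolean mayContain(long value) {
            return value >= min && value <= max;
        }
    }

    // Splits the rows of one bucket into files of `rowsPerFile` rows each,
    // optionally sorting by the clustering column first (a stand-in for clustering),
    // and returns the min/max statistics of every file.
    static List<FileStats> writeFiles(long[] rows, int rowsPerFile, boolean cluster) {
        long[] data = rows.clone();
        if (cluster) {
            Arrays.sort(data);
        }
        List<FileStats> files = new ArrayList<>();
        for (int start = 0; start < data.length; start += rowsPerFile) {
            int end = Math.min(start + rowsPerFile, data.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, data[i]);
                max = Math.max(max, data[i]);
            }
            files.add(new FileStats(min, max));
        }
        return files;
    }

    // Counts how many files a point lookup on the clustering column cannot skip.
    static long filesToScan(List<FileStats> files, long value) {
        return files.stream().filter(f -> f.mayContain(value)).count();
    }
}
```

With the rows `{5, 1, 9, 3, 7, 2, 8, 4, 6, 0}` written two per file, a lookup for `4` must open all 5 files when the rows stay in arrival order, but only 1 file once the bucket is sorted.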

Unlike bucket-unaware append tables, which require range partitioning for clustering, bucketed tables only need to shuffle by partition and bucket and then cluster locally within each bucket. This makes the approach considerably more efficient and less resource-intensive.
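A minimal sketch of the bucketed approach, assuming rows already carry their partition and bucket and that a single `long` column is the clustering key. The `Row`, `PartitionBucket`, and `clusterPerBucket` names are hypothetical. Rows are routed to the group for their own (partition, bucket) and each group is sorted independently, so no sampling or global range boundaries are computed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: shuffle by (partition, bucket), then cluster locally.
class LocalBucketClustering {

    record Row(String partition, int bucket, long clusterKey) {}

    record PartitionBucket(String partition, int bucket) {}

    static Map<PartitionBucket, List<Row>> clusterPerBucket(List<Row> rows) {
        // "Shuffle": route each row to the group of its own partition and bucket.
        // Unlike range partitioning, no sampling or global range boundaries are needed.
        Map<PartitionBucket, List<Row>> groups = new TreeMap<>(
                Comparator.comparing(PartitionBucket::partition)
                        .thenComparingInt(PartitionBucket::bucket));
        for (Row row : rows) {
            groups.computeIfAbsent(
                            new PartitionBucket(row.partition(), row.bucket()),
                            k -> new ArrayList<>())
                    .add(row);
        }
        // Local clustering: sort each group independently by the clustering key.
        for (List<Row> group : groups.values()) {
            group.sort(Comparator.comparingLong(Row::clusterKey));
        }
        return groups;
    }
}
```

Because every group is sorted on its own, the work can be spread over as many parallel tasks as there are non-empty buckets.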

Tests

API and Format

Documentation

@LsomeYeah LsomeYeah marked this pull request as draft December 19, 2025 02:02
@LsomeYeah LsomeYeah marked this pull request as ready for review December 23, 2025 12:13
@JingsongLi JingsongLi left a comment

My idea is that clustering a bucketed table should work like compacting small files in an Append table with bucket -1: directly generate tasks and let different concurrent writers read and write them separately.
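One possible reading of this suggestion, sketched under assumptions: the files of each partition and bucket are known up front, a bucket with more than one file is worth rewriting, and tasks are spread round-robin over writer subtasks. `ClusterTaskPlanner`, `ClusterTask`, and the file-count threshold are hypothetical and not taken from Paimon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the reviewer's suggestion; not Paimon code.
class ClusterTaskPlanner {

    // One unit of work: read and rewrite (cluster) the files of a single bucket.
    record ClusterTask(String partition, int bucket, List<String> files) {}

    // Plans one task per bucket that holds more than one file (assumed threshold).
    static List<ClusterTask> plan(Map<String, Map<Integer, List<String>>> filesByPartitionAndBucket) {
        List<ClusterTask> tasks = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, List<String>>> partition : filesByPartitionAndBucket.entrySet()) {
            for (Map.Entry<Integer, List<String>> bucket : partition.getValue().entrySet()) {
                if (bucket.getValue().size() > 1) {
                    tasks.add(new ClusterTask(partition.getKey(), bucket.getKey(), bucket.getValue()));
                }
            }
        }
        return tasks;
    }

    // Assigns tasks round-robin to `parallelism` writers; each writer handles only its own tasks.
    static List<List<ClusterTask>> assign(List<ClusterTask> tasks, int parallelism) {
        List<List<ClusterTask>> perWriter = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perWriter.add(new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            perWriter.get(i % parallelism).add(tasks.get(i));
        }
        return perWriter;
    }
}
```

Each writer then only touches the buckets in its own task list, so individual rows never need to be shuffled between writers.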

runsInfo);
});
}
partitionLevels.forEach(
The nesting is too deep here. Use a for loop instead.
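The excerpt does not show the types involved, so the following sketch assumes `partitionLevels` maps a partition to a map from level to file names; the class and method names are hypothetical. It contrasts nested `forEach` lambdas with equivalent `for` loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the review comment. The type of partitionLevels
// and the collected output are assumptions; the diff excerpt does not show them.
class FlattenNestedForEach {

    // Deeply nested lambda style, similar in shape to the excerpt under review.
    static List<String> withForEach(Map<String, Map<Integer, List<String>>> partitionLevels) {
        List<String> out = new ArrayList<>();
        partitionLevels.forEach(
                (partition, levels) ->
                        levels.forEach(
                                (level, files) ->
                                        files.forEach(
                                                file -> out.add(partition + "/" + level + "/" + file))));
        return out;
    }

    // Same logic with plain for loops: flatter, easier to read, and allows break/continue.
    static List<String> withForLoops(Map<String, Map<Integer, List<String>>> partitionLevels) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, List<String>>> partitionEntry : partitionLevels.entrySet()) {
            String partition = partitionEntry.getKey();
            for (Map.Entry<Integer, List<String>> levelEntry : partitionEntry.getValue().entrySet()) {
                for (String file : levelEntry.getValue()) {
                    out.add(partition + "/" + levelEntry.getKey() + "/" + file);
                }
            }
        }
        return out;
    }
}
```

Both methods visit entries in the map's own iteration order, so they produce identical results; the loop version simply avoids three levels of lambda indentation.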
