feat: [iceberg] Native scan by serializing FileScanTasks to iceberg-rust #2528
Merged
154 commits
cded0ad
CometNativeIcebergScan with iceberg-rust using FileScanTasks.
mbutrovich 4f3004b
Clean up tests a little.
mbutrovich 4afec43
Remove old comment.
mbutrovich fc97ce9
Fix machete and missing suite CI failures.
mbutrovich cca4911
Fix unused variables.
mbutrovich 93f466d
Spark 4.0 needs Iceberg 1.10, let's see if that works in CI.
mbutrovich 970b692
Remove errant println.
mbutrovich c44973b
Remove old path() code path.
mbutrovich 0f83fd4
Update old comment.
mbutrovich 6cbbd09
Iceberg 1.5.x compatible reflection. Use 1.5.2 for Spark 3.4 and 3.5.
mbutrovich 6966a12
Fix scalastyle issues.
mbutrovich 1153d71
Merge branch 'main' into iceberg-rust
mbutrovich a0f4d63
Remove unused import.
mbutrovich a9cebfd
Clean up docs a bit.
mbutrovich 6b2175a
Refactor and cleanup.
mbutrovich 3618407
Refactor and cleanup.
mbutrovich 8091a81
Add IcebergFileStream based on DataFusion, add benchmark. Bump the Ic…
mbutrovich 880599e
Fix CometReadBenchmark.
mbutrovich 5127e1c
Merge branch 'main' into iceberg-rust
mbutrovich 878c971
Fixes after bringing in upstream/main.
mbutrovich e66799e
Basic complex type support.
mbutrovich 4f2f3b8
CometFuzzIceberg stuff.
mbutrovich 71df65c
Merge branch 'main' into iceberg-rust
mbutrovich 3371cc1
format and fix conflicts.
mbutrovich 1c40d43
Basic S3 test and properties support
mbutrovich 40c9a07
Fix NPE.
mbutrovich 19797f3
Merge branch 'main' into iceberg-rust
mbutrovich 236b339
Support migrated tables via https://github.com/apache/iceberg-rust/pu…
mbutrovich ce367cc
Update df50 commit based on field ID fix.
mbutrovich bd6c609
Bump df50 commit.
mbutrovich 33fa891
Support hive-partitioned Parquet files migrated to Iceberg tables wit…
mbutrovich ca13cc6
Bump df50.
mbutrovich b4e829f
Merge branch 'main' into iceberg-rust
mbutrovich e19e201
Fix after merging main.
mbutrovich 52019a9
update df50.
mbutrovich e62a1ee
fall back for table format v3, ORC, and Avro scans.
mbutrovich b97f36a
Fix TestFilterPushDown Iceberg Java suite by including filters in exp…
mbutrovich 08bfd70
Fix format.
mbutrovich a3bf186
Fix format.
mbutrovich a51652f
Fix UUID Iceberg type.
mbutrovich b06800c
Fix UUID Iceberg test.
mbutrovich 905dc97
Bump df50.
mbutrovich bdb5029
Merge branch 'main' into iceberg-rust
mbutrovich f8714bc
Iceberg planning and output_rows metrics.
mbutrovich 5f8256e
more output_rows tests.
mbutrovich 78591fa
Merge branch 'main' into iceberg-rust
mbutrovich 50a60ee
Dump DF 50.3 and df50 iceberg-rust commit.
mbutrovich 3611b8a
Update metrics recording for iceberg_scan.rs.
mbutrovich 6361943
FileStreamMetrics for iceberg_scan.rs
mbutrovich b3c88b9
Fix format.
mbutrovich b359171
numSplits metric.
mbutrovich f0b2d54
more filtering tests.
mbutrovich a5129d8
Change num_splits to be a runtime count instead of serialization time.
mbutrovich 861a575
Fix Spark 4 with ImmutableSQLMetric.
mbutrovich 27a1a75
New 1.9.1.diff
mbutrovich 7ca2cd4
New 1.8.1.diff
mbutrovich eb09e43
Fall back on unsupported file schemes, but add new tests to verify pa…
mbutrovich 591ff74
Fix partitioning test in CometIcebergNativeSuite
mbutrovich 2311d60
Fix schema evolution with snapshots.
mbutrovich 0c9a78d
Fix schemas for delete files.
mbutrovich 87f436a
Fall back for now for unsupported partitioning types and filter expre…
mbutrovich 5a88d19
Fix compilation
mbutrovich b0e6452
date32 schema change test.
mbutrovich 5485508
bump df50
mbutrovich eb3b93d
adjust fallback logic for complex types, add new tests.
mbutrovich 1740f18
Bump df50.
mbutrovich d9a5a1e
Bump df50.
mbutrovich f76cc99
Bump df50.
mbutrovich f33fb38
Bump df50.
mbutrovich 133772d
Serialize PartitionSpec stuff. Fixes ~50 spark-extensions tests from …
mbutrovich bf1342f
Bump df50.
mbutrovich a719a95
Merge branch 'main' into iceberg-rust
mbutrovich caf21c5
Bump df50.
mbutrovich a2021b5
Fall back on InMemoryFileIO tables (views).
mbutrovich 03afbbd
Fall back on truncate function.
mbutrovich 9ae3605
Add fuzz iceberg suite to CI again (it got lost when updating main)
mbutrovich 30a27e1
Merge branch 'main' into iceberg-rust
mbutrovich e3b0806
Apply #2675's partitioning fix to IcebergScanExec.
mbutrovich 2497ead
move IcebergScan serialization logic to a new file.
mbutrovich cf09648
separate checks and serialization logic, reduce redundant checks
mbutrovich 1f86a8e
remove num_partitions serialization
mbutrovich c5ce759
clean up planner.rs deserialization and comments
mbutrovich b53fa78
clean up iceberg_scan.rs comments
mbutrovich 58e3b3a
clean up CometIcebergNativeScanExec comments
mbutrovich fca2dd7
clean up more scala comments
mbutrovich 6f77912
Clean up planner.rs comments.
mbutrovich b88facf
clean up more planner.rs comments
mbutrovich b37a8cb
Merge branch 'main' into iceberg-rust
mbutrovich 47894e7
fix conflicts with main
mbutrovich fdc149e
Fix TestForwardCompatibility
mbutrovich d63829d
Fix serialization of partitionData, bump df50 to fix deserialization …
mbutrovich f2f1807
Format
mbutrovich 32c35b9
Fix format
mbutrovich 1a169b3
Fix format for realsies
mbutrovich c58d2ce
name mapping changes for iceberg-rust #1821.
mbutrovich c962714
clean up stray comments, format
mbutrovich 7277365
Merge branch 'main' into iceberg-rust
mbutrovich a52c69d
Update 1.8.1.diff with spotlessApply.
mbutrovich 95f6e24
Merge branch 'main' into iceberg-rust
mbutrovich 1b82ac3
Merge branch 'main' into iceberg-rust
mbutrovich 2cd4d7d
No longer inject partition default-values, it's redundant now that we…
mbutrovich d88c911
Fix format.
mbutrovich 7537276
Refactor.
mbutrovich 4d9da6b
Merge branch 'main' into iceberg-rust
mbutrovich b9934b6
Reformat after merging main.
mbutrovich 354903e
Refactor serde to main what's going on in main.
mbutrovich 4b15719
Refactor serde to main what's going on in main.
mbutrovich 640bf4d
Refactor serde to main what's going on in main.
mbutrovich 36eacbb
Fix spotless.
mbutrovich b434ac2
Merge branch 'main' into iceberg-rust
mbutrovich 08f8ed6
Fix spotless after merging main.
mbutrovich 71db424
Move CometIcebergNativeScan based on new operator serde logic.
mbutrovich 39c536c
Bump to latest iceberg-rust changes waiting to be merged.
mbutrovich 152d750
Merge branch 'main' into iceberg-rust
mbutrovich 9ee6ff7
Merge branch 'main' into iceberg-rust
mbutrovich aef4d0d
Update 1.10.0.diff for native Iceberg. Fix spotless after merging main.
mbutrovich 320dce2
Update 1.10.0.diff to not count deletes.
mbutrovich 0721b91
Update 1.10.0.diff to fix missing stuff.
mbutrovich d63a439
Bump iceberg-rust. Fall back in problematic scenarios of 1.10.0 tests.
mbutrovich 1ae7b4a
Fix format.
mbutrovich 8a4d827
bump iceberg-rust after binary equality delete fix. Remove fallback.
mbutrovich ad45a90
Fix CometFuzzIcebergSuite "order by random columns"
mbutrovich e137268
Fix TestAlterTablePartitionFields
mbutrovich 4fdd3da
Fix typo.
mbutrovich 3c9e61e
Merge branch 'main' into iceberg-rust
mbutrovich d17f97f
Remove truncate transform fallback and dead code in IcebergReflection…
mbutrovich 8e12782
Add fallback only for non-identity transforms in residuals. Fixes Tes…
mbutrovich a4c841e
Refactor to reduce repeated reflection calls.
mbutrovich c296956
Merge branch 'main' into iceberg-rust
mbutrovich e52e7e0
Fix after #2767.
mbutrovich 895c71c
Format.
mbutrovich f742b5c
Merge branch 'main' into iceberg-rust
mbutrovich d81885c
Simplify fileformat serialization.
mbutrovich bc1bcce
Fix backwards compat CometBatchScanExec arg number.
mbutrovich 224b3e7
Update Spark diffs for new arg in CometBatchScanExec.
mbutrovich 030c530
Merge branch 'main' into iceberg-rust
mbutrovich 974fe94
Merge branch 'main' into iceberg-rust
mbutrovich 7c0a99b
Switch to upstream iceberg-rust.
mbutrovich 4d3ffe5
Merge branch 'main' into iceberg-rust
mbutrovich 6a528fc
Merge branch 'main' into iceberg-rust
mbutrovich 8be08b8
Fix q79 plans? Not sure why this is needed.
mbutrovich 67a0cb6
Move iceberg-rust-related diffs to their own folder, and add new para…
mbutrovich 9ecd93a
Merge branch 'main' into iceberg-rust
mbutrovich 68652d4
Merge branch 'main' into iceberg-rust
mbutrovich 070590a
Update comment in pom file.
mbutrovich 46c507e
Update iceberg-rust workflow.
mbutrovich 773eded
Fix existing Iceberg integration.
mbutrovich cef5390
Fix match arms for old Iceberg integration.
mbutrovich 46f07d2
Move datafusion-datasource up to top cargo.toml, and set core's to wo…
mbutrovich cffc791
Add Spark 3.4 to Iceberg Java workflows for Iceberg-Rust code path.
mbutrovich 80b5fdc
Iceberg 1.5.2 for Spark 3.4.
mbutrovich a280467
Omit incompatible types from CometFuzzIcebergBase schema for Iceberg …
mbutrovich c6c3021
Remove test print.
mbutrovich 7911e4b
Adjust schema filtering logic in fuzz test "order by random columns" …
mbutrovich
@@ -0,0 +1,39 @@

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #   http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing,
    # software distributed under the License is distributed on an
    # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    # KIND, either express or implied.  See the License for the
    # specific language governing permissions and limitations
    # under the License.

    name: Setup Iceberg Builder
    description: 'Setup Apache Iceberg to run Spark SQL tests'
    inputs:
      iceberg-version:
        description: 'The Apache Iceberg version (e.g., 1.8.1) to build'
        required: true
    runs:
      using: "composite"
      steps:
        - name: Clone Iceberg repo
          uses: actions/checkout@v4
          with:
            repository: apache/iceberg
            path: apache-iceberg
            ref: apache-iceberg-${{inputs.iceberg-version}}
            fetch-depth: 1

        - name: Setup Iceberg for Comet
          shell: bash
          run: |
            cd apache-iceberg
            git apply ../dev/diffs/iceberg-rust/${{inputs.iceberg-version}}.diff
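For orientation, a caller workflow for the composite action above might look roughly like the sketch below. This is a minimal sketch under assumptions, not the workflow from this PR: the action directory (`.github/actions/setup-iceberg-rust-builder`), the workflow and job names, the trigger, and the single-entry matrix are all hypothetical. The one detail taken from the action itself is that it runs `git apply ../dev/diffs/iceberg-rust/<version>.diff` from inside `apache-iceberg/`, so the Comet repository has to be checked out at the workspace root before the action runs.

```yaml
# Hypothetical caller workflow; names, trigger, and the action path are assumptions.
name: Iceberg Spark SQL tests (sketch)
on: workflow_dispatch

jobs:
  iceberg-tests:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # 1.8.1 is the example version given in the action's input description.
        iceberg-version: ['1.8.1']
    steps:
      # Check out Comet at the workspace root so that, from inside
      # apache-iceberg/, ../dev/diffs/iceberg-rust/<version>.diff resolves.
      - name: Check out Comet
        uses: actions/checkout@v4

      # Clones apache/iceberg at tag apache-iceberg-<version> into
      # apache-iceberg/ and applies the matching Comet diff.
      - name: Setup Iceberg
        uses: ./.github/actions/setup-iceberg-rust-builder  # hypothetical path
        with:
          iceberg-version: ${{ matrix.iceberg-version }}
```

Because `iceberg-version` is a required input and is interpolated into both the Git ref and the diff filename, a version without a matching `.diff` file under `dev/diffs/iceberg-rust/` would make the `git apply` step fail.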
The profile now says to use Iceberg 1.5 with Spark 3.4, but we do not have 1.5 here. Not sure if it causes problems...
Here's what we currently test with this PR:
I leaned on newer versions for the Iceberg tests because, as best I could tell, newer versions are a superset of the older versions. For the Comet-native tests we are running 1.5.2.
We should have a discussion about what we want to run long term, because right now tagging a PR [iceberg] makes CI take hours and spawns so many parallel Iceberg suites that we start getting network timeouts (likely due to throttling).