Merged
Commits
154 commits
cded0ad
CometNativeIcebergScan with iceberg-rust using FileScanTasks.
mbutrovich Oct 6, 2025
4f3004b
Clean up tests a little.
mbutrovich Oct 6, 2025
4afec43
Remove old comment.
mbutrovich Oct 6, 2025
fc97ce9
Fix machete and missing suite CI failures.
mbutrovich Oct 6, 2025
cca4911
Fix unused variables.
mbutrovich Oct 6, 2025
93f466d
Spark 4.0 needs Iceberg 1.10, let's see if that works in CI.
mbutrovich Oct 6, 2025
970b692
Remove errant println.
mbutrovich Oct 6, 2025
c44973b
Remove old path() code path.
mbutrovich Oct 6, 2025
0f83fd4
Update old comment.
mbutrovich Oct 6, 2025
6cbbd09
Iceberg 1.5.x compatible reflection. Use 1.5.2 for Spark 3.4 and 3.5.
mbutrovich Oct 6, 2025
6966a12
Fix scalastyle issues.
mbutrovich Oct 6, 2025
1153d71
Merge branch 'main' into iceberg-rust
mbutrovich Oct 7, 2025
a0f4d63
Remove unused import.
mbutrovich Oct 7, 2025
a9cebfd
Clean up docs a bit.
mbutrovich Oct 7, 2025
6b2175a
Refactor and cleanup.
mbutrovich Oct 7, 2025
3618407
Refactor and cleanup.
mbutrovich Oct 7, 2025
8091a81
Add IcebergFileStream based on DataFusion, add benchmark. Bump the Ic…
mbutrovich Oct 8, 2025
880599e
Fix CometReadBenchmark.
mbutrovich Oct 8, 2025
5127e1c
Merge branch 'main' into iceberg-rust
mbutrovich Oct 16, 2025
878c971
Fixes after bringing in upstream/main.
mbutrovich Oct 16, 2025
e66799e
Basic complex type support.
mbutrovich Oct 16, 2025
4f2f3b8
CometFuzzIceberg stuff.
mbutrovich Oct 20, 2025
71df65c
Merge branch 'main' into iceberg-rust
mbutrovich Oct 21, 2025
3371cc1
format and fix conflicts.
mbutrovich Oct 21, 2025
1c40d43
Basic S3 test and properties support
mbutrovich Oct 21, 2025
40c9a07
Fix NPE.
mbutrovich Oct 21, 2025
19797f3
Merge branch 'main' into iceberg-rust
mbutrovich Oct 21, 2025
236b339
Support migrated tables via https://github.com/apache/iceberg-rust/pu…
mbutrovich Oct 22, 2025
ce367cc
Update df50 commit based on field ID fix.
mbutrovich Oct 22, 2025
bd6c609
Bump df50 commit.
mbutrovich Oct 22, 2025
33fa891
Support hive-partitioned Parquet files migrated to Iceberg tables wit…
mbutrovich Oct 22, 2025
ca13cc6
Bump df50.
mbutrovich Oct 22, 2025
b4e829f
Merge branch 'main' into iceberg-rust
mbutrovich Oct 22, 2025
e19e201
Fix after merging main.
mbutrovich Oct 22, 2025
52019a9
update df50.
mbutrovich Oct 23, 2025
e62a1ee
fall back for table format v3, ORC, and Avro scans.
mbutrovich Oct 23, 2025
b97f36a
Fix TestFilterPushDown Iceberg Java suite by including filters in exp…
mbutrovich Oct 23, 2025
08bfd70
Fix format.
mbutrovich Oct 23, 2025
a3bf186
Fix format.
mbutrovich Oct 23, 2025
a51652f
Fix UUID Iceberg type.
mbutrovich Oct 24, 2025
b06800c
Fix UUID Iceberg test.
mbutrovich Oct 24, 2025
905dc97
Bump df50.
mbutrovich Oct 24, 2025
bdb5029
Merge branch 'main' into iceberg-rust
mbutrovich Oct 24, 2025
f8714bc
Iceberg planning and output_rows metrics.
mbutrovich Oct 25, 2025
5f8256e
more output_rows tests.
mbutrovich Oct 25, 2025
78591fa
Merge branch 'main' into iceberg-rust
mbutrovich Oct 25, 2025
50a60ee
Bump DF 50.3 and df50 iceberg-rust commit.
mbutrovich Oct 25, 2025
3611b8a
Update metrics recording for iceberg_scan.rs.
mbutrovich Oct 25, 2025
6361943
FileStreamMetrics for iceberg_scan.rs
mbutrovich Oct 25, 2025
b3c88b9
Fix format.
mbutrovich Oct 25, 2025
b359171
numSplits metric.
mbutrovich Oct 26, 2025
f0b2d54
more filtering tests.
mbutrovich Oct 26, 2025
a5129d8
Change num_splits to be a runtime count instead of serialization time.
mbutrovich Oct 26, 2025
861a575
Fix Spark 4 with ImmutableSQLMetric.
mbutrovich Oct 26, 2025
27a1a75
New 1.9.1.diff
mbutrovich Oct 27, 2025
7ca2cd4
New 1.8.1.diff
mbutrovich Oct 27, 2025
eb09e43
Fall back on unsupported file schemes, but add new tests to verify pa…
mbutrovich Oct 27, 2025
591ff74
Fix partitioning test in CometIcebergNativeSuite
mbutrovich Oct 27, 2025
2311d60
Fix schema evolution with snapshots.
mbutrovich Oct 27, 2025
0c9a78d
Fix schemas for delete files.
mbutrovich Oct 28, 2025
87f436a
Fall back for now for unsupported partitioning types and filter expre…
mbutrovich Oct 28, 2025
5a88d19
Fix compilation
mbutrovich Oct 28, 2025
b0e6452
date32 schema change test.
mbutrovich Oct 28, 2025
5485508
bump df50
mbutrovich Oct 28, 2025
eb3b93d
adjust fallback logic for complex types, add new tests.
mbutrovich Oct 29, 2025
1740f18
Bump df50.
mbutrovich Oct 29, 2025
d9a5a1e
Bump df50.
mbutrovich Oct 30, 2025
f76cc99
Bump df50.
mbutrovich Oct 30, 2025
f33fb38
Bump df50.
mbutrovich Oct 30, 2025
133772d
Serialize PartitionSpec stuff. Fixes ~50 spark-extensions tests from …
mbutrovich Oct 30, 2025
bf1342f
Bump df50.
mbutrovich Oct 30, 2025
a719a95
Merge branch 'main' into iceberg-rust
mbutrovich Oct 30, 2025
caf21c5
Bump df50.
mbutrovich Oct 31, 2025
a2021b5
Fall back on InMemoryFileIO tables (views).
mbutrovich Oct 31, 2025
03afbbd
Fall back on truncate function.
mbutrovich Oct 31, 2025
9ae3605
Add fuzz iceberg suite to CI again (it got lost when updating main)
mbutrovich Nov 3, 2025
30a27e1
Merge branch 'main' into iceberg-rust
mbutrovich Nov 3, 2025
e3b0806
Apply #2675's partitioning fix to IcebergScanExec.
mbutrovich Nov 3, 2025
2497ead
move IcebergScan serialization logic to a new file.
mbutrovich Nov 3, 2025
cf09648
separate checks and serialization logic, reduce redundant checks
mbutrovich Nov 3, 2025
1f86a8e
remove num_partitions serialization
mbutrovich Nov 3, 2025
c5ce759
clean up planner.rs deserialization and comments
mbutrovich Nov 3, 2025
b53fa78
clean up iceberg_scan.rs comments
mbutrovich Nov 3, 2025
58e3b3a
clean up CometIcebergNativeScanExec comments
mbutrovich Nov 3, 2025
fca2dd7
clean up more scala comments
mbutrovich Nov 3, 2025
6f77912
Clean up planner.rs comments.
mbutrovich Nov 3, 2025
b88facf
clean up more planner.rs comments
mbutrovich Nov 3, 2025
b37a8cb
Merge branch 'main' into iceberg-rust
mbutrovich Nov 3, 2025
47894e7
fix conflicts with main
mbutrovich Nov 3, 2025
fdc149e
Fix TestForwardCompatibility
mbutrovich Nov 3, 2025
d63829d
Fix serialization of partitionData, bump df50 to fix deserialization …
mbutrovich Nov 3, 2025
f2f1807
Format
mbutrovich Nov 3, 2025
32c35b9
Fix format
mbutrovich Nov 4, 2025
1a169b3
Fix format for realsies
mbutrovich Nov 4, 2025
c58d2ce
name mapping changes for iceberg-rust #1821.
mbutrovich Nov 4, 2025
c962714
clean up stray comments, format
mbutrovich Nov 4, 2025
7277365
Merge branch 'main' into iceberg-rust
mbutrovich Nov 4, 2025
a52c69d
Update 1.8.1.diff with spotlessApply.
mbutrovich Nov 6, 2025
95f6e24
Merge branch 'main' into iceberg-rust
mbutrovich Nov 6, 2025
1b82ac3
Merge branch 'main' into iceberg-rust
mbutrovich Nov 6, 2025
2cd4d7d
No longer inject partition default-values, it's redundant now that we…
mbutrovich Nov 6, 2025
d88c911
Fix format.
mbutrovich Nov 6, 2025
7537276
Refactor.
mbutrovich Nov 9, 2025
4d9da6b
Merge branch 'main' into iceberg-rust
mbutrovich Nov 9, 2025
b9934b6
Reformat after merging main.
mbutrovich Nov 9, 2025
354903e
Refactor serde to match what's going on in main.
mbutrovich Nov 9, 2025
4b15719
Refactor serde to match what's going on in main.
mbutrovich Nov 9, 2025
640bf4d
Refactor serde to match what's going on in main.
mbutrovich Nov 9, 2025
36eacbb
Fix spotless.
mbutrovich Nov 10, 2025
b434ac2
Merge branch 'main' into iceberg-rust
mbutrovich Nov 10, 2025
08f8ed6
Fix spotless after merging main.
mbutrovich Nov 10, 2025
71db424
Move CometIcebergNativeScan based on new operator serde logic.
mbutrovich Nov 10, 2025
39c536c
Bump to latest iceberg-rust changes waiting to be merged.
mbutrovich Nov 11, 2025
152d750
Merge branch 'main' into iceberg-rust
mbutrovich Nov 11, 2025
9ee6ff7
Merge branch 'main' into iceberg-rust
mbutrovich Nov 11, 2025
aef4d0d
Update 1.10.0.diff for native Iceberg. Fix spotless after merging main.
mbutrovich Nov 11, 2025
320dce2
Update 1.10.0.diff to not count deletes.
mbutrovich Nov 12, 2025
0721b91
Update 1.10.0.diff to fix missing stuff.
mbutrovich Nov 12, 2025
d63a439
Bump iceberg-rust. Fall back in problematic scenarios of 1.10.0 tests.
mbutrovich Nov 12, 2025
1ae7b4a
Fix format.
mbutrovich Nov 12, 2025
8a4d827
bump iceberg-rust after binary equality delete fix. Remove fallback.
mbutrovich Nov 12, 2025
ad45a90
Fix CometFuzzIcebergSuite "order by random columns"
mbutrovich Nov 12, 2025
e137268
Fix TestAlterTablePartitionFields
mbutrovich Nov 12, 2025
4fdd3da
Fix typo.
mbutrovich Nov 12, 2025
3c9e61e
Merge branch 'main' into iceberg-rust
mbutrovich Nov 12, 2025
d17f97f
Remove truncate transform fallback and dead code in IcebergReflection…
mbutrovich Nov 12, 2025
8e12782
Add fallback only for non-identity transforms in residuals. Fixes Tes…
mbutrovich Nov 13, 2025
a4c841e
Refactor to reduce repeated reflection calls.
mbutrovich Nov 13, 2025
c296956
Merge branch 'main' into iceberg-rust
mbutrovich Nov 13, 2025
e52e7e0
Fix after #2767.
mbutrovich Nov 13, 2025
895c71c
Format.
mbutrovich Nov 14, 2025
f742b5c
Merge branch 'main' into iceberg-rust
mbutrovich Nov 14, 2025
d81885c
Simplify fileformat serialization.
mbutrovich Nov 14, 2025
bc1bcce
Fix backwards compat CometBatchScanExec arg number.
mbutrovich Nov 14, 2025
224b3e7
Update Spark diffs for new arg in CometBatchScanExec.
mbutrovich Nov 14, 2025
030c530
Merge branch 'main' into iceberg-rust
mbutrovich Nov 17, 2025
974fe94
Merge branch 'main' into iceberg-rust
mbutrovich Nov 18, 2025
7c0a99b
Switch to upstream iceberg-rust.
mbutrovich Nov 18, 2025
4d3ffe5
Merge branch 'main' into iceberg-rust
mbutrovich Nov 19, 2025
6a528fc
Merge branch 'main' into iceberg-rust
mbutrovich Nov 19, 2025
8be08b8
Fix q79 plans? Not sure why this is needed.
mbutrovich Nov 19, 2025
67a0cb6
Move iceberg-rust-related diffs to their own folder, and add new para…
mbutrovich Nov 19, 2025
9ecd93a
Merge branch 'main' into iceberg-rust
mbutrovich Nov 19, 2025
68652d4
Merge branch 'main' into iceberg-rust
mbutrovich Nov 19, 2025
070590a
Update comment in pom file.
mbutrovich Nov 19, 2025
46c507e
Update iceberg-rust workflow.
mbutrovich Nov 19, 2025
773eded
Fix existing Iceberg integration.
mbutrovich Nov 19, 2025
cef5390
Fix match arms for old Iceberg integration.
mbutrovich Nov 19, 2025
46f07d2
Move datafusion-datasource up to top cargo.toml, and set core's to wo…
mbutrovich Nov 19, 2025
cffc791
Add Spark 3.4 to Iceberg Java workflows for Iceberg-Rust code path.
mbutrovich Nov 19, 2025
80b5fdc
Iceberg 1.5.2 for Spark 3.4.
mbutrovich Nov 20, 2025
a280467
Omit incompatible types from CometFuzzIcebergBase schema for Iceberg …
mbutrovich Nov 20, 2025
c6c3021
Remove test print.
mbutrovich Nov 20, 2025
7911e4b
Adjust schema filtering logic in fuzz test "order by random columns" …
mbutrovich Nov 20, 2025
39 changes: 39 additions & 0 deletions .github/actions/setup-iceberg-rust-builder/action.yaml
@@ -0,0 +1,39 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: Setup Iceberg Builder
description: 'Setup Apache Iceberg to run Spark SQL tests'
inputs:
iceberg-version:
description: 'The Apache Iceberg version (e.g., 1.8.1) to build'
required: true
runs:
using: "composite"
steps:
- name: Clone Iceberg repo
uses: actions/checkout@v4
with:
repository: apache/iceberg
path: apache-iceberg
ref: apache-iceberg-${{inputs.iceberg-version}}
fetch-depth: 1

- name: Setup Iceberg for Comet
shell: bash
run: |
cd apache-iceberg
git apply ../dev/diffs/iceberg-rust/${{inputs.iceberg-version}}.diff
117 changes: 117 additions & 0 deletions .github/workflows/iceberg_spark_test.yml
@@ -156,3 +156,120 @@ jobs:
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
:iceberg-spark:iceberg-spark-runtime-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:integrationTest \
-Pquick=true -x javadoc

iceberg-spark-rust:
if: contains(github.event.pull_request.title, '[iceberg]')
strategy:
matrix:
os: [ubuntu-24.04]
java-version: [11, 17]
iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.7'}]
Comment on lines +166 to +167
Contributor


The profile now says to use Iceberg 1.5 with Spark 3.4, but we do not have 1.5 here. Not sure if it causes problems...

Contributor Author


Here's what we currently test with this PR (rows are Iceberg versions, columns are Spark versions):

3.4 | 3.5 | 4.0
1.5.2 CometIcebergNativeSuite CometFuzzIcebergSuite IcebergReadFromS3Suite (not run in CI due to MinIO container)
1.8.1 Iceberg Spark Tests Iceberg Spark Extensions Tests Iceberg Spark Runtime Tests Iceberg Spark Tests Iceberg Spark Extensions Tests Iceberg Spark Runtime Tests CometIcebergNativeSuite CometFuzzIcebergSuite IcebergReadFromS3Suite (not run in CI due to MinIO container)
1.9.1 Iceberg Spark Tests Iceberg Spark Extensions Tests Iceberg Spark Runtime Tests Iceberg Spark Tests Iceberg Spark Extensions Tests Iceberg Spark Runtime Tests
1.10 Iceberg Spark Tests Iceberg Spark Extensions Tests Iceberg Spark Runtime Tests Iceberg Spark Tests Iceberg Spark Extensions Tests Iceberg Spark Runtime Tests CometIcebergNativeSuite CometFuzzIcebergSuite IcebergReadFromS3Suite (not run in CI due to MinIO container)

I leaned on newer versions for the Iceberg tests because, as best I could tell, newer versions are a superset of the older versions. For the Comet-native tests we are running 1.5.2.

We should have a discussion of what we want to run long term, because right now tagging a PR [iceberg] makes CI take hours and causes so many parallel Iceberg suites that we start getting network timeouts (likely due to throttling).

scala-version: ['2.13']
fail-fast: false
name: iceberg-spark-rust/${{ matrix.os }}/iceberg-${{ matrix.iceberg-version.full }}/spark-${{ matrix.spark-version.full }}/scala-${{ matrix.scala-version }}/java-${{ matrix.java-version }}
runs-on: ${{ matrix.os }}
container:
image: amd64/rust
env:
SPARK_LOCAL_IP: localhost
steps:
- uses: actions/checkout@v5
- name: Setup Rust & Java toolchain
uses: ./.github/actions/setup-builder
with:
rust-version: ${{env.RUST_VERSION}}
jdk-version: ${{ matrix.java-version }}
- name: Build Comet
shell: bash
run: |
PROFILES="-Pspark-${{matrix.spark-version.short}} -Pscala-${{matrix.scala-version}}" make release
- name: Setup Iceberg
uses: ./.github/actions/setup-iceberg-rust-builder
with:
iceberg-version: ${{ matrix.iceberg-version.full }}
- name: Run Iceberg Spark tests (Rust)
run: |
cd apache-iceberg
rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
:iceberg-spark:iceberg-spark-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:test \
-Pquick=true -x javadoc
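The reviewer's point about `[iceberg]` PRs making CI take hours follows from the matrix above: GitHub Actions expands each job's `strategy.matrix` into the cross product of its axes. A quick sketch with the axis values from this workflow (ignoring any `include`/`exclude` entries):

```python
from itertools import product

# Axes of the iceberg-spark-rust job matrix above (values copied from the workflow).
matrix = {
    "os": ["ubuntu-24.04"],
    "java-version": [11, 17],
    "iceberg-version": ["1.8.1", "1.9.1", "1.10.0"],
    "spark-version": ["3.4.3", "3.5.7"],
    "scala-version": ["2.13"],
}

# Each combination of axis values becomes one CI job.
jobs = list(product(*matrix.values()))
print(len(jobs))  # 1 * 2 * 3 * 2 * 1 = 12 jobs for this suite alone
```

With the extensions and runtime suites using the same matrix, tagging a PR `[iceberg]` fans out roughly three times that many Rust-path jobs, on top of the existing Iceberg jobs.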

iceberg-spark-extensions-rust:
if: contains(github.event.pull_request.title, '[iceberg]')
strategy:
matrix:
os: [ubuntu-24.04]
java-version: [11, 17]
iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.7'}]
scala-version: ['2.13']
fail-fast: false
name: iceberg-spark-extensions-rust/${{ matrix.os }}/iceberg-${{ matrix.iceberg-version.full }}/spark-${{ matrix.spark-version.full }}/scala-${{ matrix.scala-version }}/java-${{ matrix.java-version }}
runs-on: ${{ matrix.os }}
container:
image: amd64/rust
env:
SPARK_LOCAL_IP: localhost
steps:
- uses: actions/checkout@v5
- name: Setup Rust & Java toolchain
uses: ./.github/actions/setup-builder
with:
rust-version: ${{env.RUST_VERSION}}
jdk-version: ${{ matrix.java-version }}
- name: Build Comet
shell: bash
run: |
PROFILES="-Pspark-${{matrix.spark-version.short}} -Pscala-${{matrix.scala-version}}" make release
- name: Setup Iceberg
uses: ./.github/actions/setup-iceberg-rust-builder
with:
iceberg-version: ${{ matrix.iceberg-version.full }}
- name: Run Iceberg Spark extensions tests (Rust)
run: |
cd apache-iceberg
rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
:iceberg-spark:iceberg-spark-extensions-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:test \
-Pquick=true -x javadoc

iceberg-spark-runtime-rust:
if: contains(github.event.pull_request.title, '[iceberg]')
strategy:
matrix:
os: [ubuntu-24.04]
java-version: [11, 17]
iceberg-version: [{short: '1.8', full: '1.8.1'}, {short: '1.9', full: '1.9.1'}, {short: '1.10', full: '1.10.0'}]
spark-version: [{short: '3.4', full: '3.4.3'}, {short: '3.5', full: '3.5.7'}]
scala-version: ['2.13']
fail-fast: false
name: iceberg-spark-runtime-rust/${{ matrix.os }}/iceberg-${{ matrix.iceberg-version.full }}/spark-${{ matrix.spark-version.full }}/scala-${{ matrix.scala-version }}/java-${{ matrix.java-version }}
runs-on: ${{ matrix.os }}
container:
image: amd64/rust
env:
SPARK_LOCAL_IP: localhost
steps:
- uses: actions/checkout@v5
- name: Setup Rust & Java toolchain
uses: ./.github/actions/setup-builder
with:
rust-version: ${{env.RUST_VERSION}}
jdk-version: ${{ matrix.java-version }}
- name: Build Comet
shell: bash
run: |
PROFILES="-Pspark-${{matrix.spark-version.short}} -Pscala-${{matrix.scala-version}}" make release
- name: Setup Iceberg
uses: ./.github/actions/setup-iceberg-rust-builder
with:
iceberg-version: ${{ matrix.iceberg-version.full }}
- name: Run Iceberg Spark runtime tests (Rust)
run: |
cd apache-iceberg
rm -rf /root/.m2/repository/org/apache/parquet # somehow parquet cache requires cleanups
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true ./gradlew -DsparkVersions=${{ matrix.spark-version.short }} -DscalaVersion=${{ matrix.scala-version }} -DflinkVersions= -DkafkaVersions= \
:iceberg-spark:iceberg-spark-runtime-${{ matrix.spark-version.short }}_${{ matrix.scala-version }}:integrationTest \
-Pquick=true -x javadoc
2 changes: 2 additions & 0 deletions .github/workflows/pr_build_linux.yml
@@ -103,6 +103,7 @@ jobs:
value: |
org.apache.comet.CometFuzzTestSuite
org.apache.comet.CometFuzzAggregateSuite
org.apache.comet.CometFuzzIcebergSuite
org.apache.comet.CometFuzzMathSuite
org.apache.comet.DataGeneratorSuite
- name: "shuffle"
@@ -124,6 +125,7 @@
org.apache.spark.sql.comet.ParquetDatetimeRebaseV2Suite
org.apache.spark.sql.comet.ParquetEncryptionITCase
org.apache.comet.exec.CometNativeReaderSuite
org.apache.comet.CometIcebergNativeSuite
- name: "exec"
value: |
org.apache.comet.exec.CometAggregateSuite
2 changes: 2 additions & 0 deletions .github/workflows/pr_build_macos.yml
@@ -68,6 +68,7 @@ jobs:
value: |
org.apache.comet.CometFuzzTestSuite
org.apache.comet.CometFuzzAggregateSuite
org.apache.comet.CometFuzzIcebergSuite
org.apache.comet.CometFuzzMathSuite
org.apache.comet.DataGeneratorSuite
- name: "shuffle"
@@ -89,6 +90,7 @@
org.apache.spark.sql.comet.ParquetDatetimeRebaseV2Suite
org.apache.spark.sql.comet.ParquetEncryptionITCase
org.apache.comet.exec.CometNativeReaderSuite
org.apache.comet.CometIcebergNativeSuite
- name: "exec"
value: |
org.apache.comet.exec.CometAggregateSuite
10 changes: 10 additions & 0 deletions common/src/main/scala/org/apache/comet/CometConf.scala
@@ -122,6 +122,16 @@ object CometConf extends ShimCometConf {
Set(SCAN_NATIVE_COMET, SCAN_NATIVE_DATAFUSION, SCAN_NATIVE_ICEBERG_COMPAT, SCAN_AUTO))
.createWithEnvVarOrDefault("COMET_PARQUET_SCAN_IMPL", SCAN_AUTO)

val COMET_ICEBERG_NATIVE_ENABLED: ConfigEntry[Boolean] =
conf("spark.comet.scan.icebergNative.enabled")
.category(CATEGORY_SCAN)
.doc(
"Whether to enable native Iceberg table scan using iceberg-rust. When enabled, " +
"Iceberg tables are read directly through native execution, bypassing Spark's " +
"DataSource V2 API for better performance.")
.booleanConf
.createWithDefault(false)

val COMET_RESPECT_PARQUET_FILTER_PUSHDOWN: ConfigEntry[Boolean] =
conf("spark.comet.parquet.respectFilterPushdown")
.category(CATEGORY_PARQUET)
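The doc string above makes the native Iceberg scan opt-in via `createWithDefault(false)`. A minimal Python sketch of the lookup semantics (a hypothetical helper for illustration, not Comet code — Comet's actual `ConfigEntry` lives in Scala):

```python
def iceberg_native_enabled(conf: dict) -> bool:
    # Mirrors createWithDefault(false): a missing key means the feature is disabled.
    return conf.get("spark.comet.scan.icebergNative.enabled", "false").lower() == "true"

print(iceberg_native_enabled({}))  # False: the feature is opt-in
print(iceberg_native_enabled({"spark.comet.scan.icebergNative.enabled": "true"}))  # True
```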
@@ -56,7 +56,7 @@ object NativeConfig {
* consistent and standardized cloud storage support across all providers.
*/
def extractObjectStoreOptions(hadoopConf: Configuration, uri: URI): Map[String, String] = {
val scheme = uri.getScheme.toLowerCase(Locale.ROOT)
val scheme = Option(uri.getScheme).map(_.toLowerCase(Locale.ROOT)).getOrElse("file")

import scala.jdk.CollectionConverters._
val options = scala.collection.mutable.Map[String, String]()
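The one-line change above guards against URIs that carry no scheme (e.g. a bare local path), where `uri.getScheme` is null and the old `.toLowerCase` call would NPE; such URIs now default to `file`. A Python analogue of that fallback, for illustration only (function name is made up; the real code is Scala over `java.net.URI`):

```python
from urllib.parse import urlparse

def storage_scheme(uri: str) -> str:
    # Like the Scala change: treat a missing scheme as a local "file" path
    # instead of failing on it.
    return urlparse(uri).scheme.lower() or "file"

print(storage_scheme("s3a://bucket/warehouse/table"))  # s3a
print(storage_scheme("/tmp/iceberg/data.parquet"))     # file
```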
1 change: 1 addition & 0 deletions dev/ci/check-suites.py
@@ -34,6 +34,7 @@ def file_to_class_name(path: Path) -> str | None:
ignore_list = [
"org.apache.comet.parquet.ParquetReadSuite", # abstract
"org.apache.comet.parquet.ParquetReadFromS3Suite", # manual test suite
"org.apache.comet.IcebergReadFromS3Suite", # manual test suite
"org.apache.spark.sql.comet.CometPlanStabilitySuite", # abstract
"org.apache.spark.sql.comet.ParquetDatetimeRebaseSuite", # abstract
"org.apache.comet.exec.CometColumnarShuffleSuite" # abstract
8 changes: 4 additions & 4 deletions dev/diffs/3.4.3.diff
@@ -1,5 +1,5 @@
diff --git a/pom.xml b/pom.xml
index d3544881af1..9c174496a4b 100644
index d3544881af1..fbe1c4b9a87 100644
--- a/pom.xml
+++ b/pom.xml
@@ -148,6 +148,8 @@
@@ -513,7 +513,7 @@ index a6b295578d6..91acca4306f 100644

test("SPARK-35884: Explain Formatted") {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
index 2796b1cf154..4816349d690 100644
index 2796b1cf154..52438178a0e 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
@@ -33,6 +33,7 @@ import org.apache.spark.sql.TestingUDT.{IntervalUDT, NullData, NullUDT}
Expand All @@ -536,15 +536,15 @@ index 2796b1cf154..4816349d690 100644

val fileScan = df.queryExecution.executedPlan collectFirst {
case BatchScanExec(_, f: FileScan, _, _, _, _, _, _, _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _, _, _, _), _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _, _, _, _), _, _) => f
}
assert(fileScan.nonEmpty)
assert(fileScan.get.partitionFilters.nonEmpty)
@@ -916,6 +919,7 @@ class FileBasedDataSourceSuite extends QueryTest

val fileScan = df.queryExecution.executedPlan collectFirst {
case BatchScanExec(_, f: FileScan, _, _, _, _, _, _, _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _, _, _, _), _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _, _, _, _), _, _) => f
}
assert(fileScan.nonEmpty)
assert(fileScan.get.partitionFilters.isEmpty)
Expand Down
30 changes: 15 additions & 15 deletions dev/diffs/3.5.7.diff
@@ -1,5 +1,5 @@
diff --git a/pom.xml b/pom.xml
index 68e2c422a24..d971894ffe6 100644
index a0e25ce4d8d..7db86212507 100644
--- a/pom.xml
+++ b/pom.xml
@@ -152,6 +152,8 @@
@@ -38,7 +38,7 @@ index 68e2c422a24..d971894ffe6 100644
</dependencyManagement>

diff --git a/sql/core/pom.xml b/sql/core/pom.xml
index f08b33575fc..424e0da32fd 100644
index e3d324c8edb..22342150522 100644
--- a/sql/core/pom.xml
+++ b/sql/core/pom.xml
@@ -77,6 +77,10 @@
@@ -216,7 +216,7 @@ index 0efe0877e9b..423d3b3d76d 100644
-- SELECT_HAVING
-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/select_having.sql
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
index 9815cb816c9..95b5f9992b0 100644
index e5494726695..00937f025c2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala
@@ -38,7 +38,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeConstants
@@ -239,7 +239,7 @@ index 9815cb816c9..95b5f9992b0 100644

test("A cached table preserves the partitioning and ordering of its cached SparkPlan") {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
index 5a8681aed97..da9d25e2eb4 100644
index 6f3090d8908..c08a60fb0c2 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
@@ -28,7 +28,7 @@ import org.apache.spark.sql.catalyst.plans.logical.Expand
@@ -336,7 +336,7 @@ index 7ee18df3756..d09f70e5d99 100644
assert(exchanges.size == 2)
}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
index 47a311c71d5..342e71cfdd4 100644
index a1d5d579338..c201d39cc78 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala
@@ -24,8 +24,9 @@ import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Expression
@@ -482,7 +482,7 @@ index a206e97c353..fea1149b67d 100644

test("SPARK-35884: Explain Formatted") {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
index 93275487f29..01e5c601763 100644
index 93275487f29..33b2e7ad3b1 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
@@ -23,6 +23,7 @@ import java.nio.file.{Files, StandardOpenOption}
Expand Down Expand Up @@ -522,15 +522,15 @@ index 93275487f29..01e5c601763 100644

val fileScan = df.queryExecution.executedPlan collectFirst {
case BatchScanExec(_, f: FileScan, _, _, _, _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _), _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _), _, _) => f
}
assert(fileScan.nonEmpty)
assert(fileScan.get.partitionFilters.nonEmpty)
@@ -1056,6 +1062,7 @@ class FileBasedDataSourceSuite extends QueryTest

val fileScan = df.queryExecution.executedPlan collectFirst {
case BatchScanExec(_, f: FileScan, _, _, _, _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _), _) => f
+ case CometBatchScanExec(BatchScanExec(_, f: FileScan, _, _, _, _), _, _) => f
}
assert(fileScan.nonEmpty)
assert(fileScan.get.partitionFilters.isEmpty)
@@ -624,7 +624,7 @@ index 7af826583bd..3c3def1eb67 100644
assert(shuffleMergeJoins.size == 1)
}
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
index 4d256154c85..66a5473852d 100644
index 44c8cb92fc3..f098beeca26 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
@@ -31,7 +31,8 @@ import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation
@@ -822,7 +822,7 @@ index 4d256154c85..66a5473852d 100644
checkAnswer(fullJoinDF, Row(100))
}
}
@@ -1583,6 +1612,9 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
@@ -1611,6 +1640,9 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
Seq(semiJoinDF, antiJoinDF).foreach { df =>
assert(collect(df.queryExecution.executedPlan) {
case j: ShuffledHashJoinExec if j.ignoreDuplicatedKey == ignoreDuplicatedKey => true
@@ -832,7 +832,7 @@ index 4d256154c85..66a5473852d 100644
}.size == 1)
}
}
@@ -1627,14 +1659,20 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
@@ -1655,14 +1687,20 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan

test("SPARK-43113: Full outer join with duplicate stream-side references in condition (SMJ)") {
def check(plan: SparkPlan): Unit = {
Expand All @@ -855,7 +855,7 @@ index 4d256154c85..66a5473852d 100644
}
dupStreamSideColTest("SHUFFLE_HASH", check)
}
@@ -1770,7 +1808,8 @@ class ThreadLeakInSortMergeJoinSuite
@@ -1798,7 +1836,8 @@ class ThreadLeakInSortMergeJoinSuite
sparkConf.set(SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD, 20))
}

@@ -879,7 +879,7 @@ index c26757c9cff..d55775f09d7 100644
protected val baseResourcePath = {
// use the same way as `SQLQueryTestSuite` to get the resource path
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
index 793a0da6a86..181bfc16e4b 100644
index 3cf2bfd17ab..49728c35c42 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
@@ -1521,7 +1521,8 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
@@ -2050,10 +2050,10 @@ index 8e88049f51e..8f3cf8a0f80 100644
case _ =>
throw new AnalysisException("Can not match ParquetTable in the query.")
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
index 4f8a9e39716..fb55ac7a955 100644
index 8ed9ef1630e..eed2a6f5ad5 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
@@ -1335,7 +1335,8 @@ class ParquetIOSuite extends QueryTest with ParquetTest with SharedSparkSession
@@ -1345,7 +1345,8 @@ class ParquetIOSuite extends QueryTest with ParquetTest with SharedSparkSession
}
}
