# Rationale for this change
Closes #2131
This PR relaxes the constraint that prevented adding any file with field
IDs. It replaces that constraint with one that rejects only files whose
field IDs are inconsistent with the field IDs of the table.
Files with compatible field IDs can be added safely; files with
incompatible field IDs are rejected.
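To illustrate the kind of consistency check described above, here is a minimal sketch in plain Python. It is not the PR's implementation: the function name `find_field_id_conflicts` and the `name -> field ID` dictionaries are hypothetical stand-ins for the table schema and the Parquet metadata. Matching columns by name, and treating a file field that is missing from the table as a conflict, are assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_field_id_conflicts(
    table_ids: Dict[str, int],
    file_ids: Dict[str, Optional[int]],
) -> List[str]:
    """Describe every file column whose field ID disagrees with the table.

    Hypothetical helper: both arguments map column names to field IDs.
    A file column with ID None carries no field ID in its metadata.
    """
    conflicts: List[str] = []
    for name, file_id in file_ids.items():
        if file_id is None:
            # No field ID recorded for this column, so there is nothing to compare.
            continue
        expected = table_ids.get(name)
        if expected is None:
            # Assumption of this sketch: an unknown column counts as a conflict.
            conflicts.append(f"{name}: file has ID {file_id}, table has no such field")
        elif expected != file_id:
            conflicts.append(f"{name}: file has ID {file_id}, table expects {expected}")
    return conflicts


table_schema_ids = {"id": 1, "name": 2, "ts": 3}
compatible_file = {"id": 1, "name": 2, "ts": 3}
incompatible_file = {"id": 1, "name": 5, "ts": 3}

# An empty list means the file could be added; a non-empty list means rejection.
print(find_field_id_conflicts(table_schema_ids, compatible_file))
print(find_field_id_conflicts(table_schema_ids, incompatible_file))
```

Returning the full list of conflicts, rather than failing on the first one, lets a caller report every mismatched column in a single error message.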
## Are these changes tested?
Yes
## Are there any user-facing changes?
Yes
File changed: mkdocs/docs/api.md (4 additions, 2 deletions)
@@ -1006,8 +1006,10 @@ Expert Iceberg users may choose to commit existing parquet files to the Iceberg
 
 <!-- prettier-ignore-start -->
 
-!!! note "Name Mapping"
-    Because `add_files` uses existing files without writing new parquet files that are aware of the Iceberg's schema, it requires the Iceberg's table to have a [Name Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization) (The Name mapping maps the field names within the parquet files to the Iceberg field IDs). Hence, `add_files` requires that there are no field IDs in the parquet file's metadata, and creates a new Name Mapping based on the table's current schema if the table doesn't already have one.
+!!! note "Name Mapping and Field IDs"
+    `add_files` can work with Parquet files both with and without field IDs in their metadata:
+    - **Files with field IDs**: When field IDs are present in the Parquet metadata, they must match the corresponding field IDs in the Iceberg table schema. This is common for files generated by tools like Spark or by other libraries that write explicit field ID metadata.
+    - **Files without field IDs**: When field IDs are absent, the table must have a [Name Mapping](https://iceberg.apache.org/spec/?h=name+mapping#name-mapping-serialization) to map field names to Iceberg field IDs. `add_files` will automatically create a Name Mapping based on the table's current schema if one doesn't already exist.
 
 !!! note "Partitions"
     `add_files` only requires the client to read the existing parquet files' metadata footer to infer the partition value of each file. This implementation also supports adding files to Iceberg tables with partition transforms like `MonthTransform`, and `TruncateTransform` which preserve the order of the values after the transformation (Any Transform that has the `preserves_order` property set to True is supported). Please note that if the column statistics of the `PartitionField`'s source column are not present in the parquet metadata, the partition value is inferred as `None`.
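The "Partitions" note explains why order-preserving transforms matter. If a transform is monotonic and maps a column's minimum and maximum to the same value, every value in between must map to that value too, so footer statistics alone determine the partition. The sketch below shows this reasoning in plain Python. The helper names (`months_since_epoch`, `truncate_int`, `infer_partition_value`) are hypothetical and are not PyIceberg APIs. Raising an error when a file spans more than one partition is an assumption of this example, not something the note states.

```python
from datetime import date
from typing import Any, Callable, Optional


def months_since_epoch(d: date) -> int:
    """Order-preserving month transform: whole months since 1970-01."""
    return (d.year - 1970) * 12 + (d.month - 1)


def truncate_int(width: int) -> Callable[[int], int]:
    """Order-preserving truncate transform for integers."""
    return lambda value: value - (value % width)


def infer_partition_value(
    lower: Optional[Any],
    upper: Optional[Any],
    transform: Callable[[Any], Any],
) -> Optional[Any]:
    """Infer one partition value from a column's min/max statistics."""
    if lower is None or upper is None:
        # Statistics are missing from the footer: the value is inferred as None.
        return None
    low_part, high_part = transform(lower), transform(upper)
    if low_part != high_part:
        # Assumption of this sketch: a file spanning partitions is rejected.
        raise ValueError(f"file spans partitions {low_part}..{high_part}")
    return low_part


march = infer_partition_value(date(2024, 3, 1), date(2024, 3, 31), months_since_epoch)
bucket = infer_partition_value(40, 49, truncate_int(10))
missing = infer_partition_value(None, None, months_since_epoch)
print(march, bucket, missing)
```

A transform that does not preserve order, such as a hash-based bucket, could map the minimum and maximum to the same value while intermediate values land elsewhere, which is why this shortcut only applies to order-preserving transforms.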