-
Notifications
You must be signed in to change notification settings - Fork 129
feat: Add schema diff data structures and foundation (1/5) #1477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Introduces core data structures for schema diffing: - SchemaDiff, FieldChange, FieldUpdate types - FieldChangeType enum for classifying changes - SchemaDiffError for validation errors - ColumnName::parent() helper method This is part 1/5 of the schema diffing feature implementation. The actual diffing algorithm will be added in PR 2. Note: This PR includes a temporary stub for compute_schema_diff() to allow basic tests to compile. The full implementation from the original PR delta-io#1346 will be copied exactly in PR 2. Related to delta-io#1346
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1477 +/- ##
==========================================
+ Coverage 84.84% 84.89% +0.05%
==========================================
Files 120 121 +1
Lines 32103 32262 +159
Branches 32103 32262 +159
==========================================
+ Hits 27238 27390 +152
- Misses 3542 3547 +5
- Partials 1323 1325 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
9df62ad to
4dd5d99
Compare
Adds a comprehensive unit test that exercises the filtering logic by manually constructing a SchemaDiff with both top-level and nested field changes. This test verifies that the methods correctly filter fields by path depth (length 1 vs length > 1). The test improves code coverage for these methods from 0% to full coverage of the filtering logic, addressing CI coverage requirements.
4dd5d99 to
31cee42
Compare
zachschuermann
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
few comments but generally looks good!
| /// Feature gate for schema diff functionality. | ||
| /// Set to `false` to ensure incomplete implementations don't activate until all PRs are merged. | ||
| /// Will be removed in the final PR when all tests and implementation are complete. | ||
| const SCHEMA_DIFF_ENABLED: bool = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm. we could just do a feature flag if we are really concerned about this? prefer feature flag over a constant I think
| /// Arguments for computing a schema diff | ||
| #[derive(Debug, Clone)] | ||
| pub(crate) struct SchemaDiffArgs<'a> { | ||
| /// The before/original schema | ||
| pub before: &'a StructType, | ||
| /// The after/new schema to compare against | ||
| pub after: &'a StructType, | ||
| } | ||
|
|
||
| impl<'a> SchemaDiffArgs<'a> { | ||
| /// Compute the difference between the two schemas | ||
| pub(crate) fn compute_diff(self) -> Result<SchemaDiff, SchemaDiffError> { | ||
| compute_schema_diff(self.before, self.after) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thoughts on just leaving these as parameters? we could just have something like SchemaDiff::new(before, after)?
| /// Fields that were modified between schemas | ||
| pub updated_fields: Vec<FieldUpdate>, | ||
| /// Whether the diff contains breaking changes (computed once during construction) | ||
| has_breaking_changes: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe more concise?
| has_breaking_changes: bool, | |
| breaking_changes: bool, |
| #[derive(Debug, Clone)] | ||
| pub(crate) struct SchemaDiffArgs<'a> { | ||
| /// The before/original schema | ||
| pub before: &'a StructType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how others feel but generally for pub(crate) fields i like the clarity of making each one pub(crate) instead of pub (which is the same as pub(crate) as long as the struct itself is pub(crate)`)
| // Feature gate check - prevents activation of incomplete implementation in production | ||
| // This gate will be removed in the final PR when all functionality is complete | ||
| // Note: Gate is bypassed in test builds to allow tests to validate the implementation | ||
| #[cfg(not(test))] | ||
| { | ||
| if !SCHEMA_DIFF_ENABLED { | ||
| return Ok(SchemaDiff { | ||
| added_fields: Vec::new(), | ||
| removed_fields: Vec::new(), | ||
| updated_fields: Vec::new(), | ||
| has_breaking_changes: false, | ||
| }); | ||
| } | ||
| } | ||
|
|
||
| // Stub implementation - actual implementation will be added in PR 2 | ||
| Ok(SchemaDiff { | ||
| added_fields: Vec::new(), | ||
| removed_fields: Vec::new(), | ||
| updated_fields: Vec::new(), | ||
| has_breaking_changes: false, | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can either make this todo!() or just always return Err::unsupported
| } | ||
|
|
||
| /// Get all changes at the top level only (fields with path length of 1) | ||
| pub(crate) fn top_level_changes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will we not want something that does both top-level and nested?
This is part 1 of 5 PRs that implement schema diffing for Delta Kernel Rust.
What's in this PR
SchemaDiff,FieldChange,FieldUpdateFieldChangeTypeenum for classifying changesSchemaDiffErrorfor validation errorsColumnName::parent()helper methodWhat's NOT in this PR
Feature Gating
This PR includes
const SCHEMA_DIFF_ENABLED: bool = falsewith a#[cfg(not(test))]guard.Part of #1346