-
Notifications
You must be signed in to change notification settings - Fork 74
Made merge columns optional in sql query check #945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Made merge columns optional in sql query check #945
Conversation
Now supports two modes: - Row-level validation (with merge_columns): results joined back to specific rows - Dataset-level validation (without merge_columns): all rows get same result This makes aggregate validations potentialy much faster and works with custom_metrics. Empty list is treated as None. backward compatible. Includes tests, docs, and demo updates.
…d multi-dataset integration test.
## Changes <!-- Summary of your changes that are easy to understand. Add screenshots when necessary --> This PR adds `--install-folder` arguments for the DQX CLI commands. This allows users to run the CLI commands when DQX is installed in a custom folder. When users install DQX in a custom folder, they should add arguments to run DQX CLI commands: ``` databricks labs dqx open-dashboards --install-folder "/Workspace/..." ``` ### Linked issues <!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword --> Resolves #671 ### Tests <!-- How is this tested? Please see the checklist below and also describe any other relevant tests --> - [x] manually tested - [ ] added unit tests - [x] added integration tests - [ ] added end-to-end tests - [ ] added performance tests
|
✅ 480/480 passed, 2 flaky, 41 skipped, 3h31m45s total Flaky tests:
Running from acceptance #3360 |
…mn-optional-when-running-a-datasetmulti-dataset-level-dq-rules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
mwojtyczka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…e-merge-column-optional-when-running-a-datasetmulti-dataset-level-dq-rules' into 938-feature-sql_query-should-have-merge-column-optional-when-running-a-datasetmulti-dataset-level-dq-rules
b7e574e to
e50db33
Compare
PR summary:
Made sql_query support merge_columns be optional/empty and documenting/demoing the two modes (row-level with merge_columns vs. dataset-level). Added notebooks/docs/tests to prove the new behaviour, including row-filters, metrics observers, ref DF support, negation, and performance comparisons.
Follow-up polish: centralised the dataset-level logic into _apply_dataset_level_sql_check, reused the new path in integration tests, and tightened unit expectations so helpers (like DQDatasetRule) stay type-stable.
Hardening + helper cleanup: reintroduced merge-column validation to fail fast on bad inputs, forced dataset-level queries to error if they emit more than one row, and added the shared build_quality_violation helper so all integration expectations reuse the same metadata factory. Also refreshed formatting/linting.
TL;DR: sql_query now covers both dataset-level and row-level scenarios with clear validation and deterministic outputs, plus comprehensive tests and docs to back it up.
Changes
Linked issues
Resolves #938
Tests