Skip to content

Conversation

@vb-dbrks
Copy link
Contributor

@vb-dbrks vb-dbrks commented Nov 24, 2025

PR summary:

  1. Made sql_query support merge_columns be optional/empty and documenting/demoing the two modes (row-level with merge_columns vs. dataset-level). Added notebooks/docs/tests to prove the new behaviour, including row-filters, metrics observers, ref DF support, negation, and performance comparisons.

  2. Follow-up polish: centralised the dataset-level logic into _apply_dataset_level_sql_check, reused the new path in integration tests, and tightened unit expectations so helpers (like DQDatasetRule) stay type-stable.

  3. Hardening + helper cleanup: reintroduced merge-column validation to fail fast on bad inputs, forced dataset-level queries to error if they emit more than one row, and added the shared build_quality_violation helper so all integration expectations reuse the same metadata factory. Also refreshed formatting/linting.

TL;DR: sql_query now covers both dataset-level and row-level scenarios with clear validation and deterministic outputs, plus comprehensive tests and docs to back it up.

Changes

Linked issues

Resolves #938

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

Now supports two modes:
- Row-level validation (with merge_columns): results joined back to specific rows
- Dataset-level validation (without merge_columns): all rows get same result

This makes aggregate validations potentialy much faster and works with
custom_metrics. Empty list is treated as None.

backward compatible.

Includes tests, docs, and demo updates.
@vb-dbrks vb-dbrks self-assigned this Nov 24, 2025
@vb-dbrks vb-dbrks added the enhancement New feature or request label Nov 24, 2025
@vb-dbrks vb-dbrks marked this pull request as ready for review November 25, 2025 19:20
@vb-dbrks vb-dbrks requested a review from a team as a code owner November 25, 2025 19:20
@vb-dbrks vb-dbrks requested review from tombonfert and removed request for a team November 25, 2025 19:20
@vb-dbrks vb-dbrks requested review from mwojtyczka and removed request for tombonfert November 25, 2025 19:20
## Changes
<!-- Summary of your changes that are easy to understand. Add
screenshots when necessary -->
This PR adds `--install-folder` arguments for the DQX CLI commands. This
allows users to run the CLI commands when DQX is installed in a custom
folder.

When users install DQX in a custom folder, they should add arguments to
run DQX CLI commands:

```
databricks labs dqx open-dashboards --install-folder "/Workspace/..."
```

### Linked issues
<!-- DOC: Link issue with a keyword: close, closes, closed, fix, fixes,
fixed, resolve, resolves, resolved. See
https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword
-->

Resolves #671 

### Tests
<!-- How is this tested? Please see the checklist below and also
describe any other relevant tests -->

- [x] manually tested
- [ ] added unit tests
- [x] added integration tests
- [ ] added end-to-end tests
- [ ] added performance tests
@github-actions
Copy link

github-actions bot commented Nov 25, 2025

✅ 480/480 passed, 2 flaky, 41 skipped, 3h31m45s total

Flaky tests:

  • 🤪 test_quality_checker_workflow_with_custom_check_func (3m7.936s)
  • 🤪 test_e2e_workflow_serverless (9m8.742s)

Running from acceptance #3360

…mn-optional-when-running-a-datasetmulti-dataset-level-dq-rules
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@mwojtyczka mwojtyczka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

mwojtyczka and others added 2 commits December 7, 2025 21:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…e-merge-column-optional-when-running-a-datasetmulti-dataset-level-dq-rules' into 938-feature-sql_query-should-have-merge-column-optional-when-running-a-datasetmulti-dataset-level-dq-rules
@mwojtyczka mwojtyczka changed the title feat: Make merge_columns optional in sql_query check Making merge columns optional in sql query check Dec 7, 2025
@mwojtyczka mwojtyczka changed the title Making merge columns optional in sql query check Made merge columns optional in sql query check Dec 7, 2025
@mwojtyczka mwojtyczka force-pushed the 938-feature-sql_query-should-have-merge-column-optional-when-running-a-datasetmulti-dataset-level-dq-rules branch from b7e574e to e50db33 Compare December 7, 2025 21:30
@mwojtyczka mwojtyczka merged commit 76aed0b into main Dec 8, 2025
16 checks passed
@mwojtyczka mwojtyczka deleted the 938-feature-sql_query-should-have-merge-column-optional-when-running-a-datasetmulti-dataset-level-dq-rules branch December 8, 2025 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: sql_query should have merge column optional when running a dataset/multi-dataset level DQ rules

4 participants