Skip to content

Conversation

@paulteehan
Copy link

@paulteehan paulteehan commented Nov 28, 2025

  • Refactor and centralize extracting database and schema names from prefixes, to support overriding
  • Override Dremio so DQNs are written as dremio/foo/bar/table, where schema="foo.bar"

Companion PR: https://github.com/sodadata/soda-extensions/pull/190

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the extraction of database and schema names from dataset prefixes to centralize the logic and support overriding in data source implementations. The key change is introducing two new methods (extract_database_from_prefix and extract_schema_from_prefix) in DataSourceImpl that can be overridden by specific data source types like Dremio.

  • Introduced extract_database_from_prefix() and extract_schema_from_prefix() methods in DataSourceImpl to centralize prefix extraction logic
  • Replaced inline prefix extraction logic throughout the codebase with calls to the new centralized methods
  • Updated test helpers to use the new extraction methods for consistency

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
soda-core/src/soda_core/common/data_source_impl.py Added extract_schema_from_prefix() and extract_database_from_prefix() methods; refactored existing methods to use these new centralized extraction methods instead of inline index-based access
soda-tests/src/helpers/data_source_test_helper.py Added wrapper methods for schema/database extraction and updated test table querying logic to use the new centralized extraction methods
soda-sqlserver/src/soda_sqlserver/test_helpers/sqlserver_data_source_test_helper.py Updated schema dropping logic to use extract_schema_from_prefix() instead of direct prefix index access
soda-bigquery/src/soda_bigquery/common/data_sources/bigquery_data_source.py Refactored to use extract_schema_from_prefix() in the table namespace building method
soda-athena/src/soda_athena/test_helpers/athena_data_source_test_helper.py Updated S3 schema directory construction to use extract_schema_from_prefix()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 202 to 208
database_index: int | None = self.sql_dialect.get_database_prefix_index()
schema_index: int | None = self.sql_dialect.get_schema_prefix_index()

schema_name: str = prefixes[schema_index] if schema_index is not None and schema_index < len(prefixes) else None
if schema_name is None:
raise ValueError(f"Cannot determine schema name from prefixes: {prefixes}")

database_name: str | None = (
prefixes[database_index] if database_index is not None and database_index < len(prefixes) else None
)

table_namespace: DataSourceNamespace = (
SchemaDataSourceNamespace(schema=prefixes[schema_index])
SchemaDataSourceNamespace(schema=schema_name)
if database_index is None
Copy link

Copilot AI Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable database_index is retrieved but never used in this method. After the refactoring, the method now uses database_name directly (extracted via extract_database_from_prefix) to determine whether to create a SchemaDataSourceNamespace or DbSchemaDataSourceNamespace. The check on line 208 should use database_name is None instead of database_index is None. Consider removing the unused variable and fixing the conditional check.

Copilot uses AI. Check for mistakes.
paulteehan and others added 9 commits December 1, 2025 15:46
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor

@mivds mivds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor, thanks for taking the time! 👍

…odadata/soda-core into dtl-1385-split-dremio-schemas-by-slash
@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 1, 2025

@paulteehan paulteehan marked this pull request as ready for review December 1, 2025 17:11
@paulteehan paulteehan merged commit 2a47022 into v4 Dec 1, 2025
30 checks passed
@paulteehan paulteehan deleted the dtl-1385-split-dremio-schemas-by-slash branch December 1, 2025 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants