-
Notifications
You must be signed in to change notification settings - Fork 249
Refactor schema and db prefixes #2504
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the extraction of database and schema names from dataset prefixes to centralize the logic and support overriding in data source implementations. The key change is introducing two new methods (extract_database_from_prefix and extract_schema_from_prefix) in DataSourceImpl that can be overridden by specific data source types like Dremio.
- Introduced
extract_database_from_prefix()andextract_schema_from_prefix()methods inDataSourceImplto centralize prefix extraction logic - Replaced inline prefix extraction logic throughout the codebase with calls to the new centralized methods
- Updated test helpers to use the new extraction methods for consistency
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| soda-core/src/soda_core/common/data_source_impl.py | Added extract_schema_from_prefix() and extract_database_from_prefix() methods; refactored existing methods to use these new centralized extraction methods instead of inline index-based access |
| soda-tests/src/helpers/data_source_test_helper.py | Added wrapper methods for schema/database extraction and updated test table querying logic to use the new centralized extraction methods |
| soda-sqlserver/src/soda_sqlserver/test_helpers/sqlserver_data_source_test_helper.py | Updated schema dropping logic to use extract_schema_from_prefix() instead of direct prefix index access |
| soda-bigquery/src/soda_bigquery/common/data_sources/bigquery_data_source.py | Refactored to use extract_schema_from_prefix() in the table namespace building method |
| soda-athena/src/soda_athena/test_helpers/athena_data_source_test_helper.py | Updated S3 schema directory construction to use extract_schema_from_prefix() |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| database_index: int | None = self.sql_dialect.get_database_prefix_index() | ||
| schema_index: int | None = self.sql_dialect.get_schema_prefix_index() | ||
|
|
||
| schema_name: str = prefixes[schema_index] if schema_index is not None and schema_index < len(prefixes) else None | ||
| if schema_name is None: | ||
| raise ValueError(f"Cannot determine schema name from prefixes: {prefixes}") | ||
|
|
||
| database_name: str | None = ( | ||
| prefixes[database_index] if database_index is not None and database_index < len(prefixes) else None | ||
| ) | ||
|
|
||
| table_namespace: DataSourceNamespace = ( | ||
| SchemaDataSourceNamespace(schema=prefixes[schema_index]) | ||
| SchemaDataSourceNamespace(schema=schema_name) | ||
| if database_index is None |
Copilot
AI
Dec 1, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable database_index is retrieved but never used in this method. After the refactoring, the method now uses database_name directly (extracted via extract_database_from_prefix) to determine whether to create a SchemaDataSourceNamespace or DbSchemaDataSourceNamespace. The check on line 208 should use database_name is None instead of database_index is None. Consider removing the unused variable and fixing the conditional check.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
for more information, see https://pre-commit.ci
mivds
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice refactor, thanks for taking the time! 👍
…odadata/soda-core into dtl-1385-split-dremio-schemas-by-slash
|



Companion PR: https://github.com/sodadata/soda-extensions/pull/190