A Databricks-native web application for accelerating SQL migration and schema reconciliation from legacy systems (Snowflake, T-SQL, Redshift, Oracle, Teradata, MySQL, PostgreSQL, SSIS, Informatica, etc.) into Databricks SQL.
The app leverages LLMs (Claude, Llama, GPT, etc.) for query conversion, validation, and automated fixes.
- Convert individual queries in real time.
- Choose LLM model, SQL Warehouse, and Source Dialect.
- Add custom prompt instructions to handle tricky translations.
- Validate queries by running `EXPLAIN` in Databricks (see the sketch after this list).
- Retry mechanism: if validation fails, re-submit failed queries with error context so the LLM can correct its own mistakes.
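Under the hood, `EXPLAIN`-based validation only needs the Statement Execution API of the Databricks SDK. A minimal sketch, assuming a warehouse ID supplied by the app (the app's actual helper may be structured differently):

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.sql import StatementState


def validate_query(sql: str, warehouse_id: str) -> tuple[bool, str | None]:
    """Run EXPLAIN for a converted query and return (ok, error_message)."""
    w = WorkspaceClient()  # picks up credentials from the notebook / app context
    resp = w.statement_execution.execute_statement(
        warehouse_id=warehouse_id,
        statement=f"EXPLAIN {sql}",
        wait_timeout="30s",
    )
    if resp.status.state == StatementState.SUCCEEDED:
        return True, None
    error = resp.status.error.message if resp.status.error else str(resp.status.state)
    return False, error
```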
- Bulk convert entire folders of SQL files.
- Configure source dialect, input folder, output notebook folder, and results table.
- Choose validation strategy:
  - No validation
  - Validate by running `EXPLAIN`
- Failed queries can be retried with error feedback.
- Results are persisted in Delta for easy querying and history.
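Because batch results are persisted in Delta, failed conversions can be pulled back with plain SQL from any Databricks notebook. A minimal sketch, assuming a hypothetical results table name and `status`/`error_message` columns (use the table you configured in the app):

```python
# `spark` and `display` are available in a Databricks notebook.
# The table and column names below are illustrative, not the app's fixed schema.
failed = spark.sql("""
    SELECT source_file, error_message
    FROM main.migration.conversion_results
    WHERE status = 'FAILED'
    ORDER BY source_file
""")
display(failed)
```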
- Compare source vs target schemas (`catalog.schema` format).
- Run reconciliation jobs that:
  - Count rows in source and target.
  - Highlight mismatches.
- Results stored in a Delta table for auditing.
- Useful for validating post-migration data consistency.
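The row-count check reduces to comparing counts table by table across the two `catalog.schema` locations. A minimal sketch for a single table in a Databricks notebook, assuming hypothetical source and target names (the job also persists its results to Delta rather than printing them):

```python
# Illustrative only: compare row counts for one table across source and target schemas.
source_table = "legacy_catalog.sales.orders"   # hypothetical source table
target_table = "main.sales.orders"             # hypothetical migrated table

src_count = spark.table(source_table).count()
tgt_count = spark.table(target_table).count()

status = "MATCH" if src_count == tgt_count else "MISMATCH"
print(f"{source_table}: {src_count} rows | {target_table}: {tgt_count} rows | {status}")
```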
Clone this repository into a Databricks Git-enabled workspace folder.
The repo root contains a notebook called `app_deployer`.
Open it in Databricks and run all cells. This will automatically:
- Install dependencies
- Deploy the Streamlit app into your workspace
- Make the app available for immediate use
- LLM Models: Select from Claude, GPT, Llama, Gemma, etc.
- Custom Prompts: (OPTIONAL) Add dialect-specific hints.
- Validation: Toggle validation strategy to balance speed vs correctness.
- Results Storage: Batch and reconcile results are persisted in Delta tables.
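Custom prompt instructions are just extra text layered on top of the base prompts stored as YAML in `src/resources/`. A minimal sketch, assuming a hypothetical file name and key layout (the real prompt files may be organised differently):

```python
import yaml  # PyYAML, already listed in requirements.txt

# File name and keys are illustrative; see src/resources/ for the actual prompt files.
with open("src/resources/prompts.yaml") as f:
    prompts = yaml.safe_load(f)

base_prompt = prompts["snowflake"]["conversion"]
custom_hint = "Map Snowflake VARIANT columns to STRING and parse them with from_json."
full_prompt = f"{base_prompt}\n\nAdditional instructions:\n{custom_hint}"
```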
Both Interactive and Batch modes have built-in retry support:
- If validation fails, the app automatically captures the validation error message.
- The failed query plus the error context are re-submitted to the LLM.
- The LLM adjusts its output and attempts to generate a corrected query.
- This iterative approach significantly increases the chance of success.
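Put together, the retry flow looks roughly like the sketch below. The serving-endpoint name is just an example, `validate_query` is the `EXPLAIN` helper sketched earlier, and the prompts are simplified; only the control flow mirrors what the app does:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.serving import ChatMessage, ChatMessageRole

MAX_RETRIES = 3  # illustrative limit


def convert_with_llm(source_sql: str, error: str | None = None,
                     endpoint: str = "databricks-claude-3-7-sonnet") -> str:
    """Ask a model serving endpoint to convert (or fix) a query."""
    prompt = f"Convert this query to Databricks SQL:\n{source_sql}"
    if error:
        prompt += f"\n\nA previous attempt failed validation with:\n{error}\nReturn a corrected query."
    w = WorkspaceClient()
    resp = w.serving_endpoints.query(
        name=endpoint,
        messages=[ChatMessage(role=ChatMessageRole.USER, content=prompt)],
        max_tokens=2000,
    )
    return resp.choices[0].message.content


def convert_and_fix(source_sql: str, warehouse_id: str) -> str:
    """Convert, validate with EXPLAIN, and feed validation errors back to the LLM."""
    candidate = convert_with_llm(source_sql)
    for _ in range(MAX_RETRIES):
        ok, error = validate_query(candidate, warehouse_id)  # see the EXPLAIN sketch above
        if ok:
            return candidate
        candidate = convert_with_llm(source_sql, error=error)
    return candidate  # best effort after exhausting retries
```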
```
├── app.py              # Main Streamlit app (3 tabs: Interactive, Batch, Reconcile)
├── app_deployer.py     # Notebook for auto-deployment in Databricks
├── requirements.txt    # Python dependencies
├── src/
│   ├── noteboks/       # Notebooks to be run as jobs for batch modes
│   ├── utils/          # Helper modules (prompt handling, model mapping, etc.)
│   └── resources/      # YAML files with LLM prompts
└── README.md
```
- Use Interactive Conversion to test a few sample queries from Snowflake.
- Run a Batch Job to convert hundreds of SQL files into Databricks notebooks.
- Execute the Reconcile Tables job to ensure source and target schemas match in counts and data samples.
- `git clone` this project locally
- Utilize the Databricks CLI to test your changes against a Databricks workspace of your choice
- Contribute to repositories with pull requests (PRs), ensuring that you always have a second-party review from a capable teammate
© 2025 Databricks, Inc. All rights reserved. The source in this project is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.
| Package | License | Copyright |
|---|---|---|
| streamlit | Apache-2.0 | Copyright (c) Streamlit / Snowflake Inc. |
| PyYAML | MIT | Copyright (c) 2006–2016 Kirill Simonov; 2017–2019 Ingy döt Net; YAML community |
| pandas>=1.5.0 | BSD-3-Clause | Copyright (c) 2008-2023, AQR Capital Management, LLC |
| databricks-sdk>=0.61.0 | Apache-2.0 | Copyright (c) Databricks, Inc. |
This project is licensed under the Databricks License - see the LICENSE file for details.
Please note that the code in this project is provided for your exploration only and is not formally supported by Databricks with Service Level Agreements (SLAs). It is provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of this project.



