Releases: databrickslabs/lsql
v0.7.0
- Added `databricks labs lsql fmt` command (#221). The commit introduces a new command, `databricks labs lsql fmt`, which formats SQL files in a given folder using the Databricks SDK. This command can be used without authentication and accepts a `folder` flag, which specifies the directory containing the SQL files to format. The change also updates the `labs.yml` file and includes a new method, `format`, in the `QueryTile` class, which formats SQL queries using the `sqlglot` library. This commit enhances the CLI's SQL file formatting functionality and improves the readability and consistency of SQL files, making the code easier to understand and maintain. Additionally, the commit changes various SQL files to demonstrate the improved formatting, such as converting SQL keywords to uppercase, adding appropriate spacing around keywords and operators, and aligning column names in the `VALUES` clause; these changes also verify that the formatting method works correctly without affecting existing functionality. A sketch of this kind of `sqlglot`-based formatting follows below.
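The following is a minimal sketch of the kind of `sqlglot`-based formatting described above; it is illustrative only, not the actual `QueryTile.format` implementation, and the input query and dialect are assumptions.

```python
import sqlglot

raw = "select a,b from inventory.objects where a>3"
# Parse and re-render the statement with sqlglot's pretty printer;
# transpile() returns one formatted string per input statement.
formatted = sqlglot.transpile(raw, read="databricks", pretty=True)[0]
print(formatted)
# SELECT
#   a,
#   b
# FROM inventory.objects
# WHERE
#   a > 3
```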
Contributors: @nfx
v0.6.0
- Added method to dashboards to get dashboard url (#211). In this release, we have added a new method, `get_url`, to the `lakeview_dashboards` object in the lakeview dashboards library. This method uses the Databricks SDK to retrieve the dashboard URL, simplifying the code and making it more maintainable. Previously, the dashboard URL was constructed by concatenating the host and dashboard ID; the new method ensures the URL is obtained correctly even if the format changes in the future. A new unit test covers getting the dashboard URL via the workspace client. This functionality allows users to easily retrieve the URL for a dashboard using its ID and the workspace client.
- Extend replace database in query (#210). This commit extends the database replacement functionality in the `DashboardMetadata` class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new `replace_database` method in the `DashboardMetadata` class, which replaces the catalog and/or database in the query based on the provided parameters. These changes make the database replacement feature more flexible and customizable, giving users more control over how their data is displayed in the dashboard. The `create_dashboard` function has been updated to use the new method, the `TileMetadata` update method has been replaced with a new merge method, and the `QueryTile` and `Tile` classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected. A sketch of this kind of query rewrite follows the list below.
- Improve object oriented dashboard-as-code implementation (#208). In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull request comments (#201). The `TileMetadata` dataclass now includes methods for updating and comparing tile metadata, and the `DashboardMetadata` class has been removed, with its functionality incorporated into the `Dashboards` class. The `Dashboards` class now generates tiles, datasets, and layouts for dashboards using the provided `query_transformer`. Readability and maintainability have been further enhanced by replacing the `copy` module with `dataclasses.replace` for creating object copies. Additionally, the unit tests for dashboard functionality have been updated with new methods and attributes that check for valid dashboard metadata, handle duplicate query or widget IDs, and specify the order in which tiles and widgets should be displayed in the dashboard.
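As a rough illustration of the catalog/database replacement described in #210, the sketch below rewrites table references in a query with `sqlglot`. The function name and signature are assumptions for illustration and do not mirror the exact `DashboardMetadata.replace_database` API.

```python
import sqlglot
from sqlglot import exp


def replace_database(query: str, *, catalog: str | None = None, database: str | None = None) -> str:
    """Rewrite the catalog and/or database of every table reference in the query."""
    tree = sqlglot.parse_one(query, read="databricks")
    for table in tree.find_all(exp.Table):
        if catalog:
            table.set("catalog", exp.to_identifier(catalog))
        if database:
            table.set("db", exp.to_identifier(database))
    return tree.sql(dialect="databricks")


print(replace_database("SELECT * FROM inventory.objects", catalog="main", database="ucx"))
# SELECT * FROM main.ucx.objects
```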
Contributors: @JCZuurmond
v0.5.0
- Added Command Execution backend which uses Command Execution API on a cluster (#95). In this release, the Databricks Labs lsql library gains a new Command Execution backend that uses the Command Execution API. A new `CommandExecutionBackend` class has been implemented; it initializes a `CommandExecutor` instance that takes a cluster ID, workspace client, and language as parameters. The `execute` method runs SQL commands on the specified cluster, and the `fetch` method returns the query result as an iterator of `Row` objects. The existing `StatementExecutionBackend` class now inherits from a new abstract base class, `ExecutionBackend`, which includes a `save_table` method for saving data to tables and serves as the common base for both the Statement and Command Execution backends. The `StatementExecutionBackend` constructor now accepts a `max_records_per_batch` parameter, and its `execute` and `fetch` methods use the new `_only_n_bytes` method for logging truncated SQL statements. The `CommandExecutionBackend` class provides `execute`, `fetch`, and `save_table` methods for executing commands on a cluster and saving the results to tables in the Databricks workspace; a usage sketch follows this entry.
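A minimal usage sketch, assuming the backend is constructed from a workspace client and a cluster ID as described above (the argument order, import path, cluster ID, and table name are assumptions):

```python
from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import CommandExecutionBackend

ws = WorkspaceClient()  # credentials are resolved from the environment
backend = CommandExecutionBackend(ws, "0123-456789-abcdef00")  # hypothetical cluster ID
backend.execute("CREATE TABLE IF NOT EXISTS main.default.demo AS SELECT 1 AS id")
for row in backend.fetch("SELECT COUNT(*) AS cnt FROM main.default.demo"):
    print(row.cnt)
```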
- Added basic integration with Lakeview Dashboards (#66). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the `databricks-labs-blueprint` dependency to version 0.4.2 with the `[yaml]` extra, allowing for additional functionality related to handling YAML files. A new file, `dashboards.py`, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new `__init__.py` file under the `src/databricks/labs/lsql/lakeview` directory imports all classes and functions from the `model.py` module, providing a foundation for further development and customization. The release also introduces a new file, `model.py`, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, `model.py.tmpl`, used for handling JSON data during integration with Lakeview Dashboards. A new file, `polymorphism.py`, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting. Furthermore, a `.gitignore` file has been added to the `tests/integration` directory as part of the initial steps in adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the `test_dashboards.py` file in the `tests/integration` directory contains a function, `test_load_dashboard(ws)`, which uses the `Dashboards` class to save a dashboard from a source to a destination path, facilitating testing during the integration process.
- Added dashboard-as-code functionality (#201). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new `create-dashboard` command for creating unpublished dashboards. The functionality is available in the `lsql` lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The `dashboard.yml` file is used to define top-level metadata for the dashboard. This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
- Automate opening integration test dashboard in debug mode (#167). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This has been achieved by importing the `webbrowser` and `is_in_debug` modules from `databricks.labs.blueprint.entrypoint`, and adding a check in the `create` function to determine if the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and dashboard ID, and then opened in a web browser using `webbrowser.open`. This allows for a more streamlined debugging process for the integration test dashboard. No other parts of the code have been affected by this change.
- Automatically tile widgets (#109). In this release, we've introduced an automatic widget tiling feature for the dashboard creation process in our open-source library. The `Dashboards` class now includes a new class variable, `_maximum_dashboard_width`, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The `create_dashboard` method has been updated to accept a new `self` parameter, turning it into an instance method. A new `_get_position` method has been introduced to calculate and return the next available position for placing a widget, and a `_get_width_and_height` method has been added to return the width and height for a widget specification, initially handling `CounterSpec` instances. Additionally, we've added new unit tests to improve testing coverage, ensuring that widgets are created, positioned, and sized correctly. These tests also cover the correct positioning of widgets based on their order and available space, as well as the expected width and height for each widget. A sketch of this kind of tiling logic follows below.
- Bump actions/checkout from 4.1.3 to 4.1.6 (#102). In the latest release, the `actions/checkout` GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps the version of `github/codeql-action` from 2 to 3, `actions/setup-node` from 1 to 4, and `actions/upload-artifact` from 2 to 4. Additionally, the minor-actions-dependencies group was updated with two new versions. Disabling `extensions.worktreeConfig` when disabling `sparse-checkout` was introduced in version 4.1.4. The release notes and changelog for this update can be found in the provided link. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
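The following is a minimal sketch of row-filling tile placement on a 6-unit-wide grid, as described in #109; the function and variable names are illustrative and do not reproduce the internal `_get_position` implementation.

```python
MAXIMUM_DASHBOARD_WIDTH = 6  # widgets are laid out on a 6-column grid


def layout(widget_sizes: list[tuple[int, int]]) -> list[dict]:
    """Assign (x, y) grid positions to widgets of given (width, height),
    filling each row left to right before starting a new one."""
    placed, x, y, row_height = [], 0, 0, 0
    for width, height in widget_sizes:
        if x + width > MAXIMUM_DASHBOARD_WIDTH:  # widget does not fit: wrap to next row
            x, y = 0, y + row_height
            row_height = 0
        placed.append({"x": x, "y": y, "width": width, "height": height})
        x += width
        row_height = max(row_height, height)
    return placed


# three counters: the third does not fit in the first row and wraps
print(layout([(3, 3), (3, 3), (2, 3)]))
```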
- Bump actions/checkout from 4.1.6 to 4.1.7 (#151). In the latest release, the 'actions/checkout' GitHub action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. This change brings potential bug fixes, performance improvements, or new features compared to the previous version. The update only affects the version number in the YAML configuration for the 'actions/checkout' step in the release.yml file, with no new methods or alterations to existing functionality. This update aims to ensure a smooth and enhanced user experience for those utilizing the project's push workflows by taking advantage of the possible improvements or bug fixes in the new version of 'actions/checkout'.
- Create a dashboard with a counter from a single query (#107). In this release, we have introduced several enhancements to our dashboard-as-code approach, including a `Dashboards` class that provides methods for getting, saving, and deploying dashboards. A new method, `create_dashboard`, creates a dashboard with a single page containing a counter widget, where the counter is associated with a query that counts the number of rows in a specified dataset. The `deploy_dashboard` method deploys the dashboard to the workspace. Additionally, the `test_dashboards.py` file has been modified and four new tests added to cover creating dashboards with a counter from a single query. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
- Create text widget from markdown file (#142). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue #1
- Design document for dashboards-as-code (#105). The latest release introduces "Dashboards as Code," a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks are `.sql`, `.md`, and `dashboard.yml` files, with `.sql` files defining queries and determining tile order, and `dashboard.yml` specifying top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files. Tile order can be determined by SQL file order, the `tiles` order in `dashboard.yml`, or SQL file metadata. The project can also be used as a library for embedding dashboard generation in your code. Configuration precedence follows command-line flags, SQL file headers, `dashboard.yml`, and SQL query content, in that order. The command-line interface is utilized for dashboard generation from configuration files.
- Ensure propagation of `lsql` version into `User-Agent` header when it is used as library (#206). In this release, the `pyproject.toml` file has be… A sketch of this kind of version propagation follows below.
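As a rough illustration of #206, the sketch below registers a library's version in the `User-Agent` header that the Databricks SDK sends with every API request; `with_user_agent_extra` is part of the SDK, but whether lsql wires it up exactly this way is an assumption.

```python
from databricks.sdk.core import with_user_agent_extra

# Append an extra "product/version" token to the SDK's User-Agent header
# so that server-side logs can attribute requests to the library.
with_user_agent_extra("lsql", "0.5.0")
```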
v0.4.3
- Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The `actions/checkout` dependency has been updated from version 4.1.2 to 4.1.3 in the `update-main-version.yml` file. The new version checks the git version before attempting to disable `sparse-checkout` and adds an SSH user parameter to improve functionality and compatibility. The release notes and CHANGELOG.md file provide detailed information on the specific changes and improvements, and the pull request includes a detailed commit history and links to the corresponding issues and pull requests on GitHub for transparency.
- Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new method, `asDict`, to the `Row` class in the `databricks.labs.lsql.core` module to maintain compatibility with PySpark. This method returns a dictionary representation of the `Row` object, with keys corresponding to column names and values corresponding to the values in each column. Additionally, the `fetch` function in the `backends.py` file now returns `Row` objects of `pyspark.sql` when using `self._spark.sql(sql).collect()`; this change is temporary and marked with a `TODO` comment, indicating that it will be addressed in the future. Error handling code has also been added to the `fetch` function to ensure it operates as expected. The `asDict` method simply calls the existing `as_dict` method, so the two behave identically. The optional `recursive` argument of `asDict`, which when set to `True` would enable recursive conversion of nested `Row` objects to nested dictionaries, is not currently implemented and always defaults to `False`. A short usage sketch follows below.
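A brief sketch of the PySpark-compatible alias, assuming a `Row` constructed from keyword arguments (the column names are illustrative):

```python
from databricks.labs.lsql.core import Row

row = Row(first="a", second=1)
print(row.as_dict())  # {'first': 'a', 'second': 1}
print(row.asDict())   # same result: asDict() delegates to as_dict()
```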
Dependency updates:
- Bump actions/checkout from 4.1.2 to 4.1.3 (#97).
Contributors: @dependabot[bot], @bishwajit-db
v0.4.2
- Added more `NotFound` error types (#94). In the latest update, the `core.py` file in the `databricks/labs/lsql` package has enhanced error handling: the `_raise_if_needed` function now raises a `NotFound` error when the error message includes the phrase "does not exist". This enables the system to categorize such SQL query errors as `NotFound` errors, improving the overall error handling and reporting capabilities. This change was a collaborative effort, as indicated by the co-authored-by statement in the commit. A sketch of handling this error type follows below.
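A minimal sketch of catching the mapped error, assuming a configured `StatementExecutionBackend` (the warehouse ID and table name are placeholders); `NotFound` comes from the Databricks SDK's error hierarchy:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import NotFound
from databricks.labs.lsql.backends import StatementExecutionBackend

backend = StatementExecutionBackend(WorkspaceClient(), "warehouse-id")  # placeholder ID
try:
    rows = list(backend.fetch("SELECT * FROM catalog.schema.missing_table"))
except NotFound:
    print("query referenced a table that does not exist")
```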
Contributors: @nkvuong
v0.4.1
- Fixing overwrite integration tests (#92). A new enhancement has been implemented for the `overwrite` feature's integration tests, addressing a concern with write operations. Two new variables, `catalog` and `schema`, have been incorporated using the `env_or_skip` function. These variables are used in the `save_table` method, which is now invoked twice with the same table: once with the `append` and once with the `overwrite` option. The data in the table is retrieved and checked for accuracy after each call, employing the updated `Row` class with revised field names `first` and `second` (formerly `name` and `id`). This modification ensures the proper operation of the `overwrite` feature during integration tests; a sketch of the flow follows below.
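A condensed sketch of the append-then-overwrite flow described above; the backend construction is omitted, and the dataclass, table name, and assertion are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Record:
    first: str
    second: int


# `backend` is any configured SqlBackend, e.g. a StatementExecutionBackend
backend.save_table("main.default.demo", [Record("a", 1)], Record, mode="append")
backend.save_table("main.default.demo", [Record("b", 2)], Record, mode="overwrite")
# after the overwrite, only the second batch should remain
rows = list(backend.fetch("SELECT * FROM main.default.demo"))
assert [(r.first, r.second) for r in rows] == [("b", 2)]
```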
Contributors: @william-conti
v0.4.0
- Added catalog and schema parameters to execute and fetch (#90). In this release, we have added optional `catalog` and `schema` parameters to the `execute` and `fetch` methods in the `SqlBackend` abstract base class, allowing for more flexibility when executing SQL statements in specific catalogs and schemas. These updates include new method signatures and their respective implementations in the `SparkSqlBackend` and `DatabricksSqlBackend` classes. The new parameters control the catalog and schema used by the `SparkSession` instance in the `SparkSqlBackend` class and the `SqlClient` instance in the `DatabricksSqlBackend` class. This enhancement enables better functionality in multi-catalog and multi-schema environments and comes with unit tests and integration tests. For example, with a `SparkSqlBackend` instance `spark_backend`, you can execute a SQL statement in a specific catalog and schema with `spark_backend.execute("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema")`; the `fetch` method accepts the same parameters, as shown below.
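The fragment below restates the example from the notes as runnable calls; the backend construction is assumed, and the class names follow the text above:

```python
# assumes an already-constructed `spark_backend`, per the classes described above
spark_backend.execute(
    "CREATE TABLE my_table AS SELECT 1 AS id",
    catalog="my_catalog",
    schema="my_schema",
)
for row in spark_backend.fetch(
    "SELECT * FROM my_table", catalog="my_catalog", schema="my_schema"
):
    print(row)
```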
Contributors: @FastLee
v0.3.1
- Check UCX and LSQL for backwards compatibility (#78). In this release, we introduce a new GitHub Actions workflow, `downstreams.yml`, which automates unit testing for downstream projects upon changes made to the upstream project. The workflow runs on pull requests, merge groups, and pushes to the main branch, and sets permissions for id-token, contents, and pull-requests. It includes a compatibility job that runs on Ubuntu, checks out the code, sets up Python, installs the toolchain, and tests downstream projects using the `databrickslabs/sandbox/downstreams` action. The job matrix includes two downstream projects, `ucx` and `remorph`, and uses the build cache to speed up the `pip install` step. This feature ensures that changes to the upstream project do not break compatibility with downstream projects, maintaining a stable and reliable library for software engineers.
- Fixed `Builder` object has no attribute `sdk_config` error (#86). In this release, we've resolved a `Builder` object has no attribute `sdk_config` error that occurred when initializing a Spark session using the `DatabricksSession.builder` method. The issue was caused by using dot notation to access the `sdk_config` attribute, which is incorrect; this has been updated to the correct syntax of `sdkConfig`. This change enables successful creation of the Spark session, preventing the error from recurring. The `DatabricksSession` class and its methods, such as `getOrCreate`, continue to be used for interacting with Databricks clusters and workspaces, while the `WorkspaceClient` class manages Databricks resources within a workspace. A usage sketch follows below.
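For illustration, the corrected builder pattern described above, using `databricks-connect`; the profile name is a placeholder:

```python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(profile="DEFAULT")  # placeholder profile
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
print(spark.sql("SELECT 1 AS id").collect())
```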
Dependency updates:
- Bump codecov/codecov-action from 1 to 4 (#84).
- Bump actions/setup-python from 4 to 5 (#83).
- Bump actions/checkout from 2.5.0 to 4.1.2 (#81).
- Bump softprops/action-gh-release from 1 to 2 (#80).
Contributors: @dependabot[bot], @nfx, @bishwajit-db, @william-conti
v0.3.0
- Added support for `save_table(..., mode="overwrite")` to `StatementExecutionBackend` (#74). In this release, we've added support for overwriting a table when saving data using the `save_table` method in the `StatementExecutionBackend`. Previously, attempting to use the `overwrite` mode raised a `NotImplementedError`. Now, when this mode is specified, the method first truncates the table, by running a `TRUNCATE TABLE` SQL command through the `execute` method, before inserting the new rows, so the existing data is deleted and replaced with the data being written. A new integration test, `test_overwrite`, has been added to the `test_deployment.py` file, along with two new test cases, `test_statement_execution_backend_save_table_overwrite_empty_table` and `test_mock_backend_overwrite`, to verify the new functionality. Note that the method signature now includes a default value for the `mode` parameter, `append`; this does not change existing behavior and only provides a more convenient default for callers. A sketch of the overwrite path follows below.
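A minimal sketch of the truncate-then-insert semantics described above (not the actual implementation; the insert helper is hypothetical):

```python
def save_table(self, full_name: str, rows, klass, mode: str = "append"):
    # "overwrite" first deletes the existing data; "append" keeps it
    if mode == "overwrite":
        self.execute(f"TRUNCATE TABLE {full_name}")
    self._insert_rows(full_name, rows, klass)  # hypothetical helper emitting INSERT statements
```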
Contributors: @william-conti
v0.2.5
- Fixed PyPI badge (#72). In this release, we fixed the PyPI badge in the README file of our open-source library. The badge displays the version of the package and serves as a quick reference for users. This update is limited to the README file, specifically the PyPI badge, and does not affect the library's functionality or methods.
- Fixed `no-cheat` check (#71). In this release, we have made improvements to the `no-cheat` verification process for new code. Previously, the check for disabling the linter was prone to false positives when the string `# pylint: disable` appeared for reasons other than disabling the linter. The updated check adds a filter to exclude the string `CHEAT` from the search and counts the characters in the output using the `wc -c` command; if the count is not zero, the script terminates with an error message. This change enhances the accuracy of the `no-cheat` check, ensuring that the linter is used correctly and that all new code meets our quality standards. A rough Python rendition of the check follows below.
- Removed upper bound on `sqlglot` dependency (#70). In this update, we have removed the upper bound on the `sqlglot` dependency version in the project's `pyproject.toml` file. Previously, the version constraint required `sqlglot` to be at least 22.3.1 but less than 22.5.0; with this modification, any version greater than or equal to 22.3.1 is accepted. This change provides the project with the flexibility to take advantage of future bug fixes, performance improvements, and new features available in newer `sqlglot` package versions. Developers should thoroughly test the updated package version to ensure compatibility with the existing codebase.
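The actual check is a shell pipeline, but a rough Python rendition of the logic described above looks like this (the diff range and file scope are assumptions):

```python
import subprocess
import sys

# collect the lines added relative to the main branch
diff = subprocess.run(
    ["git", "diff", "origin/main"], capture_output=True, text=True, check=True
).stdout
cheats = [
    line
    for line in diff.splitlines()
    # a new suppression counts as cheating unless it is explicitly marked CHEAT
    if line.startswith("+") and "# pylint: disable" in line and "CHEAT" not in line
]
if cheats:
    sys.exit(f"found {len(cheats)} new linter suppression(s)")
```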
Contributors: @nfx