Releases: databrickslabs/lsql

v0.7.0

15 Jul 18:40
@nfx
9ce8b35

  • Added databricks labs lsql fmt command (#221). This release introduces a new CLI command, databricks labs lsql fmt, which formats SQL files in a given folder and can be used without authentication. The command accepts a folder flag specifying the directory containing the SQL files to format. The change also updates the labs.yml file and adds a new format method to the QueryTile class, which formats SQL queries using the sqlglot library; a hedged sketch of that technique follows below. Formatting improves the readability and consistency of SQL files, making them easier to understand and maintain. The commit also reformats various SQL files to demonstrate the result: SQL keywords are uppercased, spacing is added around keywords and operators, and column names are aligned in VALUES clauses. Tests confirm that the formatting works correctly and does not change existing functionality.
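
A hedged sketch of the underlying technique: sqlglot can pretty-print a SQL string with its databricks dialect. The dialect and generator options that lsql's QueryTile.format actually applies are assumptions here.

```python
import sqlglot

# Format a raw SQL string with sqlglot; lsql's QueryTile.format may use
# different dialect or generator options than shown here.
raw = "select id,count(*) as n from my_table group by id"
formatted = sqlglot.transpile(raw, read="databricks", pretty=True)[0]
print(formatted)
# SELECT
#   id,
#   COUNT(*) AS n
# FROM my_table
# GROUP BY
#   id
```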

Contributors: @nfx

v0.6.0

11 Jul 10:28
@nfx
2c062b2

  • Added method to dashboards to get dashboard url (#211). In this release, we have added a new get_url method to the lakeview dashboards object in the lsql library (see the sketch after this list). This method uses the Databricks SDK to retrieve the dashboard URL, simplifying the code and making it more maintainable. Previously, the dashboard URL was constructed by concatenating the host and dashboard ID; the new method ensures the URL is obtained correctly even if the format changes in the future. A unit test covering URL retrieval through the workspace client has also been added. This functionality allows users to easily retrieve a dashboard's URL from its ID and the workspace client.
  • Extend replace database in query (#210). This commit extends the database replacement functionality in the DashboardMetadata class, allowing users to specify which database and catalog to replace. The enhancement includes support for catalog replacement and a new replace_database method in the DashboardMetadata class, which replaces the catalog and/or database in the query based on provided parameters. These changes enhance the flexibility and customization of the database replacement feature in queries, making it easier for users to control how their data is displayed in the dashboard. The create_dashboard function has also been updated to use the new method for replacing the database and catalog. Additionally, the TileMetadata update method has been replaced with a new merge method, and the QueryTile and Tile classes have new properties and methods for handling content, width, height, and position. The commit also includes several unit tests to ensure the new functionality works as expected.
  • Improve object oriented dashboard-as-code implementation (#208). In this release, the object-oriented implementation of the dashboard-as-code feature has been significantly improved, addressing previous pull request comments (#201). The TileMetadata dataclass now includes methods for updating and comparing tile metadata, and the DashboardMetadata class has been removed and its functionality incorporated into the Dashboards class. The Dashboards class now generates tiles, datasets, and layouts for dashboards using the provided query_transformer. The code's readability and maintainability have been further enhanced by replacing the use of the copy module with dataclasses.replace for creating object copies. Additionally, updates have been made to the unit tests for dashboard functionality in the project, with new methods and attributes added to check for valid dashboard metadata and handle duplicate query or widget IDs, as well as to specify the order in which tiles and widgets should be displayed in the dashboard.
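
A minimal sketch of retrieving a dashboard URL as described above; the import path and the exact get_url signature are assumptions, not confirmed API:

```python
from databricks.sdk import WorkspaceClient

from databricks.labs.lsql.dashboards import Dashboards  # import path assumed

ws = WorkspaceClient()
dashboards = Dashboards(ws)
# get_url resolves the URL through the SDK instead of concatenating
# host and dashboard ID by hand; the argument name is an assumption.
url = dashboards.get_url("your-dashboard-id")
print(url)
```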

Contributors: @JCZuurmond

v0.5.0

03 Jul 11:02
@nfx
619ff0a

  • Added Command Execution backend which uses Command Execution API on a cluster (#95). In this release, the databricks-labs-lsql library has been updated with a new Command Execution backend that utilizes the Command Execution API; a hedged usage sketch follows after this list. A new CommandExecutionBackend class has been implemented, which initializes a CommandExecutor instance taking a cluster ID, workspace client, and language as parameters. Its execute method runs SQL commands on the specified cluster, its fetch method returns query results as an iterator of Row objects, and its save_table method saves the results to tables in the Databricks workspace. The existing StatementExecutionBackend class has been updated to inherit from a new abstract base class, ExecutionBackend, which declares a save_table method for saving data to tables and serves as the common base for both the Statement and Command Execution backends. The StatementExecutionBackend constructor now also accepts a max_records_per_batch parameter, and its execute and fetch methods use the new _only_n_bytes method for logging truncated SQL statements.
  • Added basic integration with Lakeview Dashboards (#66). In this release, we've added basic integration with Lakeview Dashboards to the project, enhancing its capabilities. This includes updating the databricks-labs-blueprint dependency to version 0.4.2 with the [yaml] extra, allowing for additional functionality related to handling YAML files. A new file, dashboards.py, has been introduced, providing a class for interacting with Databricks dashboards, along with methods for retrieving and saving dashboard configurations. Additionally, a new __init__.py file under the src/databricks/labs/lsql/lakeview directory imports all classes and functions from the model.py module, providing a foundation for further development and customization. The release also introduces a new file, model.py, containing code generated from OpenAPI specs by the Databricks SDK Generator, and a template file, model.py.tmpl, used for handling JSON data during integration with Lakeview Dashboards. A new file, polymorphism.py, provides utilities for checking if a value can be assigned to a specific type, supporting correct data typing and formatting with Lakeview Dashboards. Furthermore, a .gitignore file has been added to the tests/integration directory as part of the initial steps in adding integration testing to ensure compatibility with the Lakeview Dashboards platform. Lastly, the test_dashboards.py file in the tests/integration directory contains a function, test_load_dashboard(ws), which uses the Dashboards class to save a dashboard from a source to a destination path, facilitating testing during the integration process.
  • Added dashboard-as-code functionality (#201). This commit introduces dashboard-as-code functionality for the UCX project, enabling the creation and management of dashboards using code. The feature resolves multiple issues and includes a new create-dashboard command for creating unpublished dashboards. The functionality is available in the lsql lab and allows for specifying the order and width of widgets, overriding default widget identifiers, and supporting various SQL and markdown header arguments. The dashboard.yml file is used to define top-level metadata for the dashboard. This commit also includes extensive documentation and examples for using the dashboard as a library and configuring different options.
  • Automate opening integration test dashboard in debug mode (#167). A new feature has been added to automatically open the integration test dashboard in debug mode, making it easier for software engineers to debug and troubleshoot. This is achieved by importing the standard-library webbrowser module and the is_in_debug function from databricks.labs.blueprint.entrypoint, and adding a check in the create function to determine whether the code is running in debug mode. If it is, a dashboard URL is constructed from the workspace configuration and dashboard ID and opened in a web browser via webbrowser.open, streamlining the debugging process for the integration test dashboard. No other parts of the code are affected by this change.
  • Automatically tile widgets (#109). In this release, we've introduced automatic widget tiling for the dashboard creation process; an illustrative sketch of the tiling rule follows after this list. The Dashboards class now includes a new class variable, _maximum_dashboard_width, set to 6, representing the maximum width allowed for each row of widgets in the dashboard. The create_dashboard method has been updated to accept a self parameter, turning it into an instance method. A new _get_position method calculates and returns the next available position for placing a widget, and a _get_width_and_height method returns the width and height for a widget specification, initially handling CounterSpec instances. New unit tests verify that widgets are created, positioned, and sized correctly, covering positioning based on widget order and available space as well as the expected width and height for each widget.
  • Bump actions/checkout from 4.1.3 to 4.1.6 (#102). In the latest release, the actions/checkout GitHub Action has been updated from version 4.1.3 to 4.1.6, which includes checking the platform to set the archive extension appropriately. This release also bumps github/codeql-action from 2 to 3, actions/setup-node from 1 to 4, and actions/upload-artifact from 2 to 4, updating the minor-actions-dependencies group with two new versions. Disabling extensions.worktreeConfig when disabling sparse-checkout was introduced in version 4.1.4. This commit was made by dependabot[bot] with contributions from cory-miller and jww3.
  • Bump actions/checkout from 4.1.6 to 4.1.7 (#151). The actions/checkout GitHub Action has been updated from version 4.1.6 to 4.1.7 in the project's push workflow, which checks out the repository at the start of the workflow. The update only changes the version number in the YAML configuration for the actions/checkout step in the release.yml file; no methods or existing functionality are altered, and the workflow simply picks up whatever fixes and improvements the new version provides.
  • Create a dashboard with a counter from a single query (#107). In this release, we have introduced several enhancements to our dashboard-as-code approach, including the creation of a Dashboards class that provides methods for getting, saving, and deploying dashboards. A new method, create_dashboard, has been added to create a dashboard with a single page containing a counter widget. The counter widget is associated with a query that counts the number of rows in a specified dataset. The deploy_dashboard method has also been added to deploy the dashboard to the workspace. Additionally, we have implemented a new feature for creating dashboards with a counter from a single query, including modifications to the test_dashboards.py file and the addition of four new tests. These changes improve the robustness of the dashboard creation process and provide a more automated way to view important metrics.
  • Create text widget from markdown file (#142). A new feature has been implemented in the library that allows for the creation of a text widget from a markdown file, enhancing customization and readability for users. This development resolves issue #1
  • Design document for dashboards-as-code (#105). The latest release introduces 'Dashboards as Code,' a method for defining and managing dashboards through configuration files, enabling version control and controlled changes. The building blocks include .sql, .md, and dashboard.yml files, with .sql defining queries and determining tile order, and dashboard.yml specifying top-level metadata and tile overrides. Metadata can be inferred or explicitly defined in the query or files. The tile order can be determined by SQL file order, tiles order in dashboard.yml, or SQL file metadata. This project can also be used as a library for embedding dashboard generation in your code. Configuration precedence follows command-line flags, SQL file headers, dashboard.yml, and SQL query content. The command-line interface is utilized for dashboard generation from configuration files.
  • Ensure propagation of lsql version into User-Agent header when it is used as library (#206). In this release, the pyproject.toml file has be...
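
As mentioned in the Command Execution backend entry above, a hedged usage sketch; the import path and constructor argument names are assumptions:

```python
from databricks.sdk import WorkspaceClient

from databricks.labs.lsql.backends import CommandExecutionBackend  # import path assumed

ws = WorkspaceClient()
# The changelog describes a constructor taking a cluster ID, workspace
# client, and language; names and order here are assumptions.
backend = CommandExecutionBackend(ws, "0123-456789-example")
backend.execute("CREATE TABLE IF NOT EXISTS main.demo.events (id INT)")
for row in backend.fetch("SELECT id FROM main.demo.events LIMIT 10"):
    print(row)
```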
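
And for the automatic widget tiling entry above, an illustrative re-implementation of the row-filling rule; this is not lsql's actual _get_position code, and it deliberately ignores widget heights:

```python
MAXIMUM_DASHBOARD_WIDTH = 6  # mirrors the _maximum_dashboard_width class variable

def next_position(x: int, y: int, width: int) -> tuple[int, int]:
    """Return the top-left cell for a widget of the given width,
    wrapping to the next row when the 6-column budget is exceeded."""
    if x + width > MAXIMUM_DASHBOARD_WIDTH:
        return 0, y + 1  # wrap: start of the next row
    return x, y

x = y = 0
for width in (1, 3, 3, 2):  # widths of four widgets, in layout order
    x, y = next_position(x, y, width)
    print(f"width {width} -> column {x}, row {y}")
    x += width  # advance past the widget just placed
```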

v0.4.3

08 May 09:19
@nfx
9032c9d

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97). The actions/checkout dependency has been updated from version 4.1.2 to 4.1.3 in the update-main-version.yml file. The new version verifies the git version before attempting to disable sparse-checkout and adds an SSH user parameter to improve functionality and compatibility. The release notes and CHANGELOG.md file provide detailed information on the specific changes and improvements.
  • Maintain PySpark compatibility for databricks.labs.lsql.core.Row (#99). In this release, we have added a new asDict method to the Row class in the databricks.labs.lsql.core module to maintain compatibility with PySpark; a minimal sketch follows below. The asDict method simply delegates to the existing as_dict method, which returns a dictionary representation of the Row object, with keys corresponding to column names and values corresponding to the values in each column. Its optional recursive argument, which in PySpark enables recursive conversion of nested Row objects to nested dictionaries, is accepted for signature compatibility but not currently implemented and defaults to False. Additionally, the fetch function in the backends.py file has been modified to return pyspark.sql Row objects when using self._spark.sql(sql).collect(); this change is temporary and marked with a TODO comment indicating it will be addressed in the future. Error-handling code has also been added to the fetch function to ensure it operates as expected.
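
A minimal sketch of the PySpark-compatible alias, assuming Row accepts keyword arguments for column names the way PySpark's Row does:

```python
from databricks.labs.lsql.core import Row

row = Row(first="a", second=1)  # illustrative column names
# asDict delegates to as_dict, so both return the same mapping:
assert row.asDict() == row.as_dict() == {"first": "a", "second": 1}
```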

Dependency updates:

  • Bump actions/checkout from 4.1.2 to 4.1.3 (#97).

Contributors: @dependabot[bot], @bishwajit-db

v0.4.2

19 Apr 17:13
@nfx
a582dba

  • Added more NotFound error type (#94). In the latest update, the core.py file in the databricks/labs/lsql package has improved error handling: the _raise_if_needed function now raises a NotFound error when the error message includes the phrase "does not exist". This enables the system to categorize such SQL query errors as NotFound, improving the overall error handling and reporting capabilities; a simplified illustration follows below.
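
A simplified illustration of the error-mapping rule described above; the real _raise_if_needed inspects a statement-execution status, so this signature is an assumption:

```python
from databricks.sdk.errors import NotFound

def raise_if_needed(error_message: str) -> None:
    # Categorize "does not exist" failures as NotFound, per this release.
    if "does not exist" in error_message:
        raise NotFound(error_message)
```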

Contributors: @nkvuong

v0.4.1

12 Apr 12:30
@nfx
5782b23

  • Fixing overwrite integration tests (#92). The integration tests for the overwrite feature have been improved to address a concern with write operations; a hedged sketch of the pattern follows below. Two new variables, catalog and schema, are obtained via the env_or_skip function and used in the save_table method, which is now invoked twice against the same table: once with the append mode and once with the overwrite mode. After each call, the data in the table is retrieved and checked for accuracy, using the updated Row class with revised field names first and second (formerly name and id). This ensures the proper operation of the overwrite feature during integration tests.
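
A hedged sketch of the test pattern described above; the fixture names and the exact save_table signature are assumptions:

```python
from databricks.labs.lsql.core import Row

def test_overwrite(sql_backend, env_or_skip):  # fixtures assumed to exist
    catalog = env_or_skip("TEST_CATALOG")
    schema = env_or_skip("TEST_SCHEMA")
    table = f"{catalog}.{schema}.some_table"
    rows = [Row(first="a", second=1)]
    sql_backend.save_table(table, rows, Row, mode="append")
    sql_backend.save_table(table, rows, Row, mode="overwrite")  # replaces prior contents
    assert list(sql_backend.fetch(f"SELECT * FROM {table}")) == rows
```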

Contributors: @william-conti

v0.4.0

11 Apr 17:27
@nfx
8f3d164

  • Added catalog and schema parameters to execute and fetch (#90). In this release, we have added optional catalog and schema parameters to the execute and fetch methods in the SqlBackend abstract base class, allowing for more flexibility when executing SQL statements in specific catalogs and schemas. These updates include new method signatures and their respective implementations in the SparkSqlBackend and DatabricksSqlBackend classes. The new parameters control the catalog and schema used by the SparkSession instance in the SparkSqlBackend class and the SqlClient instance in the DatabricksSqlBackend class. This enhancement enables better functionality in multi-catalog and multi-schema environments. Additionally, this change comes with unit tests and integration tests to ensure proper functionality. The new parameters can be used when calling the execute and fetch methods. For example, with a SparkSqlBackend instance spark_backend, you can execute a SQL statement in a specific catalog and schema with the following code: spark_backend.execute("SELECT * FROM my_table", catalog="my_catalog", schema="my_schema"). Similarly, the fetch method can also be used with the new parameters.
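
Mirroring the execute example in the entry above, a sketch of fetch with the same optional parameters; the backend construction and import path are assumptions:

```python
from pyspark.sql import SparkSession

from databricks.labs.lsql.backends import SparkSqlBackend  # import path assumed

spark = SparkSession.builder.getOrCreate()
spark_backend = SparkSqlBackend(spark)
for row in spark_backend.fetch(
    "SELECT * FROM my_table",
    catalog="my_catalog",
    schema="my_schema",
):
    print(row)
```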

Contributors: @FastLee

v0.3.1

02 Apr 16:16
@nfx
155bea0

  • Check UCX and LSQL for backwards compatibility (#78). In this release, we introduce a new GitHub Actions workflow, downstreams.yml, which automates unit testing for downstream projects upon changes made to the upstream project. The workflow runs on pull requests, merge groups, and pushes to the main branch and sets permissions for id-token, contents, and pull-requests. It includes a compatibility job that runs on Ubuntu, checks out the code, sets up Python, installs the toolchain, and accepts downstream projects using the databrickslabs/sandbox/downstreams action. The job matrix includes two downstream projects, ucx and remorph, and uses the build cache to speed up the pip install step. This feature ensures that changes to the upstream project do not break compatibility with downstream projects, maintaining a stable and reliable library for software engineers.
  • Fixed Builder object has no attribute sdk_config error (#86). In this release, we've resolved a Builder object has no attribute sdk_config error that occurred when initializing a Spark session using the DatabricksSession.builder method. The issue was caused by referring to the builder's SDK configuration hook by the snake_case name sdk_config, which does not exist; the call now uses the correct camelCase name sdkConfig (see the sketch after this list). This change enables successful creation of the Spark session, preventing the error from recurring. The DatabricksSession class and its methods, such as getOrCreate, continue to be used for interacting with Databricks clusters and workspaces, while the WorkspaceClient class manages Databricks resources within a workspace.
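
As referenced in the sdk_config entry above, a minimal sketch of the corrected call, assuming profile-based authentication:

```python
from databricks.connect import DatabricksSession
from databricks.sdk.core import Config

config = Config(profile="DEFAULT")  # assumption: a configured auth profile
# Before the fix: DatabricksSession.builder.sdk_config(config) raised
# "'Builder' object has no attribute 'sdk_config'".
spark = DatabricksSession.builder.sdkConfig(config).getOrCreate()
```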

Dependency updates:

  • Bump codecov/codecov-action from 1 to 4 (#84).
  • Bump actions/setup-python from 4 to 5 (#83).
  • Bump actions/checkout from 2.5.0 to 4.1.2 (#81).
  • Bump softprops/action-gh-release from 1 to 2 (#80).

Contributors: @dependabot[bot], @nfx, @bishwajit-db, @william-conti

v0.3.0

27 Mar 13:28
@nfx
073c922

  • Added support for save_table(..., mode="overwrite") to StatementExecutionBackend (#74). In this release, we've added support for overwriting a table when saving data using the save_table method in the StatementExecutionBackend. Previously, attempting to use the overwrite mode would raise a NotImplementedError. Now, when this mode is specified, the method first truncates the table before inserting the new rows. The truncation is done using the execute method to run a TRUNCATE TABLE SQL command. Additionally, we've added a new integration test, test_overwrite, to the test_deployment.py file to verify the new overwrite mode functionality. A new option, mode="overwrite", has been added to the save_table method, allowing for the existing data in the table to be deleted and replaced with the new data being written. We've also added two new test cases, test_statement_execution_backend_save_table_overwrite_empty_table and test_mock_backend_overwrite, to verify the new functionality. It's important to note that the method signature has been updated to include a default value for the mode parameter, setting it to append by default. This change does not affect the functionality and only provides a more convenient default behavior for users of the method.
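
A hedged sketch of the new overwrite semantics; the dataclass and the StatementExecutionBackend constructor arguments here are illustrative assumptions:

```python
from dataclasses import dataclass

from databricks.sdk import WorkspaceClient
from databricks.labs.lsql.backends import StatementExecutionBackend

@dataclass
class Foo:  # illustrative row type
    first: str
    second: int

ws = WorkspaceClient()
backend = StatementExecutionBackend(ws, "your-warehouse-id")  # argument order assumed
# mode="overwrite" now truncates before inserting:
backend.save_table("main.schema.foo", [Foo("a", 1)], Foo, mode="overwrite")
# which behaves roughly like:
backend.execute("TRUNCATE TABLE main.schema.foo")
backend.save_table("main.schema.foo", [Foo("a", 1)], Foo, mode="append")
```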

Contributors: @william-conti

v0.2.5

26 Mar 08:59
@nfx
8921e0f

  • Fixed PyPI badge (#72). In this release, we have fixed the PyPI badge in the README file of our open-source library. The badge displays the published version of the package and serves as a quick reference for users; this fix ensures it renders accurately and functions properly. The update is limited to the README file and does not change any functionality or methods within the project.
  • Fixed no-cheat check (#71). In this release, we have made improvements to the no-cheat verification process for new code. Previously, the check for disabling the linter was prone to false positives when the string '# pylint: disable' appeared for reasons other than disabling the linter. The updated code now includes an additional filter to exclude the string CHEAT from the search, and the number of characters in the output is counted using the wc -c command. If the count is not zero, the script will terminate with an error message. This change enhances the accuracy of the no-cheat check, ensuring that the linter is being used correctly and that all new code meets our quality standards.
  • Removed upper bound on sqlglot dependency (#70). In this update, we have removed the upper bound on the sqlglot dependency version in the project's pyproject.toml file. Previously, the version constraint required sqlglot to be at least 22.3.1 but less than 22.5.0. With this modification, there will be no upper limit, enabling the project to utilize any version greater than or equal to 22.3.1. This change provides the project with the flexibility to take advantage of future bug fixes, performance improvements, and new features available in newer sqlglot package versions. Developers should thoroughly test the updated package version to ensure compatibility with the existing codebase.

Contributors: @nfx