Releases · Center-for-Health-Data-Science/bulkDGD · GitHub

01 Aug 08:35

ValeSora

Version 1.0.5 Latest

[v1.0.5] - 2024-08-01

Bug fixes

Fixed the os import in the bulkDGD.core.model module.

29 Jul 15:02

ValeSora

Version 1.0.4

[v1.0.4] - 2024-07-29

Internal changes

Added the uniquify_file_path() function to the bulkDGD.core.model module, because importing it from bulkDGD.util prevented the documentation from building correctly.
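
The release notes do not describe what uniquify_file_path() does internally. As a rough illustration only, a helper with this kind of name often returns a path that does not collide with an existing file. The sketch below assumes that behavior; the function name uniquify_path_sketch and the "_N" suffix scheme are hypothetical and are not taken from bulkDGD.

```python
import os


def uniquify_path_sketch(file_path):
    """Return file_path if it is unused, else the first free 'name_N.ext'.

    Hypothetical helper: the numbering scheme is an assumption, not
    bulkDGD's documented behavior.
    """
    root, ext = os.path.splitext(file_path)
    candidate = file_path
    counter = 1
    # Keep bumping the counter until the candidate path does not exist.
    while os.path.exists(candidate):
        candidate = f"{root}_{counter}{ext}"
        counter += 1
    return candidate
```

With this sketch, asking for "out.csv" when that file already exists would yield "out_1.csv", then "out_2.csv", and so on.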

23 Jul 09:23

ValeSora

Version 1.0.3

[v1.0.3] - 2024-07-23

Several updates to the Recount3 sub-package.

Other changes:

Fixed a bug with the bulkDGD.util.get_handlers function.
Removed the batch number from the name of the output files in dgd_get_recount3_data.

Internal changes (for contributors):

The recount3.util._check_category function was removed, together with the associated recount3.defaults.RECOUNT3_SUPPORTED_CATEGORIES_FILE value and the text files in recount3/data used for checking, because it added an unnecessary layer of complexity: the download fails anyway and reports the error if the GTEx/TCGA/SRA code is incorrect.

22 Jul 09:46

ValeSora

Version 1.0.2 Pre-release

[v1.0.2] - 2024-07-22

Bug fixes

Committed and pushed the bulkDGD/util.py file, which was not included in the previous release.

18 Jul 15:12

ValeSora

Version 1.0.1 Pre-release

[v1.0.1] - 2024-07-18

API-breaking changes:

The output data frames produced by the dgd_get_recount3_data executable now contain both gene expression data and metadata unless otherwise filtered (see below).

Other changes:

If the experiment_attributes column is present in the metadata columns of an SRA study, it is now split into its constituent components when writing the output data frames for the dgd_get_recount3_data executable (as is already the case with the sample_attributes column).
The user can now pass a YAML file to dgd_get_recount3_data to download data from the Recount3 platform in bulk and filter them.
The user can now pass metadata_to_keep and metadata_to_drop lists of metadata columns in the input file to dgd_get_recount3_data to keep or drop specific metadata columns in the output data frames. These can be passed either as columns, if the input file is a CSV file, or as keywords, if the input file is a YAML file.
The recount3.util.get_metadata function now returns the metadata data frame with the recount3_project_name and recount3_samples_category columns added.
The model_untrained.yaml configuration file was added to the examples of configuration files available within the package.
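
To make the attribute splitting and the keep/drop filtering described above more concrete, here is a minimal, standard-library sketch. It assumes that attribute strings use "|" between items and ";;" between each key and its value; that format, the function names, and the example record are assumptions made for illustration and do not reproduce bulkDGD's actual implementation.

```python
def split_attributes(value, pair_sep="|", key_sep=";;"):
    """Parse 'k1;;v1|k2;;v2' into {'k1': 'v1', 'k2': 'v2'}.

    The separators are assumptions about the attribute format.
    """
    parsed = {}
    if not value:
        return parsed
    for pair in value.split(pair_sep):
        key, sep, val = pair.partition(key_sep)
        if sep:  # skip malformed items lacking the key/value separator
            parsed[key.strip()] = val.strip()
    return parsed


def expand_and_filter(record, attr_columns, to_keep=None, to_drop=None):
    """Expand attribute columns, then keep or drop metadata columns."""
    out = {}
    for column, value in record.items():
        if column in attr_columns:
            # Replace the packed column with one entry per attribute.
            out.update(split_attributes(value))
        else:
            out[column] = value
    if to_keep is not None:
        out = {k: v for k, v in out.items() if k in to_keep}
    if to_drop is not None:
        out = {k: v for k, v in out.items() if k not in to_drop}
    return out


# Made-up metadata record for one sample.
record = {
    "external_id": "sample_1",
    "sample_attributes": "age;;67|tissue;;brain",
    "experiment_attributes": "platform;;hypothetical",
}
print(expand_and_filter(
    record,
    attr_columns={"sample_attributes", "experiment_attributes"},
    to_drop=["platform"],
))
```

In this sketch the keep list is applied before the drop list, so a column named in both ends up dropped; whether dgd_get_recount3_data resolves such a conflict the same way is not stated in the release notes.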

Internal changes (for contributors):

Two new internal functions in the bulkDGD.recount3.util module (_load_samples_batches_csv and load_samples_batches_yaml) were introduced to parse the input files to dgd_get_recount3_data. The public function load_samples_batches simply calls one of them depending on the file's extension.
The bulkDGD.util.get_handlers function now accepts two new arguments, log_level_console and log_level_file, instead of the old log_level, giving more fine-grained control over the log level of each handler.
The log level of the console handler for the _dgd_get_recount3_data_single_batch executable was changed to ERROR so as not to clutter the console too much with all the INFO messages from the subprocesses (which get logged to their own log files anyway if the overall log level is INFO or below).
The header of the bulkDGD/recount3/data/sra_metadata_fields.txt file was changed to better describe the metadata fields included in it.
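
The idea of separate console and file log levels can be sketched with Python's standard logging module. This is not the code of bulkDGD.util.get_handlers; the function name get_handlers_sketch, its defaults, and the log_file parameter are hypothetical, and only the two log-level argument names come from the notes above.

```python
import logging


def get_handlers_sketch(log_file=None,
                        log_level_console=logging.WARNING,
                        log_level_file=logging.INFO):
    """Build a console handler and, optionally, a file handler.

    Each handler gets its own level, so the console can stay quiet
    while the log file records more detail.
    """
    handlers = []

    console = logging.StreamHandler()
    console.setLevel(log_level_console)
    handlers.append(console)

    if log_file is not None:
        file_handler = logging.FileHandler(log_file)
        file_handler.setLevel(log_level_file)
        handlers.append(file_handler)

    return handlers
```

A caller would attach the returned handlers to a logger whose own level is at or below the most verbose handler level. For example, with the logger at INFO, the console handler at ERROR, and the file handler at INFO, INFO messages reach only the log file.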

Documentation:

The documentation was updated to reflect the user-facing changes.
The readme files for the configurations were removed because their content duplicated both the documentation and the configuration files themselves.

07 Jul 13:38

ValeSora

Version 1.0.0

[v1.0.0] - 2024-07-07

Added

The train() method was added to the core.model.DGDModel class to train the DGD model.
The dgd_train executable was added to train the DGD model using the command line.
A new type of configuration file containing the options to train the DGD model is available. An example can be found in the newly created bulkDGD/ioutil/configs/training directory inside the package. This file, along with the other configuration files, is installed with the package.
A new example of a configuration file (model_untrained.yaml) containing the options to set up the DGD model is available in the bulkDGD/ioutil/configs/model directory for when the model needs to be set up before training.
The documentation now includes a new tutorial on how to train the DGD model (Tutorial 3).
The load_loss() and save_loss() functions were introduced in the new bulkDGD.ioutil.lossio module to load and save CSV files containing the losses reported during the training procedure.
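
As an illustration of saving and reloading per-epoch losses as CSV, the sketch below uses only the standard csv module. It does not reproduce the signatures of load_loss() and save_loss() in bulkDGD.ioutil.lossio; the function names, the column names, and the choice to read every value as a float are assumptions.

```python
import csv


def save_loss_sketch(rows, csv_file):
    """Write a list of per-epoch loss records (dicts) to a CSV file."""
    fieldnames = list(rows[0].keys())
    with open(csv_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_loss_sketch(csv_file):
    """Read the CSV back, converting every value to float."""
    with open(csv_file, newline="") as handle:
        return [
            {key: float(value) for key, value in row.items()}
            for row in csv.DictReader(handle)
        ]
```

A record such as {"epoch": 1, "loss_train": 0.52} (hypothetical column names) would round-trip through these two functions with its values read back as floats.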

API-breaking changes

The configuration file used to find the representations for new samples now has a new format. Please take a look at the documentation for a detailed explanation of the new format. The format is not backward compatible.

Internal changes (for contributors)

The _get_data_loader() method has been introduced into the core.model.DGDModel class to create data loaders.
There is now only one internal method in the core.model.DGDModel class responsible for optimizing one or multiple representations for a set of samples, _optimize_rep(). The _get_representations_one_opt() and the _get_representations_two_opt() methods have been updated accordingly.
New sanity checks have been introduced when loading configurations (ioutil.configio module).
A new _get_final_dataframes_train method has been introduced to create the data frames produced by the new train() method in the core.model.DGDModel class.

Notes

The documentation was updated to reflect all changes made to the codebase.
Anders Lykkebo-Valløe is now a contributor.
Andreas Bjerregaard is now a contributor.
