This repository contains a pipeline for aggregating ERA5 environmental exposure data to a 0.1 degree grid. The pipeline is designed to run on FASRC. We developed this pipeline using nbdev, which lets us generate modules and scripts from notebooks. As a result, all of the documentation for how the pipeline was developed and validated is available in notes/index.ipynb and the associated notebooks.

To review a PR on this repository, follow these steps:
- Obtain an API key for the ERA5 datastore from here, and ask Tinashe for access to the Golden Lab `googledriver` API key
- Clone this repository to your workspace on FASRC
- Create a conda environment with `conda create -n era5_sandbox python=3.10` and install all of the necessary dependencies for the package with `pip install -e .`
- Run the `core` module to test your API key and set up the data directory structure: `python src/era5_sandbox/core.py`
- Symlink your local data directory to the original work: `ln -s [YOUR WORKING DIRECTORY]/data /n/dominici_lab/lab/data_processing/csph-era5_sandbox/data`
- Do a dry run after removing a file from data: `snakemake --dry-run`
- Run the pipeline: `sbatch snakemake.sbatch`
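
Before submitting the job, it can help to confirm that the credentials and data directory from the steps above are in place. The following is a minimal sketch, not part of the pipeline: the helper names `read_rc` and `check_setup` are hypothetical, and it assumes the ERA5 key is stored in the `~/.cdsapirc` file that the `cdsapi` client reads by default (lines of the form `url: ...` and `key: ...`). What `core.py` itself checks may differ.

```python
from pathlib import Path


def read_rc(path: Path) -> dict[str, str]:
    """Parse 'name: value' lines, the layout cdsapi uses for ~/.cdsapirc."""
    settings = {}
    for line in path.read_text().splitlines():
        if ":" in line:
            # Split on the first colon only, so URLs keep their 'https://' part.
            name, value = line.split(":", 1)
            settings[name.strip()] = value.strip()
    return settings


def check_setup(rc_path: Path, data_dir: Path) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if not rc_path.is_file():
        problems.append(f"missing credentials file: {rc_path}")
    else:
        settings = read_rc(rc_path)
        for name in ("url", "key"):
            if not settings.get(name):
                problems.append(f"{rc_path} has no '{name}' entry")
    # is_dir() follows symlinks, so a broken link is reported here as well.
    if not data_dir.is_dir():
        problems.append(f"data directory not found (or broken symlink): {data_dir}")
    return problems


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to match your own setup.
    issues = check_setup(Path.home() / ".cdsapirc", Path("data"))
    print("\n".join(issues) if issues else "setup looks OK")
```

Running the script from the repository root prints one line per problem found. It only inspects local files, so it does not confirm that the key is accepted by the datastore.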