Commit b674e3c

Merge pull request #30 from nimh-dsst/integrate-into-bids-tree
Integrate into bids tree
2 parents 534919b + 06dee5b commit b674e3c

6 files changed

+263
-133
lines changed

.gitignore

Lines changed: 3 additions & 6 deletions
@@ -1,12 +1,9 @@
 __pycache__/
-as_qc_failures.csv
-*.swarm
 *_testing/
 *_examples/
-scripts_outputs/
 .idea/
+.vscode/
 examples/visualqc_prep/
 examples/sub-01/
-user-testing/*
-.vscode/*
-src/isolate_fsleyes_render_issue.py
+lookup.json
+rename.py

README.md

Lines changed: 95 additions & 61 deletions
@@ -2,34 +2,38 @@

 # DSST Defacing Pipeline

-The DSST Defacing Pipeline has been developed to make the process of defacing anatomical scans of large datasets,
-visually inspecting for accuracy and fixing scans that fail visual inspection more efficient and straightforward. The
-pipeline _requires_ the input dataset to be in BIDS format. A conceptual description of the pipeline can
-found [here](#conceptual-design).
+The DSST Defacing Pipeline has been developed to make the process of defacing anatomical scans as well as
+visually quality controlling (QC) and fixing scans that fail QC more efficient and straightforward. The
+pipeline requires the input dataset to be in BIDS format. A conceptual description of the pipeline can be
+found [below](#conceptual-design).

 This pipeline is designed and tested to work on the NIH HPC systems. While it's possible to get the pipeline running on
 other platforms, please note that it can be error-prone and is not recommended.

-## Usage Instructions
+## Setup Instructions

-### Clone this repository
+### 1. Clone this repository

 ```bash
-git clone git@github.com:nih-fmrif/dsst-defacing-pipeline.git
+git clone https://github.com/nimh-dsst/dsst-defacing-pipeline.git
 ```

-### Install required packages
+### 2. Install required packages

 Apart from AFNI and FSL packages, available as HPC modules, users will need the following packages in their working
 environment

 - VisualQC
 - FSLeyes
-- Python 3.x
+- Python 3.7+

 There are many ways to create a virtual environment with the required packages, however, we currently only provide
 instructions to create a conda environment. If you don't already have conda installed, please find
-instructions [here](https://docs.conda.io/en/latest/miniconda.html). Run the following command to create a conda
+[Miniconda install instructions here](https://docs.conda.io/en/latest/miniconda.html).
+
+### 3. Create a conda environment
+
+Run the following command to create a conda
 environment called `dsstdeface` using the `environment.yml` file from this repo.

 ```bash
@@ -38,38 +42,63 @@ conda env create -f environment.yml

 Once conda finishes creating the virtual environment, activate `dsstdeface`.

-```bash
-conda activate dsstdeface
-```
+```bash
+conda activate dsstdeface
+```

-### Run `dsst_defacing_wf.py`
+## Using `dsst_defacing_wf.py`

-To deface anatomical scans in the dataset, run `dsst_defacing_wf.py` script.
+To deface anatomical scans in the dataset, run the `src/dsst_defacing_wf.py` script. From within the `dsst-defacing-pipeline` cloned directory, run the following command to see the help message.

-```
-% python src/dsst_defacing_wf.py -h
-usage: dsst_defacing_wf.py [-h] --input INPUT --output OUTPUT [--participant-id SUBJ_ID] [--session-id SESS_ID] [--no-clean]
+```text
+% python src/dsst_defacing_wf.py -h

-Deface anatomical scans for a given BIDS dataset or a subject directory in BIDS format.
+usage: dsst_defacing_wf.py [-h] [-n N_CPUS]
+[-p PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]]
+[-s SESSION_ID [SESSION_ID ...]]
+[--no-clean]
+bids_dir output_dir

-optional arguments:
--h, --help show this help message and exit
---input INPUT, -i INPUT
-Path to input BIDS dataset.
---output OUTPUT, -o OUTPUT
-Path to output BIDS dataset with defaced scan.
---participant-id SUBJ_ID, -p SUBJ_ID
-Subject ID associated with the participant. Since the input dataset is assumed to be BIDS valid, this argument expects subject IDs with 'sub-' prefix.
---session-id SESS_ID, -s SESS_ID
-Session ID associated with the subject ID. If the BIDS input dataset contains sessions, then this argument expects session IDs with 'ses-' prefix.
---no-clean If this argument is provided, then AFNI intermediate files are preserved.
+Deface anatomical scans for a given BIDS dataset or a subject
+directory in BIDS format.
+
+positional arguments:
+bids_dir The directory with the input dataset
+formatted according to the BIDS standard.
+output_dir The directory where the output files should
+be stored.

+options:
+-h, --help show this help message and exit
+-n N_CPUS, --n-cpus N_CPUS
+Number of parallel processes to run when
+there is more than one folder. Defaults to
+1, meaning "serial processing".
+-p PARTICIPANT_LABEL [PARTICIPANT_LABEL ...], --participant-label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]
+The label(s) of the participant(s) that
+should be defaced. The label corresponds to
+sub-<participant_label> from the BIDS spec
+(so it does not include "sub-"). If this
+parameter is not provided all subjects
+should be analyzed. Multiple participants
+can be specified with a space separated
+list.
+-s SESSION_ID [SESSION_ID ...], --session-id SESSION_ID [SESSION_ID ...]
+The ID(s) of the session(s) that should be
+defaced. The label corresponds to
+ses-<session_id> from the BIDS spec (so it
+does not include "ses-"). If this parameter
+is not provided all subjects should be
+analyzed. Multiple sessions can be specified
+with a space separated list.
+--no-clean If this argument is provided, then AFNI
+intermediate files are preserved.
 ```

 The script can be run serially on a BIDS dataset or in parallel at subject/session level. The three methods of running
 the script have been described below with example commands:

-For readability of example commands, the following bash variables have defined as follows:
+For readability of example commands, the following bash variables have been defined as follows:

 ```bash
 INPUT_DIR="<path/to/BIDS/input/dataset>"
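Review note: the new help text in the hunk above implies an `argparse` interface with two positional arguments and multi-valued `-p`/`-s` options. The sketch below is a hypothetical reconstruction from that help text only; it is not taken from `src/dsst_defacing_wf.py`, and the function name `build_parser` and the shortened help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser rebuilt from the help text shown in the README diff."""
    parser = argparse.ArgumentParser(
        prog="dsst_defacing_wf.py",
        description="Deface anatomical scans for a given BIDS dataset "
                    "or a subject directory in BIDS format.",
    )
    # Positional arguments replace the old --input/--output options.
    parser.add_argument("bids_dir", help="Input dataset in BIDS format.")
    parser.add_argument("output_dir", help="Where output files are stored.")
    # Defaults to 1, which the help text describes as serial processing.
    parser.add_argument("-n", "--n-cpus", type=int, default=1,
                        help="Number of parallel processes.")
    # nargs="+" accepts a space separated list of labels without "sub-".
    parser.add_argument("-p", "--participant-label", nargs="+",
                        help="Participant label(s) without the sub- prefix.")
    parser.add_argument("-s", "--session-id", nargs="+",
                        help="Session ID(s) without the ses- prefix.")
    parser.add_argument("--no-clean", action="store_true",
                        help="Preserve AFNI intermediate files.")
    return parser


# Example invocation with placeholder paths.
args = build_parser().parse_args(
    ["/data/bids", "/data/out", "-n", "10", "-p", "01", "02"]
)
print(args.bids_dir, args.n_cpus, args.participant_label, args.no_clean)
```

With this shape, omitting `-p` and `-s` leaves both attributes as `None`, which a caller can treat as "process every subject and session".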
@@ -79,24 +108,27 @@ OUTPUT_DIR="<path/to/desired/defacing/output/directory>"
 **NOTE:** In the example commands below, `<path/to/BIDS/input/dataset>` and `<path/to/desired/output/directory>` are
 placeholders for paths to input and output directories, respectively.

-#### Option 1: Serially
+### Option 1: Serial defacing

 If you have a small dataset with less than 10 subjects, then it might be easiest to run the defacing algorithm serially.

 ```bash
-python dsst-defacing-pipeline/src/dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR}
+python src/dsst_defacing_wf.py ${INPUT_DIR} ${OUTPUT_DIR}
 ```

-#### Option 2: In parallel at subject level
+### Option 2: Parallel defacing

-If you have dataset with over 10 subjects, then it might be more practical to run the pipeline in parallel for every
-subject in the dataset using the `-p/--participant-id` option as follows:
+If you have dataset with over 10 subjects and since each defacing job is independent, it might be more practical to run the pipeline in parallel for every
+subject/session in the dataset using the `-n/--n-cpus` option. The following example command will run the pipeline occupying 10 processors at a time.

 ```bash
-python dsst_defacing_wf.py -i ${INPUT_DIR} -o ${OUTPUT_DIR} -p sub-<index>
+python src/dsst_defacing_wf.py ${INPUT_DIR} ${OUTPUT_DIR} -n 10
 ```

-a. Assuming these scripts are run on the NIH HPC system, the first step would be to create a `swarm` file:
+### Option 3: Parallel defacing using `swarm`
+
+
+Assuming these scripts are run on the NIH HPC system, you can create a `swarm` file:

 ```bash

@@ -106,19 +138,19 @@ a. Assuming these scripts are run on the NIH HPC system, the first step would be
 done > defacing_parallel_subject_level.swarm
 ```

-Purpose: Loop through the dataset and find all subject directories to construct `dsst_defacing_wf.py` command
-with `-p/--participant-id` option.
+The above BASH "for loop" crawls through the dataset and finds all subject directories to construct `dsst_defacing_wf.py` commands
+with the `-p/--participant-label` option.

-b. Run the swarm file with following command to start a swarm job
+Next you can run the swarm file with the following command:

-```bash
-swarm -f defacing_parallel_subject_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
-```
+```bash
+swarm -f defacing_parallel_subject_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
+```

-### Option 3: In parallel at session level
+### Option 4: In parallel at session level

 If the input dataset has multiple sessions per subject, then run the pipeline on every session in the dataset
-parallelly. Similar to Option 2, the following commands loop through the dataset to find subject and session IDs to
+in parallel. Similar to Option 2, the following commands loop through the dataset to find subject and session IDs to
 create a `swarm` file to be run on NIH HPC systems.

 ```bash
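Review note: the body of the subject-level `for` loop falls outside the hunks shown here, so the following is only a sketch of what the README describes: one `dsst_defacing_wf.py` command per subject directory, using `-p/--participant-label` with the `sub-` prefix removed. The helper name `swarm_lines` and the exact command layout are assumptions, not the README's actual loop.

```python
import tempfile
from pathlib import Path


def swarm_lines(input_dir, output_dir, script="src/dsst_defacing_wf.py"):
    """Build one defacing command per sub-* directory (hypothetical helper)."""
    lines = []
    for subj_dir in sorted(Path(input_dir).glob("sub-*")):
        if not subj_dir.is_dir():
            continue  # skip stray files that happen to match sub-*
        # The new CLI expects the label without the "sub-" prefix.
        label = subj_dir.name[len("sub-"):]
        lines.append(f"python {script} {input_dir} {output_dir} -p {label}")
    return lines


# Demo on a throwaway directory tree standing in for a BIDS dataset.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sub-01", "sub-02"):
        (Path(tmp) / name).mkdir()
    demo_lines = swarm_lines(tmp, "/path/to/output")
    print("\n".join(demo_lines))
```

Writing the returned lines to `defacing_parallel_subject_level.swarm` would give a file in the shape the `swarm -f` command in the diff expects, one command per line.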
@@ -131,54 +163,56 @@ for i in `ls -d ${INPUT_DIR}/sub-*`; do
 done > defacing_parallel_session_level.swarm
 ```

+To run the swarm file, once created, use the following command:
+
 ```bash
 swarm -f defacing_parallel_session_level.swarm --merge-output --logdir ${OUTPUT_DIR}/swarm_log
 ```

-### Run `generate_renders.py`
+## Using `generate_renders.py`

 Generate 3D renders for every defaced image in the output directory.

 ```bash
 python dsst-defacing-pipeline/src/generate_renders.py -o ${OUTPUT_DIR}
 ```

-### Visual Inspection
+## Visual Inspection

 To visually inspect quality of defacing with [VisualQC](https://raamana.github.io/visualqc/readme.html), we'll need to:

-1. Open TurboVNC through an spersist session. More info [here](https://hpc.nih.gov/docs/nimh.html).
+1. Open TurboVNC through an spersist session. More info on [the NIH HPC docs](https://hpc.nih.gov/docs/nimh.html).
 2. Run the `vqcdeface` command from a command-line terminal within a TurboVNC instance

-```bash
-sh ${OUTPUT_DIR}/QC_prep/defacing_qc_cmd
-```
+```bash
+sh ${OUTPUT_DIR}/QC_prep/defacing_qc_cmd
+```

 ## Conceptual design

-1. Generate a ["primary" scans](#terminology) to [other scans'](#terminology) mapping file.
+1. Generate a ["primary" scans](#terminology) to [other scans](#terminology) mapping file.
 2. Deface primary scans
 with [@afni_refacer_run](https://afni.nimh.nih.gov/pub/dist/doc/htmldoc/tutorials/refacer/refacer_run.html) program
 developed by the AFNI Team.
 3. To deface remaining scans in the session, register them to the primary scan (using FSL `flirt` command) and then use
 the primary scan's defacemask to generate a defaced image (using `fslmaths` command).
 4. Visually inspect defaced scans with [VisualQC](https://raamana.github.io/visualqc) deface tool or any other preferred
 tool.
-5. Correct/fix defaced scans that failed visual inspection. See [here]() for more info on types of failures.
+5. Correct/fix defaced scans that failed visual inspection. See [here](FILLINTHEBLANK) for more info on types of failures.

 ![Defacing Pipeline flowchart](images/defacing_pipeline.png)

 ## Terminology

-While describing the process, we frequently use the following terms:
+While describing this process, we frequently use the following terms:

 - **Primary Scan:** The best quality T1w scan within a session. For programmatic selection, we assume that the most
 recently acquired T1w scan is of the best quality.
-- **Other/Secondary Scans:** All scans *except* the primary scan are grouped together and referred to as "other" or "
-secondary" scans for a given session.
-- **Mapping File:** A JSON file that assigns maps a primary scan (or `primary_t1`) to all other scans within a session.
-Please find an example file [here]().
-- **[VisualQC](https://raamana.github.io/visualqc):** A suite of QC tools developed by Pradeep Raamana (Assistant
+- **Other/Secondary Scans:** All scans *except* the primary scan are grouped together and referred to as "other" or
+"secondary" scans for a given session.
+- **Mapping File:** A JSON file that assigns/maps a primary scan (or `primary_t1`) to all other scans within a session.
+Please find an example file [here](https://github.com/nimh-dsst/dsst-defacing-pipeline/blob/47288e429d0614a1d0be44f7176f85570823fbaa/examples/primary_to_others_mapping.json).
+- **[VisualQC](https://raamana.github.io/visualqc):** A suite of QC tools developed by Pradeep Raamana, PhD (Assistant
 Professor at University of Pittsburgh).

 ## References
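Review note: the "Mapping File" entry in the Terminology list can be illustrated with a small example. The key names `primary_t1` and `others` and the subject/session nesting follow the lookups in `src/deface.py` from this same commit; the file names in the JSON and the helper `scans_for` are made up for illustration and are not copied from the linked example file.

```python
import json

# Hypothetical mapping for one subject with one session.
mapping_json = """
{
  "sub-01": {
    "ses-01": {
      "primary_t1": "sub-01/ses-01/anat/sub-01_ses-01_run-02_T1w.nii.gz",
      "others": [
        "sub-01/ses-01/anat/sub-01_ses-01_run-01_T1w.nii.gz",
        "sub-01/ses-01/anat/sub-01_ses-01_T2w.nii.gz"
      ]
    }
  }
}
"""


def scans_for(mapping, subj_id, sess_id=None):
    """Return (primary scan, other scans) for a subject or a subject/session."""
    # Datasets without sessions keep the entry directly under the subject key.
    entry = mapping[subj_id][sess_id] if sess_id else mapping[subj_id]
    primary = entry["primary_t1"]
    # Mirror deface.py: drop the primary scan if it also appears in "others".
    others = [str(s) for s in entry["others"] if s != primary]
    return primary, others


mapping = json.loads(mapping_json)
primary, others = scans_for(mapping, "sub-01", "ses-01")
print(primary)
print(others)
```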
@@ -196,7 +230,7 @@ While describing the process, we frequently use the following terms:

 ## Acknowledgements

-We'd like to thank [Pradeep Raamana](https://www.aimi.pitt.edu/people/ant), Assistant Professor at the Department of
+We'd like to thank [Pradeep Raamana, PhD.](https://www.aimi.pitt.edu/people/ant), Assistant Professor at the Department of
 Radiology at University of Pittsburgh, and [Paul Taylor](https://afni.nimh.nih.gov/Staff), Acting Director of Scientific
 and Statistical Computing Core (SSCC) at NIMH for their timely help in resolving and adapting VisualQC and AFNI Refacer,
 respectively, for the specific needs of this project.

images/pipeline_screen_quality.png

-94.4 KB
Binary file not shown.

src/deface.py

Lines changed: 7 additions & 5 deletions
@@ -1,6 +1,7 @@
 import gzip
 import re
 import shutil
+import os
 import subprocess
 from os import fspath
 from pathlib import Path
@@ -119,7 +120,7 @@ def reorganize_into_bids(input_bids_dir, subj_dir, sess_dir, primary_t1, bids_de
     intermediate_files_dir.mkdir(parents=True, exist_ok=True)
     for dirpath in anat_dir.glob('*'):
         if dirpath.name.startswith('workdir') or dirpath.name.endswith('QC'):
-            shutil.move(dirpath, intermediate_files_dir)
+            shutil.move(str(dirpath), str(intermediate_files_dir))

     vqcdeface_prep(input_bids_dir, anat_dir, bids_defaced_outdir)
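Review note: the hunk above wraps both `shutil.move` arguments in `str()`. A likely motivation (an assumption, the commit does not state it) is compatibility with Python versions before 3.9, where `shutil.move` could fail when given a `pathlib.Path` source and an existing destination directory; the README in this commit lists Python 3.7+ as the requirement. The sketch below reproduces the move pattern on a temporary tree with made-up directory names.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-ins for the anat directory and the intermediate files directory.
    anat_dir = Path(tmp) / "anat"
    intermediate_files_dir = Path(tmp) / "intermediate"
    (anat_dir / "workdir_example").mkdir(parents=True)
    (anat_dir / "keep_me").mkdir()
    intermediate_files_dir.mkdir(parents=True, exist_ok=True)

    # Materialize the listing first so the move does not disturb iteration.
    for dirpath in list(anat_dir.glob("*")):
        if dirpath.name.startswith("workdir") or dirpath.name.endswith("QC"):
            # Plain strings work on every supported Python version.
            shutil.move(str(dirpath), str(intermediate_files_dir))

    moved = sorted(p.name for p in intermediate_files_dir.iterdir())
    remaining = sorted(p.name for p in anat_dir.iterdir())
    print(moved, remaining)
```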

@@ -153,7 +154,7 @@ def run_afni_refacer(primary_t1, others, subj_input_dir, sess_dir, output_dir):
     refacer_cmd = f"@afni_refacer_run -input {primary_t1} -mode_deface -no_clean -prefix {fspath(subj_outdir / prefix)}"

     # TODO remove module load afni
-    full_cmd = f"module load afni ; {refacer_cmd}"
+    full_cmd = f"module load afni ; export OMP_NUM_THREADS=1 ; {refacer_cmd}"

     # TODO make log text less ugly; perhaps in a separate function
     log_filename = subj_outdir / 'defacing_pipeline.log'
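Review note: the hunk above exports `OMP_NUM_THREADS=1` inside the shell command string, presumably so that each refacer process stays single-threaded when several run side by side under `-n/--n-cpus` (an assumption; the commit message does not say). An alternative, sketched below, passes the variable through the `env` argument of `subprocess.run`. The helper name `run_single_threaded` is hypothetical, and the child command is a stand-in that only echoes the variable; the real command would still need the `module load afni` step handled separately.

```python
import os
import subprocess
import sys


def run_single_threaded(cmd):
    """Run cmd with OMP_NUM_THREADS=1 added to a copy of the current environment."""
    env = dict(os.environ, OMP_NUM_THREADS="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that reports the value it sees.
result = run_single_threaded(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
)
print(result.stdout.strip())
```

Setting the variable through `env` keeps it out of the command string, which may help if the `module load afni` prefix is removed as the TODO suggests.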
@@ -190,21 +191,22 @@ def run_afni_refacer(primary_t1, others, subj_input_dir, sess_dir, output_dir):
 def deface_primary_scan(input_bids_dir, subj_input_dir, sess_dir, mapping_dict, output_dir, no_clean):
     missing_refacer_outputs = [] # list to capture missing afni refacer workdirs

-    subj_id = subj_input_dir.name
-    sess_id = sess_dir.name if sess_dir else None
+    subj_id = os.path.basename(subj_input_dir)
+    sess_id = os.path.basename(sess_dir) if sess_dir else None

     if sess_dir:
         primary_t1 = mapping_dict[subj_id][sess_id]['primary_t1']
         others = [str(s) for s in mapping_dict[subj_id][sess_id]['others'] if s != primary_t1]
         missing_refacer_outputs.append(run_afni_refacer(primary_t1, others, subj_input_dir, sess_dir, output_dir))
+        print(f"Reorganizing {sess_dir} with defaced images into BIDS tree...\n")

     else:
         primary_t1 = mapping_dict[subj_id]['primary_t1']
         others = [str(s) for s in mapping_dict[subj_id]['others'] if s != primary_t1]
         missing_refacer_outputs.append(run_afni_refacer(primary_t1, others, subj_input_dir, "", output_dir))
+        print(f"Reorganizing {subj_input_dir} with defaced images into BIDS tree...\n")

     # reorganizing the directory with defaced images into BIDS tree
-    print(f"Reorganizing {sess_dir} with defaced images into BIDS tree...\n")
     reorganize_into_bids(input_bids_dir, subj_input_dir, sess_dir, primary_t1, output_dir, no_clean)

     return missing_refacer_outputs
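Review note: the last hunk swaps `.name` for `os.path.basename`, which accepts both plain strings and `pathlib.Path` objects, so the function no longer depends on its callers passing `Path` instances (the reason for the change is an assumption). One behavioral difference is worth keeping in mind: the two disagree when a string path has a trailing separator. The paths below are placeholders.

```python
import os
from pathlib import Path

# os.path.basename works for str and Path inputs alike.
from_str = os.path.basename("/data/bids/sub-01")
from_path = os.path.basename(Path("/data/bids/sub-01"))
print(from_str, from_path)

# With a trailing slash, basename returns an empty string,
# while Path normalizes the slash away before taking the name.
trailing_basename = os.path.basename("/data/bids/sub-01/")
trailing_name = Path("/data/bids/sub-01/").name
print(repr(trailing_basename), repr(trailing_name))
```

If subject or session paths can arrive as strings with a trailing slash, the empty ID would not match any key in the mapping dictionary, so stripping the separator first may be worth considering.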
