1 | | - |
2 | 1 | # **DeepCoil** # |
3 | | -Accurate prediction of coiled coil domains in protein sequences. |
4 | | - |
| 2 | +[](https://doi.org/10.1093/bioinformatics/bty1062 ) |
| 3 | + |
| 4 | + |
| 5 | +**Fast and accurate prediction of coiled coil domains in protein sequences.** |
5 | 6 | ## **Installation** ## |
6 | | -First clone this repository: |
7 | | -```bash |
8 | | -$ git clone https://github.com/labstructbioinf/DeepCoil.git |
9 | | -``` |
10 | | -Required packages to run DeepCoil are listed in the **`requirements.txt`** file. |
11 | | -We suggest running DeepCoil in the virtual environment: |
12 | | -If you don't have virtualenv installed do so: |
13 | | -```bash |
14 | | -$ pip3 install virtualenv |
15 | | -``` |
16 | | -Create virtual environment and install required packages: |
| 7 | +The most convenient way to install **DeepCoil** is to use pip: |
17 | 8 | ```bash |
18 | | -$ cd virtual_envs_location |
19 | | -$ virtualenv deepcoil_env |
20 | | -$ source deepcoil_env/bin/activate |
21 | | -$ cd DEEPCOIL_LOCATION |
22 | | -$ pip3 install -r requirements.txt |
| 9 | +$ pip install deepcoil |
23 | 10 | ``` |
24 | | -Test the installation: |
25 | | -```bash |
26 | | -$ ./run_example.sh |
27 | | -``` |
28 | | -This should produce output **`example/out_pssm/GCN4_YEAST.out`** identical to **`example/out_pssm/GCN4_YEAST.out.bk`** and accordingly for the **`example/out_seq/`** directory. |
29 | | - |
| 11 | + |
30 | 12 | ## **Usage** ## |
| 13 | + |
| 14 | +##### Running DeepCoil as a standalone tool:
| 15 | + |
31 | 16 | ```bash |
32 | | -python3.5 deepcoil.py [-h] -i FILE [-out_path DIR] [-pssm] [-pssm_path DIR] |
| 17 | +deepcoil [-h] -i FILE [-out_path DIR] |
33 | 18 | ``` |
34 | 19 | | Option | Description | |
35 | 20 | |:-------------:|-------------| |
36 | 21 | | **`-i`** | Input file in FASTA format. Can contain multiple entries. | |
37 | | -| **`-pssm`** | Flag for the PSSM-mode. If enabled DeepCoil will require psiblast PSSM files in the pssm_path. Otherwise only sequence information will be used.| |
38 | | -| **`-pssm_path`** | Directory with psiblast PSSM files. For each entry in the input fasta file there must be a PSSM file. | |
39 | | -| **`-out_path`** | Directory where the predictions are saved. For each entry one file will be saved. | |
40 | | -| **`-out_type`** | Output type. Either **'ascii'** (default), which will write single file for each entry in input or **'h5'** which will generate single hdf5 file storing all predictions. | |
41 | | -| **`-out_filename`** | Works with **"-out_type h5"** option and specifies the hdf5 output filename Overrides the **-out_path** if specified. | |
42 | | -| **`-min_residue_score`** | Number in the range <0,1>. DeepCoil will return sequences that have at least one residue with score greater than min_residue_score | |
43 | | -| **`-min_segment_length`** | Number greater than 0. DeepCoil will return sequences that contain a segment of length **-min_segment_length** or more. To be used with **-min_residue_score** | |
| 22 | +| **`-out_path`** | Directory where the predictions are saved. One file is saved for each entry in the input file. |
| 23 | +| **`--gpu`** | Flag that enables GPU usage, which speeds up inference on large datasets. |
44 | 24 |
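The options in the new table can also be assembled programmatically when DeepCoil is called from a pipeline. The following is a minimal sketch, not part of the package: `build_deepcoil_command` is a hypothetical helper, and `results` is a placeholder output directory. Only the `-i`, `-out_path`, and `--gpu` options documented above are used.

```python
import shlex


def build_deepcoil_command(fasta_path, out_dir=None, use_gpu=False):
    """Assemble the argument list for the `deepcoil` command-line tool.

    Hypothetical helper: it only uses the options listed in the README table.
    """
    cmd = ["deepcoil", "-i", fasta_path]
    if out_dir is not None:
        cmd += ["-out_path", out_dir]
    if use_gpu:
        cmd.append("--gpu")
    return cmd


# 'results' is a placeholder directory name chosen for illustration.
cmd = build_deepcoil_command("example/example.fas", out_dir="results", use_gpu=True)
print(shlex.join(cmd))  # shlex.join requires Python 3.8+

# To actually run the tool (requires `pip install deepcoil`):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.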
45 | | -Results of **`-min_residue_score`** and **`-min_segment_length`** filters are stored in directories located in **`-out_path`**. |
46 | 25 |
47 | | -PSSM filenames should be based on the identifiers in the fasta file (only alphanumeric characters and '_'). For example if a fasta sequence is as follows: |
48 | | -``` |
49 | | ->GCN4_YEAST RecName: Full=General control protein GCN4; AltName: Full=Amino acid biosynthesis regulatory protein |
50 | | -MSEYQPSLFALNPMGFSPLD.... |
51 | | -``` |
52 | | -PSSM file should be named **`GCN4_YEAST.pssm`**. |
53 | | - |
54 | | -You can generate PSSM files with the following command (requires NR90 database): |
55 | | -```bash |
56 | | -psiblast -query GCN4_YEAST.fasta -db NR90_LOCATION -evalue 0.001 -num_iterations 3 -out_ascii_pssm GCN4_YEAST.pssm |
57 | | -``` |
58 | | -In order to generate PSSM file from multiple sequence alignment (MSA) you can use this command: |
59 | | -```bash |
60 | | -psiblast -subject sequence.fasta -in_msa alignment.fasta -out_ascii_pssm output.pssm |
| 26 | +##### Running DeepCoil within a script:
| 27 | + |
| 28 | +```python |
| 29 | +from deepcoil import DeepCoil |
| 30 | +from deepcoil.utils import plot_preds |
| 31 | +from Bio import SeqIO |
| 32 | + |
| 33 | +dc = DeepCoil(use_gpu=True) |
| 34 | + |
| 35 | +inp = {str(entry.id): str(entry.seq) for entry in SeqIO.parse('example/example.fas', 'fasta')} |
| 36 | + |
| 37 | +results = dc.predict(inp) |
| 38 | + |
| 39 | +plot_preds(results['3WPA_1'], to_file='example/example.png') |
61 | 40 | ``` |
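The script above passes `dc.predict` a dictionary that maps each FASTA identifier to its sequence string. As an illustration of that input shape, here is a minimal sketch that builds the same kind of dictionary without Biopython. `read_fasta` is a hypothetical helper, and the demo record is only the short sequence fragment quoted in the old README, not a full protein.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {identifier: sequence} dict.

    Hypothetical helper: the identifier is the first whitespace-delimited
    token after '>', which mirrors how `entry.id` is used in the script above.
    """
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped; concatenate them.
            records[current_id] += line
    return records


# Demo input: a truncated fragment for illustration only.
demo = [
    ">GCN4_YEAST General control protein GCN4",
    "MSEYQPSLFA",
    "LNPMGFSPLD",
]
inp = read_fasta(demo)
print(inp)
```

With a real file, the same function can be called as `read_fasta(open('example/example.fas'))`, and the resulting dictionary has the shape the script above builds with `SeqIO`.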
| 41 | +###### Example graphical output: |
| 42 | + |