Commit 3d3290d

Update README.md, add example files

Author: Jan Ludwiczak
Parent: 3536700

3 files changed

+32
-48
lines changed

README.md

Lines changed: 29 additions & 48 deletions
@@ -1,61 +1,42 @@
-![Build Status](https://travis-ci.org/labstructbioinf/DeepCoil.svg?branch=master)
 # **DeepCoil** #
-Accurate prediction of coiled coil domains in protein sequences.
-
+[![DOI:10.1093/bioinformatics/bty1062 ](https://zenodo.org/badge/DOI/10.1093/bioinformatics/bty1062.svg)](https://doi.org/10.1093/bioinformatics/bty1062 )
+![build](https://github.com/labstructbioinf/DeepCoil/workflows/deepcoil/badge.svg)
+
+**Fast and accurate prediction of coiled coil domains in protein sequences.**
 ## **Installation** ##
-First clone this repository:
-```bash
-$ git clone https://github.com/labstructbioinf/DeepCoil.git
-```
-Required packages to run DeepCoil are listed in the **`requirements.txt`** file.
-We suggest running DeepCoil in the virtual environment:
-If you don't have virtualenv installed do so:
-```bash
-$ pip3 install virtualenv
-```
-Create virtual environment and install required packages:
+The most convenient way to install **DeepCoil** is to use pip:
 ```bash
-$ cd virtual_envs_location
-$ virtualenv deepcoil_env
-$ source deepcoil_env/bin/activate
-$ cd DEEPCOIL_LOCATION
-$ pip3 install -r requirements.txt
+$ pip install deepcoil
 ```
-Test the installation:
-```bash
-$ ./run_example.sh
-```
-This should produce output **`example/out_pssm/GCN4_YEAST.out`** identical to **`example/out_pssm/GCN4_YEAST.out.bk`** and accordingly for the **`example/out_seq/`** directory.
-
+
 ## **Usage** ##
+
+##### Running DeepCoil as standalone:
+
 ```bash
-python3.5 deepcoil.py [-h] -i FILE [-out_path DIR] [-pssm] [-pssm_path DIR]
+deepcoil [-h] -i FILE [-out_path DIR]
 ```
 | Option | Description |
 |:-------------:|-------------|
 | **`-i`** | Input file in FASTA format. Can contain multiple entries. |
-| **`-pssm`** | Flag for the PSSM-mode. If enabled DeepCoil will require psiblast PSSM files in the pssm_path. Otherwise only sequence information will be used.|
-| **`-pssm_path`** | Directory with psiblast PSSM files. For each entry in the input fasta file there must be a PSSM file. |
-| **`-out_path`** | Directory where the predictions are saved. For each entry one file will be saved. |
-| **`-out_type`** | Output type. Either **'ascii'** (default), which will write single file for each entry in input or **'h5'** which will generate single hdf5 file storing all predictions. |
-| **`-out_filename`** | Works with **"-out_type h5"** option and specifies the hdf5 output filename Overrides the **-out_path** if specified. |
-| **`-min_residue_score`** | Number in the range <0,1>. DeepCoil will return sequences that have at least one residue with score greater than min_residue_score |
-| **`-min_segment_length`** | Number greater than 0. DeepCoil will return sequences that contain a segment of length **-min_segment_length** or more. To be used with **-min_residue_score** |
+| **`-out_path`** | Directory where the predictions are saved. For each entry in the input file one file will be saved.|
+| **`--gpu`** | Flag for turning on the GPU usage. Results in faster inference on large datasets.|
 
-Results of **`-min_residue_score`** and **`-min_segment_length`** filters are stored in directories located in **`-out_path`**.
 
-PSSM filenames should be based on the identifiers in the fasta file (only alphanumeric characters and '_'). For example if a fasta sequence is as follows:
-```
->GCN4_YEAST RecName: Full=General control protein GCN4; AltName: Full=Amino acid biosynthesis regulatory protein
-MSEYQPSLFALNPMGFSPLD....
-```
-PSSM file should be named **`GCN4_YEAST.pssm`**.
-
-You can generate PSSM files with the following command (requires NR90 database):
-```bash
-psiblast -query GCN4_YEAST.fasta -db NR90_LOCATION -evalue 0.001 -num_iterations 3 -out_ascii_pssm GCN4_YEAST.pssm
-```
-In order to generate PSSM file from multiple sequence alignment (MSA) you can use this command:
-```bash
-psiblast -subject sequence.fasta -in_msa alignment.fasta -out_ascii_pssm output.pssm
+##### Running DeepCoil within script:
+
+```python
+from deepcoil import DeepCoil
+from deepcoil.utils import plot_preds
+from Bio import SeqIO
+
+dc = DeepCoil(use_gpu=True)
+
+inp = {str(entry.id): str(entry.seq) for entry in SeqIO.parse('example/example.fas', 'fasta')}
+
+results = dc.predict(inp)
+
+plot_preds(results['3WPA_1'], to_file='example/example.png')
 ```
+###### Example graphical output:
+![Example](example/example.png)
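
The new README script passes `dc.predict` a plain dict that maps each FASTA identifier to its sequence string. The sketch below shows how such a dict could be built with only the standard library, for instance when Biopython is not available. It is not part of the commit: `read_fasta` is a hypothetical helper, the header description in the demo is made up, and the demo sequence is a truncated fragment of the `3WPA_1` entry. Like `entry.id` in Biopython, it takes the first whitespace-delimited token of the header as the identifier.

```python
from typing import Dict, Iterable, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA identifier (first token after '>') to its sequence.

    Hypothetical helper for illustration; not part of the deepcoil package.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            if not tokens:
                raise ValueError("FASTA header without an identifier")
            current_id = tokens[0]
            if current_id in records:
                # A dict would silently overwrite the earlier entry otherwise.
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence data before the first header")
        else:
            # Sequences may be wrapped over several lines; join them.
            records[current_id] += line
    return records


# Truncated fragment of the 3WPA_1 entry, wrapped over two lines on purpose.
demo = [
    ">3WPA_1 made-up description",
    "MKQIEDKIEEILSKIYHIENEIARIKKLIKAV",
    "GNQVVTTQTTLVNSLGG",
]
inp = read_fasta(demo)
```

The explicit duplicate check is a design choice: the dict comprehension in the README keeps only the last of several entries that share an identifier, which can hide problems in large input files.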

example/example.fasta

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+>3WPA_1
+MKQIEDKIEEILSKIYHIENEIARIKKLIKAVGNQVVTTQTTLVNSLGGNAKVNADGTITGPTYNVAQGNQTNVGDALTALDNAINTAATTSKSTVSNGQNIVVSKSKNADGSDNYEVSTAKDLTVDSVKAGDTVLNNAGITIGNNAVVLNNTGLTISGGPSVTLAGIDAGNKTIQNVANAVNATDAVNKGQLDSAINNVNNNVNELANNAVKYDDASKDKITLGGGATGTTITNVKDGTVAQGSKDAVNGGQLWNVQQQVDQNTTDISNIKNDINNGTVGLVQQAGKDAPVTVAKDTGGTTVNVAGTDGNRVVTGVKEGAVNATSKDAVNGSQLNTTNQAVVNYLGGGAGYDNITGSFTAPSYTVGDSKYNNVGGAIDALNQADQALNSKIDNVSNKLDNAFRITNNRIDDVEKKANAGIHHHHHH
+

example/example.png

196 KB