Commit 535c954
Initial commit
1 parent 81338b6 commit 535c954

31 files changed (+1029, -23785 lines)

.gitignore

Lines changed: 16 additions & 28 deletions
```diff
@@ -20,8 +20,6 @@ parts/
 sdist/
 var/
 wheels/
-pip-wheel-metadata/
-share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
@@ -40,14 +38,12 @@ pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
-.nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
-*.py,cover
 .hypothesis/
 .pytest_cache/
 
@@ -59,7 +55,6 @@ coverage.xml
 *.log
 local_settings.py
 db.sqlite3
-db.sqlite3-journal
 
 # Flask stuff:
 instance/
@@ -77,26 +72,11 @@ target/
 # Jupyter Notebook
 .ipynb_checkpoints
 
-# IPython
-profile_default/
-ipython_config.py
-
 # pyenv
 .python-version
 
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow
-__pypackages__/
-
-# Celery stuff
+# celery beat schedule file
 celerybeat-schedule
-celerybeat.pid
 
 # SageMath parsed files
 *.sage.py
@@ -110,7 +90,7 @@ ENV/
 env.bak/
 venv.bak/
 
-# PyCharm project settings
+# Pycharm project settings
 .idea
 
 # Spyder project settings
@@ -125,11 +105,19 @@ venv.bak/
 
 # mypy
 .mypy_cache/
-.dmypy.json
-dmypy.json
 
-# Pyre type checker
-.pyre/
+# Model checkpoints
+checkpoints/
+*.pt
+
+# Datasets and Preprocessed data
+datasets/
+*.npy
+
+# Submission
+submission/
+*.wav
+submission.zip
 
-# Data files
-.npy
+# Hydra outputs
+outputs/
```

README.md

Lines changed: 144 additions & 1 deletion
```diff
@@ -1 +1,144 @@
-# ContrastivePredictiveCoding
```

# Vector-Quantized Contrastive Predictive Coding

To learn discrete representations of speech for the [ZeroSpeech challenges](https://zerospeech.com/), we propose vector-quantized contrastive predictive coding (VQ-CPC).
An encoder maps input speech into a sequence of discrete codes.
Next, an autoregressive model summarises the latent representation (up to time t) into a context vector.
Using this context, the model learns to discriminate future frames from negative examples sampled randomly from other utterances.
Finally, an RNN-based vocoder is trained to generate audio from the discretized representation.

<p align="center">
  <img width="784" height="340" alt="VQ-CPC model summary"
       src="https://raw.githubusercontent.com/bshall/VectorQuantizedCPC/master/model.png">
</p>
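To make the contrastive objective concrete, here is a minimal InfoNCE-style sketch in PyTorch. It is illustrative only: the class name `CPCLoss`, the tensor shapes, and the one-linear-predictor-per-future-step layout are assumptions, not the repo's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CPCLoss(nn.Module):
    """InfoNCE-style loss: score the true future code against negatives."""
    def __init__(self, context_dim, code_dim, n_prediction_steps):
        super().__init__()
        # One linear predictor per future step, mapping context -> code space.
        self.predictors = nn.ModuleList(
            [nn.Linear(context_dim, code_dim) for _ in range(n_prediction_steps)]
        )

    def forward(self, context, codes, negatives):
        # context:   (batch, T, context_dim)   summary of the past at each t
        # codes:     (batch, T, code_dim)      quantized encoder outputs
        # negatives: (batch, T, n_neg, code_dim) codes from other utterances
        losses = []
        for k, predictor in enumerate(self.predictors, start=1):
            pred = predictor(context[:, :-k])                       # k steps ahead
            pos = (pred * codes[:, k:]).sum(-1, keepdim=True)       # (B, T-k, 1)
            neg = (pred.unsqueeze(2) * negatives[:, k:]).sum(-1)    # (B, T-k, n_neg)
            logits = torch.cat([pos, neg], dim=-1)
            # The positive always sits at index 0; InfoNCE is cross-entropy over it.
            targets = torch.zeros(logits.shape[:2], dtype=torch.long,
                                  device=logits.device)
            losses.append(F.cross_entropy(logits.transpose(1, 2), targets))
        return torch.stack(losses).mean()
```

Minimizing this cross-entropy maximizes a lower bound on the mutual information between the context and future codes, which is what pushes the codes to capture predictable (e.g. phonetic) structure.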
## Requirements

1. Ensure you have Python 3 and PyTorch 1.4 or greater.

2. Install [NVIDIA/apex](https://github.com/NVIDIA/apex) for mixed-precision training (a usage sketch follows this list).

3. Install the pip dependencies:
```
pip install -r requirements.txt
```

4. For evaluation, install [bootphon/zerospeech2020](https://github.com/bootphon/zerospeech2020).
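The apex mixed-precision workflow generally follows the standard `amp` pattern sketched below. This is not the repo's actual training loop; the toy model, optimizer, and `opt_level` are placeholders.

```python
import torch
from apex import amp

model = torch.nn.Linear(10, 1).cuda()      # stand-in for the real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Wrap model and optimizer; "O1" patches ops to run in mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 10, device="cuda")
loss = model(x).pow(2).mean()
optimizer.zero_grad()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()                 # backward on the loss-scaled value
optimizer.step()
```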
## Data and Preprocessing

1. Download and extract the [ZeroSpeech2020 datasets](https://download.zerospeech.com/).

2. Download the train/test splits [here](https://github.com/bshall/VectorQuantizedCPC/releases/tag/v0.1) and extract them in the root directory of the repo.

3. Preprocess the audio and extract train/test log-Mel spectrograms:
```
python preprocess.py in_dir=/path/to/dataset dataset=[2019/english or 2019/surprise]
```
Note: `in_dir` must be the path to the `2019` folder.
For `dataset`, choose between `2019/english` and `2019/surprise`.
Other datasets will be added in the future.
```
e.g. python preprocess.py in_dir=../datasets/2020/2019 dataset=2019/english
```
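For orientation, log-Mel extraction along these lines can be done with librosa. This is a rough sketch under assumed parameters; the sample rate, hop length, and Mel-band count are illustrative, not the repo's exact preprocessing settings.

```python
import librosa
import numpy as np

def log_mel(path, sr=16000, n_fft=2048, hop_length=160, n_mels=80):
    wav, _ = librosa.load(path, sr=sr)     # resample to a fixed rate
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return np.log(np.maximum(mel, 1e-10))  # floor to avoid log(0)
```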
## Training

1. Train the VQ-CPC model (pretrained weights will be released soon):
```
python train_cpc.py checkpoint_dir=path/to/checkpoint_dir dataset=[2019/english or 2019/surprise]
```
```
e.g. python train_cpc.py checkpoint_dir=checkpoints/cpc/2019english dataset=2019/english
```

2. Train the vocoder:
```
python train_vocoder.py cpc_checkpoint=path/to/cpc/checkpoint checkpoint_dir=path/to/checkpoint_dir dataset=[2019/english or 2019/surprise]
```
```
e.g. python train_vocoder.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-24000.pt checkpoint_dir=checkpoints/vocoder/english2019
```
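The "VQ" in VQ-CPC is a vector-quantization bottleneck on the encoder output, which is what makes the learned representation discrete. Below is a minimal sketch of such a bottleneck with a straight-through gradient estimator; the module name, codebook size, and dimensions are illustrative assumptions, not the repo's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQBottleneck(nn.Module):
    def __init__(self, n_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)

    def forward(self, z):                       # z: (batch, T, dim)
        flat = z.reshape(-1, z.size(-1))
        # Nearest codebook entry by Euclidean distance.
        dist = torch.cdist(flat, self.codebook.weight)
        idx = dist.argmin(dim=-1)
        q = self.codebook(idx).view_as(z)
        # Straight-through: gradients flow to the encoder as if q were z.
        q_st = z + (q - z).detach()
        commit = F.mse_loss(z, q.detach())      # commitment loss term
        return q_st, idx.view(z.shape[:-1]), commit
```

Only the commitment term is shown here; the codebook itself is typically updated with its own loss term or an exponential moving average.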
## Evaluation

### Voice conversion

```
python convert.py cpc_checkpoint=path/to/cpc/checkpoint vocoder_checkpoint=path/to/vocoder/checkpoint in_dir=path/to/wavs out_dir=path/to/out_dir synthesis_list=path/to/synthesis_list dataset=[2019/english or 2019/surprise]
```
Note: the `synthesis_list` is a JSON file:
```
[
    [
        "english/test/S002_0379088085",
        "V002",
        "V002_0379088085"
    ]
]
```
containing a list of items with a) the path (relative to `in_dir`) of the source `wav` files; b) the target speaker (see `datasets/2019/english/speakers.json` for a list of options); and c) the target file name.
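If you need to build such a file programmatically, a tiny helper (hypothetical, not part of the repo) could look like:

```python
import json

# Each entry: [source wav relative to in_dir, target speaker, output name].
entries = [
    ["english/test/S002_0379088085", "V002", "V002_0379088085"],
]
with open("synthesis.json", "w") as f:
    json.dump(entries, f, indent=4)
```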
```
e.g. python convert.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-25000.pt vocoder_checkpoint=checkpoints/vocoder/english2019/model.ckpt-150000.pt in_dir=../datasets/2020/2019 out_dir=submission/2019/english/test synthesis_list=datasets/2019/english/synthesis.json dataset=2019/english
```
Voice conversion samples will be available soon.
### ABX Score

1. Encode the test data for evaluation:
```
python encode.py checkpoint=path/to/checkpoint out_dir=path/to/out_dir dataset=[2019/english or 2019/surprise]
```
```
e.g. python encode.py checkpoint=checkpoints/2019english/model.ckpt-500000.pt out_dir=submission/2019/english/test dataset=2019/english
```

2. Run the ABX evaluation script (see [bootphon/zerospeech2020](https://github.com/bootphon/zerospeech2020)).

The ABX score for the pretrained English model is:
```
{
    "2019": {
        "english": {
            "scores": {
                "abx": 13.444869807551896,
                "bitrate": 421.3347459545065
            },
            "details_bitrate": {
                "test": 421.3347459545065,
                "auxiliary_embedding1": 817.3706731019037,
                "auxiliary_embedding2": 817.6857350383482
            },
            "details_abx": {
                "test": {
                    "cosine": 13.444869807551896,
                    "KL": 50.0,
                    "levenshtein": 27.836903478166363
                },
                "auxiliary_embedding1": {
                    "cosine": 12.47147337307366,
                    "KL": 50.0,
                    "levenshtein": 43.91132599798928
                },
                "auxiliary_embedding2": {
                    "cosine": 12.29162067184495,
                    "KL": 50.0,
                    "levenshtein": 44.29540315886812
                }
            }
        }
    }
}
```
## References

This work is based on:

1. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. ["Representation Learning with Contrastive Predictive Coding"](https://arxiv.org/abs/1807.03748). arXiv preprint arXiv:1807.03748, 2018.

2. Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. ["Neural Discrete Representation Learning"](https://arxiv.org/abs/1711.00937). Advances in Neural Information Processing Systems, 2017.

abx.py

Lines changed: 0 additions & 79 deletions
This file was deleted.

config.json

Lines changed: 0 additions & 29 deletions
This file was deleted.

config/convert.yaml

Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+defaults:
+  - dataset: 2019/english
+  - preprocessing: default
+  - model: default
+
+synthesis_list: ???
+in_dir: ???
+out_dir: ???
+cpc_checkpoint: ???
+vocoder_checkpoint: ???
```
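The `???` entries are OmegaConf's mandatory-value markers: Hydra composes the `defaults` list and then requires each `???` to be supplied, typically as a `key=value` command-line override. A minimal sketch of the behaviour with a recent OmegaConf (illustrative, not code from the repo):

```python
from omegaconf import OmegaConf

# Raw load for illustration; in the repo Hydra composes this config,
# resolving the defaults list and applying command-line overrides.
cfg = OmegaConf.load("config/convert.yaml")
print(OmegaConf.is_missing(cfg, "in_dir"))  # True until in_dir=... is supplied
```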

config/dataset/2019/english.yaml

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+dataset:
+  dataset: 2019
+  language: english
+  path: 2019/english
+  n_speakers: 102
```

config/dataset/2019/surprise.yaml

Lines changed: 5 additions & 0 deletions
```diff
@@ -0,0 +1,5 @@
+dataset:
+  dataset: 2019
+  language: surprise
+  path: 2019/surprise
+  n_speakers: 113
```

config/encode.yaml

Lines changed: 8 additions & 0 deletions
```diff
@@ -0,0 +1,8 @@
+defaults:
+  - dataset: 2019/english
+  - preprocessing: default
+  - model: default
+
+checkpoint: ???
+out_dir: ???
+save_auxiliary: False
```
