# Vector-Quantized Contrastive Predictive Coding

To learn discrete representations of speech for the [ZeroSpeech challenges](https://zerospeech.com/), we propose vector-quantized contrastive predictive coding (VQ-CPC).
An encoder maps input speech into a sequence of discrete codes.
Next, an autoregressive model summarises the codes up to time t into a context vector.
Using this context, the model learns to discriminate future frames from negative examples sampled randomly from other utterances.
Finally, an RNN-based vocoder is trained to generate audio from the discrete representation.

<p align="center">
  <img width="784" height="340" alt="VQ-CPC model summary"
    src="https://raw.githubusercontent.com/bshall/VectorQuantizedCPC/master/model.png">
</p>
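
For intuition, here is a minimal PyTorch sketch of the three pieces described above: a frame-wise encoder, a nearest-neighbour vector quantizer, and an autoregressive model that summarises the code sequence into context vectors. It is an illustration only; the layer types, dimensions, and codebook size are assumptions, not the repository's actual architecture.

```
import torch
import torch.nn as nn

class VQCPC(nn.Module):
    """Toy sketch: encoder -> vector quantizer -> autoregressive summary."""

    def __init__(self, n_mels=80, z_dim=64, c_dim=256, n_codes=512):
        super().__init__()
        # Frame-wise encoder (the real model uses a different architecture).
        self.encoder = nn.Sequential(
            nn.Linear(n_mels, z_dim), nn.ReLU(), nn.Linear(z_dim, z_dim)
        )
        # Learnable codebook for vector quantization.
        self.codebook = nn.Parameter(torch.randn(n_codes, z_dim))
        # Autoregressive model summarising the codes up to time t.
        self.rnn = nn.GRU(z_dim, c_dim, batch_first=True)

    def forward(self, mels):                      # mels: (B, T, n_mels)
        z = self.encoder(mels)                    # (B, T, z_dim)
        # Nearest-neighbour lookup into the codebook.
        dists = (z.unsqueeze(2) - self.codebook).pow(2).sum(-1)  # (B, T, n_codes)
        codes = dists.argmin(-1)                  # discrete sequence (B, T)
        q = self.codebook[codes]                  # quantized latents (B, T, z_dim)
        # Straight-through estimator so gradients reach the encoder.
        q = z + (q - z).detach()
        context, _ = self.rnn(q)                  # context vectors (B, T, c_dim)
        return z, codes, context

# Usage: z, codes, context = VQCPC()(torch.randn(2, 100, 80))
```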

## Requirements

1. Ensure you have Python 3 and PyTorch 1.4 or greater.

2. Install [NVIDIA/apex](https://github.com/NVIDIA/apex) for mixed precision training.

3. Install pip dependencies:
   ```
   pip install -r requirements.txt
   ```

4. For evaluation, install [bootphon/zerospeech2020](https://github.com/bootphon/zerospeech2020).
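
If you are unfamiliar with apex, its mixed-precision API is typically wired in as below. This is a generic sketch of the documented `amp` usage; the repo's training scripts handle this themselves, so nothing here needs to be run by hand.

```
import torch
import torch.nn as nn
from apex import amp

# A stand-in model and optimizer; apex requires a CUDA device.
model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)

# Patch model and optimizer for mixed precision ("O1" casts ops to fp16 where safe).
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 10, device="cuda")
loss = model(x).pow(2).mean()

# Scale the loss to avoid fp16 gradient underflow, then backprop and step.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```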

## Data and Preprocessing

1. Download and extract the [ZeroSpeech2020 datasets](https://download.zerospeech.com/).

2. Download the train/test splits [here](https://github.com/bshall/VectorQuantizedCPC/releases/tag/v0.1)
   and extract them in the root directory of the repo.

3. Preprocess audio and extract train/test log-Mel spectrograms:
   ```
   python preprocess.py in_dir=/path/to/dataset dataset=[2019/english or 2019/surprise]
   ```
   Note: `in_dir` must be the path to the `2019` folder.
   For `dataset`, choose between `2019/english` and `2019/surprise`.
   Other datasets will be added in the future.
   ```
   e.g. python preprocess.py in_dir=../datasets/2020/2019 dataset=2019/english
   ```
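
For reference, log-Mel extraction of this kind can be sketched with librosa as below. The parameter values (sample rate, FFT size, hop length, number of Mel bands) and the file name are illustrative assumptions; the repo's preprocessing config defines the real ones.

```
import librosa
import numpy as np

# Illustrative parameters; the repo's config files define the actual settings.
wav, sr = librosa.load("example.wav", sr=16000)
mel = librosa.feature.melspectrogram(
    y=wav, sr=sr, n_fft=2048, hop_length=160, win_length=400, n_mels=80
)
log_mel = np.log(np.maximum(mel, 1e-10))  # floor before log to avoid -inf
print(log_mel.shape)  # (n_mels, frames)
```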

## Training

1. Train the VQ-CPC model (pretrained weights will be released soon; the contrastive objective is sketched after these steps):
   ```
   python train_cpc.py checkpoint_dir=path/to/checkpoint_dir dataset=[2019/english or 2019/surprise]
   ```
   ```
   e.g. python train_cpc.py checkpoint_dir=checkpoints/cpc/2019english dataset=2019/english
   ```

2. Train the vocoder:
   ```
   python train_vocoder.py cpc_checkpoint=path/to/cpc/checkpoint checkpoint_dir=path/to/checkpoint_dir dataset=[2019/english or 2019/surprise]
   ```
   ```
   e.g. python train_vocoder.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-24000.pt checkpoint_dir=checkpoints/vocoder/english2019 dataset=2019/english
   ```
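
The contrastive objective used in step 1 works as described in the introduction: given a context vector, discriminate the encoding of the true future frame from negatives drawn from other utterances. Below is a minimal InfoNCE-style sketch of that idea with assumed shapes and a dummy usage example; it is illustrative, not the repository's exact implementation.

```
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_loss(context, future, negatives, predictor):
    """InfoNCE-style sketch with assumed shapes.

    context:   (B, c_dim)    context vector at time t
    future:    (B, z_dim)    encoding of the frame k steps ahead (positive)
    negatives: (B, N, z_dim) encodings sampled from other utterances
    predictor: maps the context to a prediction of the future encoding
    """
    pred = predictor(context)                                   # (B, z_dim)
    pos = (pred * future).sum(-1, keepdim=True)                 # (B, 1)
    neg = torch.bmm(negatives, pred.unsqueeze(-1)).squeeze(-1)  # (B, N)
    logits = torch.cat([pos, neg], dim=1)  # the true future is class 0
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

# Usage with dummy tensors:
B, N, c_dim, z_dim = 4, 10, 256, 64
loss = contrastive_loss(
    torch.randn(B, c_dim), torch.randn(B, z_dim),
    torch.randn(B, N, z_dim), nn.Linear(c_dim, z_dim),
)
```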

## Evaluation

### Voice conversion

```
python convert.py cpc_checkpoint=path/to/cpc/checkpoint vocoder_checkpoint=path/to/vocoder/checkpoint in_dir=path/to/wavs out_dir=path/to/out_dir synthesis_list=path/to/synthesis_list dataset=[2019/english or 2019/surprise]
```
Note: the `synthesis_list` is a `json` file:
```
[
    [
        "english/test/S002_0379088085",
        "V002",
        "V002_0379088085"
    ]
]
```
containing a list of items with a) the path (relative to `in_dir`) of the source `wav` file;
b) the target speaker (see `datasets/2019/english/speakers.json` for a list of options);
and c) the target file name.
```
e.g. python convert.py cpc_checkpoint=checkpoints/cpc/english2019/model.ckpt-25000.pt vocoder_checkpoint=checkpoints/vocoder/english2019/model.ckpt-150000.pt in_dir=../datasets/2020/2019 out_dir=submission/2019/english/test synthesis_list=datasets/2019/english/synthesis.json dataset=2019/english
```
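
To build a custom synthesis list, writing the `json` by hand or with a few lines of Python is enough. The output file name below is hypothetical; the item follows the format described above.

```
import json

# [source path relative to in_dir, target speaker, output file name]
items = [
    ["english/test/S002_0379088085", "V002", "V002_0379088085"],
]
with open("my_synthesis.json", "w") as f:
    json.dump(items, f, indent=4)
```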

Voice conversion samples will be available soon.

### ABX Score

1. Encode test data for evaluation:
   ```
   python encode.py checkpoint=path/to/checkpoint out_dir=path/to/out_dir dataset=[2019/english or 2019/surprise]
   ```
   ```
   e.g. python encode.py checkpoint=checkpoints/2019english/model.ckpt-500000.pt out_dir=submission/2019/english/test dataset=2019/english
   ```

2. Run the ABX evaluation script (see [bootphon/zerospeech2020](https://github.com/bootphon/zerospeech2020)).

The ABX score for the pretrained English model is:
```
{
    "2019": {
        "english": {
            "scores": {
                "abx": 13.444869807551896,
                "bitrate": 421.3347459545065
            },
            "details_bitrate": {
                "test": 421.3347459545065,
                "auxiliary_embedding1": 817.3706731019037,
                "auxiliary_embedding2": 817.6857350383482
            },
            "details_abx": {
                "test": {
                    "cosine": 13.444869807551896,
                    "KL": 50.0,
                    "levenshtein": 27.836903478166363
                },
                "auxiliary_embedding1": {
                    "cosine": 12.47147337307366,
                    "KL": 50.0,
                    "levenshtein": 43.91132599798928
                },
                "auxiliary_embedding2": {
                    "cosine": 12.29162067184495,
                    "KL": 50.0,
                    "levenshtein": 44.29540315886812
                }
            }
        }
    }
}
```
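
For context on the `bitrate` entries above: the ZeroSpeech bitrate is, to our understanding, the symbol rate multiplied by the empirical entropy of the symbol distribution. A small sketch of that computation (an illustration, not the challenge's reference implementation):

```
import math
from collections import Counter

def bitrate(symbols, duration_seconds):
    """Symbols per second times the empirical entropy (in bits) of the symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    entropy = -sum(c / total * math.log2(c / total) for c in counts.values())
    return total / duration_seconds * entropy

# e.g. codes drawn uniformly from a 512-entry codebook at 100 symbols/s
# would give 100 * log2(512) = 900 bits/s.
```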

## References

This work is based on:

1. Aaron van den Oord, Yazhe Li, and Oriol Vinyals. ["Representation learning with contrastive predictive coding."](https://arxiv.org/abs/1807.03748)
   arXiv preprint arXiv:1807.03748 (2018).

2. Aaron van den Oord, Oriol Vinyals, and Koray Kavukcuoglu. ["Neural discrete representation learning."](https://arxiv.org/abs/1711.00937)
   Advances in Neural Information Processing Systems. 2017.