
Commit 9e73fd8 (1 parent fdeae4e)
Add further details for using the project

1 file changed: +85 −19 lines

README.md

@@ -36,14 +36,73 @@ pip install -r requirements/requirements.txt
Feature options:
You can choose between the features `mfcc`, `gfcc`, `spectral`, and `chroma`, or a comma-separated combination of those, for example `gfcc,mfcc,spectral,chroma`, to extract from your audio files.

Classifier options:
You can choose between `svm`, `svm_rbf`, `randomforest`, `logisticregression`, `knn`, `gradientboosting`, and `extratrees`.
Hyperparameter tuning using grid search is included in the code for each classifier.
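The grid-search tuning mentioned above can be illustrated with a minimal, dependency-free sketch: try every combination of candidate hyperparameters and keep the best-scoring one. The `toy_score` function below is a made-up stand-in for validation accuracy, not pyAudioProcessing's actual tuning code.

```python
from itertools import product

# Stand-in for a cross-validated score; pretend accuracy peaks at
# C=1.0 and gamma=0.1 (made-up numbers for illustration only).
def toy_score(c, gamma):
    return 1.0 - abs(c - 1.0) * 0.1 - abs(gamma - 0.1)

# Candidate hyperparameter values, as a grid-search would define them.
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}

# Exhaustively evaluate every combination and keep the best.
best_params, best_score = None, float("-inf")
for c, gamma in product(grid["C"], grid["gamma"]):
    score = toy_score(c, gamma)
    if score > best_score:
        best_params, best_score = {"C": c, "gamma": gamma}, score

print(best_params)  # → {'C': 1.0, 'gamma': 0.1}
```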

### Training and Testing Data structuring

Let's say you have 2 classes that you have training data for (music and speech), and you want to use pyAudioProcessing to train a model using the available feature options. Save each class as a directory, and place all the training audio .wav files under the respective class directories. Example:

```bash
.
├── training_data
    ├── music
    │   ├── music_sample1.wav
    │   ├── music_sample2.wav
    │   ├── music_sample3.wav
    │   ├── music_sample4.wav
    ├── speech
    │   ├── speech_sample1.wav
    │   ├── speech_sample2.wav
    │   ├── speech_sample3.wav
    │   ├── speech_sample4.wav
```

Similarly, structure any test data (with known labels) that you want to pass through the classifier as:

```bash
.
├── testing_data
    ├── music
    │   ├── music_sample5.wav
    │   ├── music_sample6.wav
    ├── speech
    │   ├── speech_sample5.wav
    │   ├── speech_sample6.wav
```

If you want to classify audio samples without any known labels, structure the data as:

```bash
.
├── data
    ├── unknown
    │   ├── sample1.wav
    │   ├── sample2.wav
```
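
The layouts above can also be generated programmatically. `make_layout` below is a hypothetical helper (not part of pyAudioProcessing) that creates placeholder files in the described structure:

```python
from pathlib import Path

def make_layout(root, classes, samples_per_class):
    """Create <root>/<class>/<class>_sampleN.wav placeholder files."""
    for cls in classes:
        class_dir = Path(root) / cls
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(1, samples_per_class + 1):
            # Empty placeholders; in practice these are your real .wav files.
            (class_dir / f"{cls}_sample{i}.wav").touch()

make_layout("training_data", ["music", "speech"], samples_per_class=4)
make_layout("testing_data", ["music", "speech"], samples_per_class=2)
```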

### Examples

Code example of using the `gfcc,spectral,chroma` features and the `svm` classifier. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples).

```
from pyAudioProcessing.run_classification import train_and_classify

# Training
train_and_classify("data_samples/training", "train", ["gfcc", "spectral", "chroma"], "svm", "svm_clf")
```

The above logs the files analyzed, hyperparameter tuning results for recall, precision, and F1 score, along with the final confusion matrix.

To classify audio samples with the classifier you created above,

```
# Classify data
train_and_classify("data_samples/testing", "classify", ["gfcc", "spectral", "chroma"], "svm", "svm_clf")
```

The above logs the filename where the classification results are saved, along with details about the testing files and the classifier used.

If you cloned the project via git, the following command line example of training and classification with `gfcc,spectral,chroma` features and the `svm` classifier can be used as well. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples).

Training:
```
...
```
@@ -57,40 +116,47 @@ python pyAudioProcessing/run_classification.py -f "data_samples/testing" -clf "s
Classification results get saved in `classifier_results.json`.
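
The saved results file can be read back with the standard `json` module. The payload written below is a made-up example, since the actual schema of `classifier_results.json` is not shown here:

```python
import json

# Made-up example payload (assumption: the file is plain JSON mapping
# test file paths to predicted labels).
example = {"data_samples/testing/music/music_sample5.wav": "music"}
with open("classifier_results.json", "w") as f:
    json.dump(example, f)

# Read the results back and inspect them.
with open("classifier_results.json") as f:
    results = json.load(f)

for path, label in results.items():
    print(path, "->", label)
```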

## Extracting features from audio files

This feature lets the user extract aggregated data features calculated per audio file.

### Choices

Feature options:
You can choose between the features `mfcc`, `gfcc`, `spectral`, and `chroma`, or any combination of those, to extract from your audio files.

### Examples

A code example for performing `gfcc` and `mfcc` feature extraction can be found below. To use your own audio data for feature extraction, pass its path to `get_features` in place of `data_samples/testing`. Please refer to the format of the directory `data_samples/testing`.

```
from pyAudioProcessing.extract_features import get_features

# Feature extraction
features = get_features("data_samples/testing", ["gfcc", "mfcc"])

# features is a dictionary that will hold data of the following format
"""
{
  subdir1_name: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
  subdir2_name: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
  ...
}
"""
```

To save the features in a json file,

```
from pyAudioProcessing import utils

utils.write_to_json("audio_features.json", features)
```

If you cloned the project via git, the following command line example for `gfcc` and `mfcc` feature extraction can be used as well. The features argument should be a comma-separated string, for example `gfcc,mfcc`. To use your own audio files for feature extraction, pass in the directory containing the .wav files as the `-f` argument. Please refer to the format of the directory `data_samples/testing`.

```
python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "gfcc,mfcc"
```

Features extracted get saved in `audio_features.json`.
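
To feed the extracted features into your own model, the nested dictionary can be flattened into a feature matrix and a label list. The dictionary below mirrors the documented format with made-up numbers (an assumption for illustration):

```python
# Nested dictionary in the documented format, with made-up values.
features = {
    "music": {"music/s1.wav": {"features": [0.1, 0.2], "feature_names": ["f1", "f2"]}},
    "speech": {"speech/s1.wav": {"features": [0.3, 0.4], "feature_names": ["f1", "f2"]}},
}

# Flatten into a feature matrix X and label list y.
X, y = [], []
for class_name, files in features.items():
    for file_path, data in files.items():
        X.append(data["features"])  # one feature vector per audio file
        y.append(class_name)        # subdirectory name serves as the label

print(X)  # → [[0.1, 0.2], [0.3, 0.4]]
print(y)  # → ['music', 'speech']
```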
## Author
