
Commit b18fb40

Merge pull request #12 from jsingh811/gfcc-optimization
Gfcc optimization
2 parents ca3bd95 + 3e42a99 commit b18fb40

File tree

7 files changed: +159 -67 lines changed


README.md

Lines changed: 101 additions & 33 deletions
@@ -8,83 +8,151 @@ This was written using `Python 3.7.6`, and should work with python 3.6+.
 
 ## Getting Started
 
-Use pip
+1. One way to install pyAudioProcessing and its dependencies is from PyPI using pip
 ```
 pip install pyAudioProcessing
 ```
-Or, you could also clone the project and get it set up
+To upgrade to the latest version of pyAudioProcessing, the following pip command can be used.
+```
+pip install -U pyAudioProcessing
+```
+
+2. Or, you could also clone the project and get it set up
 
 ```
 git clone git@github.com:jsingh811/pyAudioProcessing.git
+cd pyAudioProcessing
 pip install -e .
 ```
 and then, get the requirements by running
 
 ```
 pip install -r requirements/requirements.txt
-```
+```
 
-## Training and Classifying Audio files
+## Choices
 
-### Choices
+### Feature options
+
+You can choose between features `mfcc`, `gfcc`, `spectral`, `chroma` or any combination of those, example `gfcc,mfcc,spectral,chroma`, to extract from your audio files for classification or just saving extracted features for other uses.
+
+### Classifier options
 
-Feature options :
-You can choose between `mfcc`, `gfcc` or `gfcc,mfcc` features to extract from your audio files.
-Classifier options :
 You can choose between `svm`, `svm_rbf`, `randomforest`, `logisticregression`, `knn`, `gradientboosting` and `extratrees`.
 Hyperparameter tuning is included in the code for each using grid search.
 
 
-### Examples
+## Training and Testing Data structuring
 
-Command line example of using `gfcc` feature and `svm` classifier.
+Let's say you have 2 classes that you have training data for (music and speech), and you want to use pyAudioProcessing to train a model using available feature options. Save each class as a directory and all the training audio .wav files under the respective class directories. Example:
 
-Training:
-```
-python pyAudioProcessing/run_classification.py -f "data_samples/training" -clf "svm" -clfname "svm_clf" -t "train" -feats "gfcc"
+```bash
+.
+├── training_data
+    ├── music
+    │   ├── music_sample1.wav
+    │   ├── music_sample2.wav
+    │   ├── music_sample3.wav
+    │   ├── music_sample4.wav
+    ├── speech
+    │   ├── speech_sample1.wav
+    │   ├── speech_sample2.wav
+    │   ├── speech_sample3.wav
+    │   ├── speech_sample4.wav
```
-Classifying:
 
-```
-python pyAudioProcessing/run_classification.py -f "data_samples/testing" -clf "svm" -clfname "svm_clf" -t "classify" -feats "gfcc"
+Similarly, structure any test data (with known labels) you want to pass through the classifier as
+
+```bash
+.
+├── testing_data
+    ├── music
+    │   ├── music_sample5.wav
+    │   ├── music_sample6.wav
+    ├── speech
+    │   ├── speech_sample5.wav
+    │   ├── speech_sample6.wav
 ```
-Classification results get saved in `classifier_results.json`.
+If you want to classify audio samples without any known labels, structure the data as
+
+```bash
+.
+├── data
+    ├── unknown
+    │   ├── sample1.wav
+    │   ├── sample2.wav
+```
+
+## Training and Classifying Audio files
+
+Audio data can be trained, tested and classified using pyAudioProcessing. Please see [feature options](https://github.com/jsingh811/pyAudioProcessing#feature-options) and [classifier model options](https://github.com/jsingh811/pyAudioProcessing#classifier-options) for more information.
 
+### Examples
 
-Code example of using `gfcc` feature and `svm` classifier.
+Code example of using `gfcc,spectral,chroma` features and `svm` classifier. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples). Please refer to the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) to use your own data instead.
 ```
 from pyAudioProcessing.run_classification import train_and_classify
 # Training
-train_and_classify("data_samples/training", "train", ["gfcc"], "svm", "svm_clf")
+train_and_classify("data_samples/training", "train", ["gfcc", "spectral", "chroma"], "svm", "svm_clf")
+```
+The above logs the files analyzed and the hyperparameter tuning results for recall, precision and F1 score, along with the final confusion matrix.
+
+To classify audio samples with the classifier you created above,
+```
 # Classify data
-train_and_classify("data_samples/testing", "classify", ["gfcc"], "svm", "svm_clf")
+train_and_classify("data_samples/testing", "classify", ["gfcc", "spectral", "chroma"], "svm", "svm_clf")
+```
+The above logs the filename where the classification results are saved, along with details about the testing files and the classifier used.
+
+
+If you cloned the project via git, the following command line example of training and classification with `gfcc,spectral,chroma` features and `svm` classifier can be used as well. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples). Please refer to the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) to use your own data instead.
+
+Training:
 ```
+python pyAudioProcessing/run_classification.py -f "data_samples/training" -clf "svm" -clfname "svm_clf" -t "train" -feats "gfcc,spectral,chroma"
+```
+Classifying:
 
-## Extracting features from audios
+```
+python pyAudioProcessing/run_classification.py -f "data_samples/testing" -clf "svm" -clfname "svm_clf" -t "classify" -feats "gfcc,spectral,chroma"
+```
+Classification results get saved in `classifier_results.json`.
 
-This feature lets the user extract data features calculated on audio files.
 
-### Choices
+## Extracting features from audios
 
-Feature options :
-You can choose between `mfcc`, `gfcc` or `gfcc,mfcc` features to extract from your audio files.
-To use your own audio files for feature extraction, refer to the format of directory `data_samples/testing`.
+This feature lets the user extract aggregated data features calculated per audio file. See [feature options](https://github.com/jsingh811/pyAudioProcessing#feature-options) for more information on the choices of features available.
 
 ### Examples
 
-Command line example for `gfcc` and `mfcc` feature extractions.
+Code example for performing `gfcc` and `mfcc` feature extraction can be found below. To use your own audio data for feature extraction, pass the path to `get_features` in place of `data_samples/testing`. Please refer to the format of directory `data_samples/testing` or the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring).
 
-```
-python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "gfcc,mfcc"
-```
-Features extracted get saved in `audio_features.json`.
-
-Code example of performing `gfcc` and `mfcc` feature extraction.
 ```
 from pyAudioProcessing.extract_features import get_features
 # Feature extraction
 features = get_features("data_samples/testing", ["gfcc", "mfcc"])
+# features is a dictionary that will hold data of the following format
+"""
+{
+  subdir1_name: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
+  subdir2_name: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
+  ...
+}
+"""
 ```
+To save features in a json file,
+```
+from pyAudioProcessing import utils
+utils.write_to_json("audio_features.json", features)
+```
+
+If you cloned the project via git, the following command line example for `gfcc` and `mfcc` feature extractions can be used as well. The features argument should be a comma-separated string, example `gfcc,mfcc`.
+To use your own audio files for feature extraction, pass in the directory path containing .wav files as the `-f` argument. Please refer to the format of directory `data_samples/testing` or the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring).
+
+```
+python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "gfcc,mfcc"
+```
+Features extracted get saved in `audio_features.json`.
 
 
 ## Author
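Editor's note: the `get_features` output format documented in this README hunk nests per-file results under each subdirectory name. A minimal sketch of walking that structure, assuming the format shown above:

```python
from pyAudioProcessing.extract_features import get_features

# Sketch: iterate the nested {subdir: {file: {"features", "feature_names"}}}
# dictionary documented in the README hunk above.
features = get_features("data_samples/testing", ["gfcc", "mfcc"])
for class_name, files in features.items():
    for file_path, data in files.items():
        vector = data["features"]        # one aggregated vector per file
        names = data["feature_names"]    # labels parallel to the vector
        print(class_name, file_path, len(vector), len(names))
```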

pyAudioProcessing/extract_features.py

Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,7 @@
 )
 PARSER.add_argument(
     "-feats", "--feature-names", type=lambda s: [item for item in s.split(",")],
-    default=["mfcc", "gfcc"],
+    default=["mfcc", "gfcc", "chroma", "spectral"],
     help="Features to compute.",
 )
 
@@ -55,14 +55,15 @@ def get_features(folder_path, feature_names):
         False,
         feature_names
     )
+
     class_file_feats = {}
     for inx in range(len(class_names)):
         files = file_names[inx]
         class_file_feats[class_names[inx]] = {}
         for sub_inx in range(len(files)):
             class_file_feats[class_names[inx]][files[sub_inx]] = {
                 "features": list(features[inx][sub_inx]),
-                "feature_names": feat_names[sub_inx]
+                "feature_names": feat_names[inx]
             }
 
     return class_file_feats
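Editor's note: the one-character change in `get_features` (`feat_names[sub_inx]` to `feat_names[inx]`) fixes an indexing bug: the feature-name lists appear to be parallel to the class directories, not to the files within a class, so indexing by file position could mislabel or overrun the list. A minimal sketch of the corrected pairing, using hypothetical data shaped like the function's inputs:

```python
# Hypothetical data shaped like the arrays get_features() iterates over:
# one entry per class directory, one shared feature-name list per class.
class_names = ["music", "speech"]
file_names = [["music/a.wav", "music/b.wav"], ["speech/c.wav"]]
features = [[[0.10, 0.20], [0.30, 0.40]], [[0.50, 0.60]]]
feat_names = [["gfcc_1", "gfcc_2"], ["gfcc_1", "gfcc_2"]]

class_file_feats = {}
for inx in range(len(class_names)):
    class_file_feats[class_names[inx]] = {}
    for sub_inx in range(len(file_names[inx])):
        class_file_feats[class_names[inx]][file_names[inx][sub_inx]] = {
            "features": list(features[inx][sub_inx]),
            # Index by class (inx), not by file (sub_inx) -- the fix above.
            "feature_names": feat_names[inx],
        }
```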

pyAudioProcessing/features/audioFeatureExtraction.py

Lines changed: 36 additions & 28 deletions
@@ -57,30 +57,36 @@ def stFeatureExtraction(signal, fs, win, step, feats):
     nFFT = int(win / 2)
 
     [fbank, freqs] = mfccInitFilterBanks(fs, nFFT)  # compute the triangular filter banks used in the mfcc calculation
-    nChroma, nFreqsPerChroma = stChromaFeaturesInit(nFFT, fs)
 
-    n_time_spectral_feats = 8
     n_harmonic_feats = 0
-    n_chroma_feats = 13
-    n_total_feats = n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats + n_chroma_feats + ngfcc
-    # n_total_feats = n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats
+
     feature_names = []
-    feature_names.append("zcr")
-    feature_names.append("energy")
-    feature_names.append("energy_entropy")
-    feature_names += ["spectral_centroid", "spectral_spread"]
-    feature_names.append("spectral_entropy")
-    feature_names.append("spectral_flux")
-    feature_names.append("spectral_rolloff")
+    if "spectral" in feats:
+        n_time_spectral_feats = 8
+        feature_names.append("zcr")
+        feature_names.append("energy")
+        feature_names.append("energy_entropy")
+        feature_names += ["spectral_centroid", "spectral_spread"]
+        feature_names.append("spectral_entropy")
+        feature_names.append("spectral_flux")
+        feature_names.append("spectral_rolloff")
+    else:
+        n_time_spectral_feats = 0
     if "mfcc" in feats:
         feature_names += ["mfcc_{0:d}".format(mfcc_i)
                           for mfcc_i in range(1, n_mfcc_feats+1)]
     if "gfcc" in feats:
         feature_names += ["gfcc_{0:d}".format(gfcc_i)
                           for gfcc_i in range(1, ngfcc+1)]
-    feature_names += ["chroma_{0:d}".format(chroma_i)
-                      for chroma_i in range(1, n_chroma_feats)]
-    feature_names.append("chroma_std")
+    if "chroma" in feats:
+        nChroma, nFreqsPerChroma = stChromaFeaturesInit(nFFT, fs)
+        n_chroma_feats = 13
+        feature_names += ["chroma_{0:d}".format(chroma_i)
+                          for chroma_i in range(1, n_chroma_feats)]
+        feature_names.append("chroma_std")
+    else:
+        n_chroma_feats = 0
+    n_total_feats = n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats + n_chroma_feats + ngfcc
     st_features = []
     while (cur_p + win - 1 < N):  # for each short-term window until the end of signal
         count_fr += 1
@@ -92,24 +98,26 @@ def stFeatureExtraction(signal, fs, win, step, feats):
         if count_fr == 1:
             X_prev = X.copy()  # keep previous fft mag (used in spectral flux)
         curFV = numpy.zeros((n_total_feats, 1))
-        curFV[0] = stZCR(x)  # zero crossing rate
-        curFV[1] = stEnergy(x)  # short-term energy
-        curFV[2] = stEnergyEntropy(x)  # short-term entropy of energy
-        [curFV[3], curFV[4]] = stSpectralCentroidAndSpread(X, fs)  # spectral centroid and spread
-        curFV[5] = stSpectralEntropy(X)  # spectral entropy
-        curFV[6] = stSpectralFlux(X, X_prev)  # spectral flux
-        curFV[7] = stSpectralRollOff(X, 0.90, fs)  # spectral rolloff
+        if "spectral" in feats:
+            curFV[0] = stZCR(x)  # zero crossing rate
+            curFV[1] = stEnergy(x)  # short-term energy
+            curFV[2] = stEnergyEntropy(x)  # short-term entropy of energy
+            [curFV[3], curFV[4]] = stSpectralCentroidAndSpread(X, fs)  # spectral centroid and spread
+            curFV[5] = stSpectralEntropy(X)  # spectral entropy
+            curFV[6] = stSpectralFlux(X, X_prev)  # spectral flux
+            curFV[7] = stSpectralRollOff(X, 0.90, fs)  # spectral rolloff
         if "mfcc" in feats:
             curFV[n_time_spectral_feats:n_time_spectral_feats+n_mfcc_feats, 0] = \
                 stMFCC(X, fbank, n_mfcc_feats).copy()  # MFCCs
         if "gfcc" in feats:
             curFV[n_time_spectral_feats+n_mfcc_feats:n_time_spectral_feats+n_mfcc_feats+ngfcc, 0] = gfcc.get_gfcc(x)
-        chromaNames, chromaF = stChromaFeatures(X, fs, nChroma, nFreqsPerChroma)
-        curFV[n_time_spectral_feats + n_mfcc_feats + ngfcc:
-              n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
-            chromaF
-        curFV[n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
-            chromaF.std()
+        if "chroma" in feats:
+            chromaNames, chromaF = stChromaFeatures(X, fs, nChroma, nFreqsPerChroma)
+            curFV[n_time_spectral_feats + n_mfcc_feats + ngfcc:
+                  n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
+                chromaF
+            curFV[n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
+                chromaF.std()
         st_features.append(curFV)
         X_prev = X.copy()
 
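Editor's note: the refactored `stFeatureExtraction` keeps a fixed slot order inside `curFV` (spectral, then MFCC, then GFCC, then chroma) and zeroes the spectral and chroma widths when those groups are not requested, while `n_mfcc_feats` and `ngfcc` still count toward `n_total_feats` unconditionally. A minimal sketch of that width arithmetic; the 13 and 22 defaults are illustrative assumptions, not values taken from this diff:

```python
# Sketch of the n_total_feats arithmetic in the hunk above. Only the
# spectral and chroma widths collapse to zero when unrequested; the
# mfcc/gfcc widths (defined earlier in the real file) always count.
def total_feats(feats, n_mfcc_feats=13, ngfcc=22, n_harmonic_feats=0):
    n_time_spectral_feats = 8 if "spectral" in feats else 0
    n_chroma_feats = 13 if "chroma" in feats else 0
    return (n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats
            + n_chroma_feats + ngfcc)

print(total_feats(["spectral", "mfcc", "gfcc", "chroma"]))  # 56 with these assumptions
print(total_feats(["mfcc", "gfcc"]))                        # 35 with these assumptions
```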

pyAudioProcessing/features/getGfcc.py

Lines changed: 14 additions & 1 deletion
@@ -41,17 +41,30 @@ def erb_filter(self):
         """
         return filters.make_erb_filters(self.fs, filters.centre_freqs(self.fs, 64, 50))
 
-    def get_gfcc(self, signal, ccST=1, ccEND=23):
+    def mean_var_norm(self, x, std=True):
+        """
+        Returns mean variance normalization.
+        """
+        norm = x - numpy.mean(x, axis=0)
+        if std is True:
+            norm = norm / numpy.std(norm)
+        return norm
+
+    def get_gfcc(self, signal, ccST=1, ccEND=23, norm=False):
         """
         Get GFCC feature.
         """
         erb_filterbank = filters.erb_filterbank(numpy.array(signal), self.erb_filter)
         inData = erb_filterbank[10:, :]
+        inData = numpy.absolute(inData)
+        inData = numpy.power(inData, 1/3)
         [chnNum, frmNum] = numpy.array(inData).shape
         mtx = self.dct_matrix(chnNum)
         outData = numpy.matmul(mtx, inData)
         outData = outData[ccST:ccEND, :]
         gfcc_feat = numpy.array(
             [numpy.mean(data_list) for data_list in outData]
         ).copy()
+        if norm is True:
+            gfcc_feat = self.mean_var_norm(gfcc_feat)
         return gfcc_feat
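Editor's note: two numerical changes in `get_gfcc` are easy to miss in the diff: the ERB filterbank output is rectified and cube-root compressed before the DCT (a common loudness-compression step in GFCC pipelines), and the final coefficients can optionally be mean-variance normalized. A standalone sketch of those two steps on synthetic data:

```python
import numpy

# Synthetic stand-in for the ERB filterbank output (channels x frames).
inData = numpy.random.randn(54, 100)

# Rectify and cube-root compress, as in the get_gfcc change above.
inData = numpy.power(numpy.absolute(inData), 1 / 3)

# Mean-variance normalization, mirroring mean_var_norm(std=True):
# subtract the mean, then divide by the standard deviation.
feat = inData.mean(axis=1)
norm = feat - numpy.mean(feat, axis=0)
norm = norm / numpy.std(norm)
print(round(norm.mean(), 6), round(norm.std(), 6))  # ~0.0 and ~1.0
```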

pyAudioProcessing/run_classification.py

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,7 @@
 )
 PARSER.add_argument(
     "-feats", "--feature-names", type=lambda s: [item for item in s.split(",")],
-    default=["mfcc", "gfcc"],
+    default=["mfcc", "gfcc", "chroma", "spectral"],
     help="Features to compute.",
 )
 PARSER.add_argument(
@@ -96,6 +96,8 @@ def classify_data(data_dirs, feature_names, classifier, classifier_name):
         indx = list(res[1]).index(max(res[1]))
         if res[2][indx] == fol.split("/")[-1]:
             correctly_classified += 1
+    if correctly_classified == 0:
+        print("Either you passed in data with unknown classes, or")
     print(
         "{} out of {} instances were classified correctly".format(
             correctly_classified, num_files
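Editor's note: the new zero-correct warning fires because accuracy here is computed by comparing each prediction against the parent directory name of the classified files; if the test folders are not named after trained classes, nothing can match. A minimal sketch of that matching rule, with hypothetical values:

```python
# The accuracy counter in classify_data compares the predicted label
# against the last path component of the folder being classified.
fol = "data_samples/testing/music"   # hypothetical test folder
predicted_label = "music"            # hypothetical classifier output
if predicted_label == fol.split("/")[-1]:
    print("counted as correctly classified")
```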

pyAudioProcessing/trainer/audioTrainTest.py

Lines changed: 1 addition & 1 deletion
@@ -235,7 +235,7 @@ def extract_features(
 
 def featureAndTrain(list_of_dirs, mt_win, mt_step, st_win, st_step,
                     classifier_type, model_name,
-                    compute_beat=False, perTrain=0.90, feats=["gfcc", "mfcc"]):
+                    compute_beat=False, perTrain=0.90, feats=["gfcc", "mfcc", "spectral", "chroma"]):
     '''
     This function is used as a wrapper to segment-based audio feature extraction and classifier training.
     ARGUMENTS:
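Editor's note: with the new default, `featureAndTrain` extracts all four feature groups unless a narrower `feats` list is passed. A minimal sketch of an explicit call; the window sizes are illustrative assumptions, as only the signature and the `feats` default appear in this diff:

```python
from pyAudioProcessing.trainer.audioTrainTest import featureAndTrain

# Illustrative call: the 1.0 s mid-term and 0.05 s short-term windows
# are assumed values, not taken from this commit.
featureAndTrain(
    ["data_samples/training/music", "data_samples/training/speech"],
    1.0, 1.0, 0.05, 0.05,   # mt_win, mt_step, st_win, st_step
    "svm", "svm_clf",
    feats=["gfcc", "mfcc", "spectral", "chroma"],
)
```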

setup.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ def get_requirements(path=REQUIREMENTS_PATH):
 
 setuptools.setup(
     name='pyAudioProcessing',
-    version='1.1.5',
+    version='1.1.6',
     description='Audio processing-feature extraction and building machine learning models from audio data.',
     long_description=long_description,
     long_description_content_type="text/markdown",
