
Commit b18fb40

Merge pull request #12 from jsingh811/gfcc-optimization
Gfcc optimization
2 parents ca3bd95 + 3e42a99 commit b18fb40

File tree

7 files changed: +159 -67 lines changed


README.md

Lines changed: 101 additions & 33 deletions
@@ -8,83 +8,151 @@ This was written using `Python 3.7.6`, and should work with python 3.6+.
 
 ## Getting Started
 
-Use pip
+1. One way to install pyAudioProcessing and its dependencies is from PyPI using pip
 ```
 pip install pyAudioProcessing
 ```
-Or, you could also clone the project and get it set up
+To upgrade to the latest version of pyAudioProcessing, the following pip command can be used.
+```
+pip install -U pyAudioProcessing
+```
+
+2. Or, you could also clone the project and get it set up
 
 ```
 git clone git@github.com:jsingh811/pyAudioProcessing.git
+cd pyAudioProcessing
 pip install -e .
 ```
 and then, get the requirements by running
 
 ```
 pip install -r requirements/requirements.txt
-```
+```
 
-## Training and Classifying Audio files
+## Choices
 
-### Choices
+### Feature options
+
+You can choose between features `mfcc`, `gfcc`, `spectral`, `chroma` or any combination of those, example `gfcc,mfcc,spectral,chroma`, to extract from your audio files for classification or just saving extracted features for other uses.
+
+### Classifier options
 
-Feature options :
-You can choose between `mfcc`, `gfcc` or `gfcc,mfcc` features to extract from your audio files.
-Classifier options :
 You can choose between `svm`, `svm_rbf`, `randomforest`, `logisticregression`, `knn`, `gradientboosting` and `extratrees`.
 Hyperparameter tuning is included in the code for each using grid search.
 
 
-### Examples
+## Training and Testing Data structuring
 
-Command line example of using `gfcc` feature and `svm` classifier.
+Let's say you have 2 classes that you have training data for (music and speech), and you want to use pyAudioProcessing to train a model using available feature options. Save each class as a directory and all the training audio .wav files under the respective class directories. Example:
 
-Training:
-```
-python pyAudioProcessing/run_classification.py -f "data_samples/training" -clf "svm" -clfname "svm_clf" -t "train" -feats "gfcc"
+```bash
+.
+├── training_data
+    ├── music
+    │   ├── music_sample1.wav
+    │   ├── music_sample2.wav
+    │   ├── music_sample3.wav
+    │   ├── music_sample4.wav
+    ├── speech
+    │   ├── speech_sample1.wav
+    │   ├── speech_sample2.wav
+    │   ├── speech_sample3.wav
+    │   ├── speech_sample4.wav
```
-Classifying:
 
-```
-python pyAudioProcessing/run_classification.py -f "data_samples/testing" -clf "svm" -clfname "svm_clf" -t "classify" -feats "gfcc"
+Similarly, structure any test data (with known labels) you want to pass through the classifier as
+
+```bash
+.
+├── testing_data
+    ├── music
+    │   ├── music_sample5.wav
+    │   ├── music_sample6.wav
+    ├── speech
+    │   ├── speech_sample5.wav
+    │   ├── speech_sample6.wav
 ```
-Classification results get saved in `classifier_results.json`.
+If you want to classify audio samples without any known labels, structure the data as
+
+```bash
+.
+├── data
+    ├── unknown
+    │   ├── sample1.wav
+    │   ├── sample2.wav
+```
+
+## Training and Classifying Audio files
+
+Audio data can be trained, tested and classified using pyAudioProcessing. Please see [feature options](https://github.com/jsingh811/pyAudioProcessing#feature-options) and [classifier model options](https://github.com/jsingh811/pyAudioProcessing#classifier-options) for more information.
 
+### Examples
 
-Code example of using `gfcc` feature and `svm` classifier.
+Code example of using `gfcc,spectral,chroma` features and `svm` classifier. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples). Please refer to the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) to use your own data instead.
 ```
 from pyAudioProcessing.run_classification import train_and_classify
 # Training
-train_and_classify("data_samples/training", "train", ["gfcc"], "svm", "svm_clf")
+train_and_classify("data_samples/training", "train", ["gfcc", "spectral", "chroma"], "svm", "svm_clf")
+```
+The above logs the files analyzed and the hyperparameter tuning results for recall, precision and F1 score, along with the final confusion matrix.
+
+To classify audio samples with the classifier you created above,
+```
 # Classify data
-train_and_classify("data_samples/testing", "classify", ["gfcc"], "svm", "svm_clf")
+train_and_classify("data_samples/testing", "classify", ["gfcc", "spectral", "chroma"], "svm", "svm_clf")
+```
+The above logs the filename where the classification results are saved, along with details about the testing files and the classifier used.
+
+
+If you cloned the project via git, the following command line example of training and classification with `gfcc,spectral,chroma` features and `svm` classifier can be used as well. Sample data can be found [here](https://github.com/jsingh811/pyAudioProcessing/tree/master/data_samples). Please refer to the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring) to use your own data instead.
+
+Training:
 ```
+python pyAudioProcessing/run_classification.py -f "data_samples/training" -clf "svm" -clfname "svm_clf" -t "train" -feats "gfcc,spectral,chroma"
+```
+Classifying:
 
-## Extracting features from audios
+```
+python pyAudioProcessing/run_classification.py -f "data_samples/testing" -clf "svm" -clfname "svm_clf" -t "classify" -feats "gfcc,spectral,chroma"
+```
+Classification results get saved in `classifier_results.json`.
 
-This feature lets the user extract data features calculated on audio files.
 
-### Choices
+## Extracting features from audios
 
-Feature options :
-You can choose between `mfcc`, `gfcc` or `gfcc,mfcc` features to extract from your audio files.
-To use your own audio files for feature extraction, refer to the format of directory `data_samples/testing`.
+This feature lets the user extract aggregated data features calculated per audio file. See [feature options](https://github.com/jsingh811/pyAudioProcessing#feature-options) for more information on the choices of features available.
 
 ### Examples
 
-Command line example for `gfcc` and `mfcc` feature extractions.
+Code example for performing `gfcc` and `mfcc` feature extraction can be found below. To use your own audio data for feature extraction, pass the path to `get_features` in place of `data_samples/testing`. Please refer to the format of directory `data_samples/testing` or the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring).
 
-```
-python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "gfcc,mfcc"
-```
-Features extracted get saved in `audio_features.json`.
-
-Code example of performing `gfcc` and `mfcc` feature extraction.
 ```
 from pyAudioProcessing.extract_features import get_features
 # Feature extraction
 features = get_features("data_samples/testing", ["gfcc", "mfcc"])
+# features is a dictionary that will hold data of the following format
+"""
+{
+  subdir1_name: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
+  subdir2_name: {file1_path: {"features": <list>, "feature_names": <list>}, ...},
+  ...
+}
+"""
 ```
+To save features in a json file,
+```
+from pyAudioProcessing import utils
+utils.write_to_json("audio_features.json", features)
+```
+
+If you cloned the project via git, the following command line example for `gfcc` and `mfcc` feature extractions can be used as well. The features argument should be a comma-separated string, example `gfcc,mfcc`.
+To use your own audio files for feature extraction, pass in the directory path containing .wav files as the `-f` argument. Please refer to the format of directory `data_samples/testing` or the section on [Training and Testing Data structuring](https://github.com/jsingh811/pyAudioProcessing#training-and-testing-data-structuring).
+
+```
+python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "gfcc,mfcc"
+```
+Features extracted get saved in `audio_features.json`.
 
 
 ## Author
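Editor's note: the `get_features` output format documented in this README hunk nests per-file results under each subdirectory name. A minimal sketch of walking that structure, assuming the format shown above:

```python
from pyAudioProcessing.extract_features import get_features

# Sketch: iterate the nested {subdir: {file: {"features", "feature_names"}}}
# dictionary documented in the README hunk above.
features = get_features("data_samples/testing", ["gfcc", "mfcc"])
for class_name, files in features.items():
    for file_path, data in files.items():
        vector = data["features"]        # one aggregated vector per file
        names = data["feature_names"]    # labels parallel to the vector
        print(class_name, file_path, len(vector), len(names))
```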

pyAudioProcessing/extract_features.py

Lines changed: 3 additions & 2 deletions
@@ -23,7 +23,7 @@
 )
 PARSER.add_argument(
     "-feats", "--feature-names", type=lambda s: [item for item in s.split(",")],
-    default=["mfcc", "gfcc"],
+    default=["mfcc", "gfcc", "chroma", "spectral"],
     help="Features to compute.",
 )
 
@@ -55,14 +55,15 @@ def get_features(folder_path, feature_names):
         False,
         feature_names
     )
+
     class_file_feats = {}
     for inx in range(len(class_names)):
         files = file_names[inx]
         class_file_feats[class_names[inx]] = {}
         for sub_inx in range(len(files)):
             class_file_feats[class_names[inx]][files[sub_inx]] = {
                 "features": list(features[inx][sub_inx]),
-                "feature_names": feat_names[sub_inx]
+                "feature_names": feat_names[inx]
             }
 
     return class_file_feats
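Editor's note: the one-character change in `get_features` (`feat_names[sub_inx]` to `feat_names[inx]`) fixes an indexing bug: the feature-name lists appear to be parallel to the class directories, not to the files within a class, so indexing by file position could mislabel or overrun the list. A minimal sketch of the corrected pairing, using hypothetical data shaped like the function's inputs:

```python
# Hypothetical data shaped like the arrays get_features() iterates over:
# one entry per class directory, one shared feature-name list per class.
class_names = ["music", "speech"]
file_names = [["music/a.wav", "music/b.wav"], ["speech/c.wav"]]
features = [[[0.10, 0.20], [0.30, 0.40]], [[0.50, 0.60]]]
feat_names = [["gfcc_1", "gfcc_2"], ["gfcc_1", "gfcc_2"]]

class_file_feats = {}
for inx in range(len(class_names)):
    class_file_feats[class_names[inx]] = {}
    for sub_inx in range(len(file_names[inx])):
        class_file_feats[class_names[inx]][file_names[inx][sub_inx]] = {
            "features": list(features[inx][sub_inx]),
            # Index by class (inx), not by file (sub_inx) -- the fix above.
            "feature_names": feat_names[inx],
        }
```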

pyAudioProcessing/features/audioFeatureExtraction.py

Lines changed: 36 additions & 28 deletions
@@ -57,30 +57,36 @@ def stFeatureExtraction(signal, fs, win, step, feats):
     nFFT = int(win / 2)
 
     [fbank, freqs] = mfccInitFilterBanks(fs, nFFT)  # compute the triangular filter banks used in the mfcc calculation
-    nChroma, nFreqsPerChroma = stChromaFeaturesInit(nFFT, fs)
 
-    n_time_spectral_feats = 8
     n_harmonic_feats = 0
-    n_chroma_feats = 13
-    n_total_feats = n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats + n_chroma_feats + ngfcc
-    # n_total_feats = n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats
+
     feature_names = []
-    feature_names.append("zcr")
-    feature_names.append("energy")
-    feature_names.append("energy_entropy")
-    feature_names += ["spectral_centroid", "spectral_spread"]
-    feature_names.append("spectral_entropy")
-    feature_names.append("spectral_flux")
-    feature_names.append("spectral_rolloff")
+    if "spectral" in feats:
+        n_time_spectral_feats = 8
+        feature_names.append("zcr")
+        feature_names.append("energy")
+        feature_names.append("energy_entropy")
+        feature_names += ["spectral_centroid", "spectral_spread"]
+        feature_names.append("spectral_entropy")
+        feature_names.append("spectral_flux")
+        feature_names.append("spectral_rolloff")
+    else:
+        n_time_spectral_feats = 0
     if "mfcc" in feats:
         feature_names += ["mfcc_{0:d}".format(mfcc_i)
                           for mfcc_i in range(1, n_mfcc_feats+1)]
     if "gfcc" in feats:
         feature_names += ["gfcc_{0:d}".format(gfcc_i)
                           for gfcc_i in range(1, ngfcc+1)]
-    feature_names += ["chroma_{0:d}".format(chroma_i)
-                      for chroma_i in range(1, n_chroma_feats)]
-    feature_names.append("chroma_std")
+    if "chroma" in feats:
+        nChroma, nFreqsPerChroma = stChromaFeaturesInit(nFFT, fs)
+        n_chroma_feats = 13
+        feature_names += ["chroma_{0:d}".format(chroma_i)
+                          for chroma_i in range(1, n_chroma_feats)]
+        feature_names.append("chroma_std")
+    else:
+        n_chroma_feats = 0
+    n_total_feats = n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats + n_chroma_feats + ngfcc
     st_features = []
     while (cur_p + win - 1 < N):  # for each short-term window until the end of signal
         count_fr += 1
@@ -92,24 +98,26 @@ def stFeatureExtraction(signal, fs, win, step, feats):
         if count_fr == 1:
             X_prev = X.copy()  # keep previous fft mag (used in spectral flux)
         curFV = numpy.zeros((n_total_feats, 1))
-        curFV[0] = stZCR(x)  # zero crossing rate
-        curFV[1] = stEnergy(x)  # short-term energy
-        curFV[2] = stEnergyEntropy(x)  # short-term entropy of energy
-        [curFV[3], curFV[4]] = stSpectralCentroidAndSpread(X, fs)  # spectral centroid and spread
-        curFV[5] = stSpectralEntropy(X)  # spectral entropy
-        curFV[6] = stSpectralFlux(X, X_prev)  # spectral flux
-        curFV[7] = stSpectralRollOff(X, 0.90, fs)  # spectral rolloff
+        if "spectral" in feats:
+            curFV[0] = stZCR(x)  # zero crossing rate
+            curFV[1] = stEnergy(x)  # short-term energy
+            curFV[2] = stEnergyEntropy(x)  # short-term entropy of energy
+            [curFV[3], curFV[4]] = stSpectralCentroidAndSpread(X, fs)  # spectral centroid and spread
+            curFV[5] = stSpectralEntropy(X)  # spectral entropy
+            curFV[6] = stSpectralFlux(X, X_prev)  # spectral flux
+            curFV[7] = stSpectralRollOff(X, 0.90, fs)  # spectral rolloff
         if "mfcc" in feats:
             curFV[n_time_spectral_feats:n_time_spectral_feats+n_mfcc_feats, 0] = \
                 stMFCC(X, fbank, n_mfcc_feats).copy()  # MFCCs
         if "gfcc" in feats:
             curFV[n_time_spectral_feats+n_mfcc_feats:n_time_spectral_feats+n_mfcc_feats+ngfcc, 0] = gfcc.get_gfcc(x)
-        chromaNames, chromaF = stChromaFeatures(X, fs, nChroma, nFreqsPerChroma)
-        curFV[n_time_spectral_feats + n_mfcc_feats + ngfcc:
-              n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
-            chromaF
-        curFV[n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
-            chromaF.std()
+        if "chroma" in feats:
+            chromaNames, chromaF = stChromaFeatures(X, fs, nChroma, nFreqsPerChroma)
+            curFV[n_time_spectral_feats + n_mfcc_feats + ngfcc:
+                  n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
+                chromaF
+            curFV[n_time_spectral_feats + n_mfcc_feats + n_chroma_feats + ngfcc - 1] = \
+                chromaF.std()
         st_features.append(curFV)
         X_prev = X.copy()
 
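Editor's note: the refactored `stFeatureExtraction` keeps a fixed slot order inside `curFV` (spectral, then MFCC, then GFCC, then chroma) and zeroes the spectral and chroma widths when those groups are not requested, while `n_mfcc_feats` and `ngfcc` still count toward `n_total_feats` unconditionally. A minimal sketch of that width arithmetic; the 13 and 22 defaults are illustrative assumptions, not values taken from this diff:

```python
# Sketch of the n_total_feats arithmetic in the hunk above. Only the
# spectral and chroma widths collapse to zero when unrequested; the
# mfcc/gfcc widths (defined earlier in the real file) always count.
def total_feats(feats, n_mfcc_feats=13, ngfcc=22, n_harmonic_feats=0):
    n_time_spectral_feats = 8 if "spectral" in feats else 0
    n_chroma_feats = 13 if "chroma" in feats else 0
    return (n_time_spectral_feats + n_mfcc_feats + n_harmonic_feats
            + n_chroma_feats + ngfcc)

print(total_feats(["spectral", "mfcc", "gfcc", "chroma"]))  # 56 with these assumptions
print(total_feats(["mfcc", "gfcc"]))                        # 35 with these assumptions
```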

pyAudioProcessing/features/getGfcc.py

Lines changed: 14 additions & 1 deletion
@@ -41,17 +41,30 @@ def erb_filter(self):
         """
         return filters.make_erb_filters(self.fs, filters.centre_freqs(self.fs, 64, 50))
 
-    def get_gfcc(self, signal, ccST=1, ccEND=23):
+    def mean_var_norm(self, x, std=True):
+        """
+        Returns mean variance normalization.
+        """
+        norm = x - numpy.mean(x, axis=0)
+        if std is True:
+            norm = norm / numpy.std(norm)
+        return norm
+
+    def get_gfcc(self, signal, ccST=1, ccEND=23, norm=False):
         """
         Get GFCC feature.
         """
         erb_filterbank = filters.erb_filterbank(numpy.array(signal), self.erb_filter)
         inData = erb_filterbank[10:, :]
+        inData = numpy.absolute(inData)
+        inData = numpy.power(inData, 1/3)
         [chnNum, frmNum] = numpy.array(inData).shape
         mtx = self.dct_matrix(chnNum)
         outData = numpy.matmul(mtx, inData)
         outData = outData[ccST:ccEND, :]
         gfcc_feat = numpy.array(
             [numpy.mean(data_list) for data_list in outData]
         ).copy()
+        if norm is True:
+            gfcc_feat = self.mean_var_norm(gfcc_feat)
         return gfcc_feat
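Editor's note: two numerical changes in `get_gfcc` are easy to miss in the diff: the ERB filterbank output is rectified and cube-root compressed before the DCT (a common loudness-compression step in GFCC pipelines), and the final coefficients can optionally be mean-variance normalized. A standalone sketch of those two steps on synthetic data:

```python
import numpy

# Synthetic stand-in for the ERB filterbank output (channels x frames).
inData = numpy.random.randn(54, 100)

# Rectify and cube-root compress, as in the get_gfcc change above.
inData = numpy.power(numpy.absolute(inData), 1 / 3)

# Mean-variance normalization, mirroring mean_var_norm(std=True):
# subtract the mean, then divide by the standard deviation.
feat = inData.mean(axis=1)
norm = feat - numpy.mean(feat, axis=0)
norm = norm / numpy.std(norm)
print(round(norm.mean(), 6), round(norm.std(), 6))  # ~0.0 and ~1.0
```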

pyAudioProcessing/run_classification.py

Lines changed: 3 additions & 1 deletion
@@ -27,7 +27,7 @@
 )
 PARSER.add_argument(
     "-feats", "--feature-names", type=lambda s: [item for item in s.split(",")],
-    default=["mfcc", "gfcc"],
+    default=["mfcc", "gfcc", "chroma", "spectral"],
     help="Features to compute.",
 )
 PARSER.add_argument(
@@ -96,6 +96,8 @@ def classify_data(data_dirs, feature_names, classifier, classifier_name):
         indx = list(res[1]).index(max(res[1]))
         if res[2][indx] == fol.split("/")[-1]:
             correctly_classified += 1
+    if correctly_classified == 0:
+        print("Either you passed in data with unknown classes, or")
     print(
         "{} out of {} instances were classified correctly".format(
             correctly_classified, num_files
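Editor's note: the new zero-correct warning fires because accuracy here is computed by comparing each prediction against the parent directory name of the classified files; if the test folders are not named after trained classes, nothing can match. A minimal sketch of that matching rule, with hypothetical values:

```python
# The accuracy counter in classify_data compares the predicted label
# against the last path component of the folder being classified.
fol = "data_samples/testing/music"   # hypothetical test folder
predicted_label = "music"            # hypothetical classifier output
if predicted_label == fol.split("/")[-1]:
    print("counted as correctly classified")
```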

pyAudioProcessing/trainer/audioTrainTest.py

Lines changed: 1 addition & 1 deletion
@@ -235,7 +235,7 @@ def extract_features(
 
 def featureAndTrain(list_of_dirs, mt_win, mt_step, st_win, st_step,
                     classifier_type, model_name,
-                    compute_beat=False, perTrain=0.90, feats=["gfcc", "mfcc"]):
+                    compute_beat=False, perTrain=0.90, feats=["gfcc", "mfcc", "spectral", "chroma"]):
     '''
     This function is used as a wrapper to segment-based audio feature extraction and classifier training.
     ARGUMENTS:
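Editor's note: with the new default, `featureAndTrain` extracts all four feature groups unless a narrower `feats` list is passed. A minimal sketch of an explicit call; the window sizes are illustrative assumptions, as only the signature and the `feats` default appear in this diff:

```python
from pyAudioProcessing.trainer.audioTrainTest import featureAndTrain

# Illustrative call: the 1.0 s mid-term and 0.05 s short-term windows
# are assumed values, not taken from this commit.
featureAndTrain(
    ["data_samples/training/music", "data_samples/training/speech"],
    1.0, 1.0, 0.05, 0.05,   # mt_win, mt_step, st_win, st_step
    "svm", "svm_clf",
    feats=["gfcc", "mfcc", "spectral", "chroma"],
)
```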

setup.py

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ def get_requirements(path=REQUIREMENTS_PATH):
 
 setuptools.setup(
     name='pyAudioProcessing',
-    version='1.1.5',
+    version='1.1.6',
     description='Audio processing-feature extraction and building machine learning models from audio data.',
     long_description=long_description,
     long_description_content_type="text/markdown",
