Commit 9e37d35

Merge pull request #33 from jsingh811/paper-scope
Add paper scope
2 parents 97ff2f0 + 5398ef4 commit 9e37d35

2 files changed: +40 -9 lines changed


paper/paper.bib

Lines changed: 36 additions & 9 deletions
@@ -28,12 +28,39 @@ @misc{opensource
 url = {https://opensource.com/article/19/9/audio-processing-machine-learning-python}
 }
 
-@INPROCEEDINGS{6921394,
-author={Chauhan, Paresh M. and Desai, Nikita P.},
-booktitle={2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE)},
-title={Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using wiener filter},
-year={2014},
-volume={},
-number={},
-pages={1-5},
-doi={10.1109/ICGCCEE.2014.6921394}}
+@INPROCEEDINGS{6921394,
+
+author={Chauhan, Paresh M. and Desai, Nikita P.},
+
+booktitle={2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE)},
+
+title={Mel Frequency Cepstral Coefficients (MFCC) based speaker identification in noisy environment using wiener filter},
+
+year={2014},
+
+volume={},
+
+number={},
+
+pages={1-5},
+
+doi={10.1109/ICGCCEE.2014.6921394}}
+
+@inproceedings{mcfee2015librosa,
+title={librosa: Audio and music signal analysis in python},
+author={McFee, Brian and Raffel, Colin and Liang, Dawen and Ellis, Daniel PW and McVicar, Matt and Battenberg, Eric and Nieto, Oriol},
+booktitle={Proceedings of the 14th python in science conference},
+volume={8},
+year={2015},
+doi={10.5281/zenodo.4792298}
+}
+
+@article{giannakopoulos2015pyaudioanalysis,
+title={pyAudioAnalysis: An Open-Source Python Library for Audio Signal Analysis},
+author={Giannakopoulos, Theodoros},
+journal={PloS one},
+volume={10},
+number={12},
+year={2015},
+publisher={Public Library of Science}
+}

paper/paper.md

Lines changed: 4 additions & 0 deletions
@@ -32,6 +32,10 @@ PyAudioProcessing is a Python based library for processing audio data into featu
 
 The library lets the user extract aggregated data features calculated per audio file. Unique feature extractions such as Mel Frequency Cepstral Coefficients (MFCC) [@6921394], Gammatone Frequency Cepstral Coefficients (GFCC) [@inbook], spectral coefficients, chroma features and others are available to extract and use in combination with different backend classifiers. While MFCC features find use in most commonly encountered audio processing tasks such as audio type classification and speech classification, GFCC features have been found to have application in speaker identification or speaker diarization. Many such applications, comparisons and uses can be found in this IEEE paper [@6639061]. All these features are also helpful for a variety of other audio classification tasks.
 
+Some other popular libraries in the audio processing domain include librosa [@mcfee2015librosa] and pyAudioAnalysis [@giannakopoulos2015pyaudioanalysis]. Librosa is a Python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems. PyAudioAnalysis is a Python library for audio feature extraction, classification, segmentation and applications. It allows the user to train scikit-learn models on MFCC, spectral and chroma features.
+
+PyAudioProcessing adds multiple features beyond these. The library includes an implementation of GFCC features, converted from MATLAB code, so that users can leverage them for speech classification and speaker identification tasks, in addition to MFCC and spectral features that are useful for music and other audio classification tasks. It allows the user to choose among the different feature options and to use single features or combinations of features. The features can be run through a variety of scikit-learn models, including a grid search for the best model and hyperparameters, along with a final confusion matrix and cross-validation performance statistics. It further allows saving and exporting the audio features per audio file so that the user can leverage them with a custom classifier backend that is not part of scikit-learn's models.
+
 # Audio features
 
 Information about getting started with audio processing is described in [@opensource].
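
The second added paragraph describes running exported features through scikit-learn models with a grid search, cross-validation statistics and a confusion matrix. The sketch below illustrates that workflow with scikit-learn alone; the feature matrix is a random placeholder standing in for per-file MFCC/GFCC/spectral features exported by the library, and the loading step and shapes are illustrative assumptions, not pyAudioProcessing's actual API.

# Sketch of the grid-search + confusion-matrix workflow the paragraph describes.
# X is a placeholder feature matrix (one row per audio file); in practice it would
# be built from the features the library saves/exports per audio file.
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))      # placeholder: one feature vector per audio file
y = rng.integers(0, 2, size=200)    # placeholder: class label per audio file

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Grid search over model hyperparameters.
grid = GridSearchCV(
    SVC(),
    {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]},
    cv=5,
)
grid.fit(X_train, y_train)

# Cross-validation performance statistics and a final confusion matrix.
scores = cross_val_score(grid.best_estimator_, X_train, y_train, cv=5)
print("best params:", grid.best_params_)
print("cv accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
print(confusion_matrix(y_test, grid.predict(X_test)))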
