
Commit 0931a1e

Add paper
1 parent b9341d0 commit 0931a1e

File tree

4 files changed (+73, -0 lines changed)


gfcc.png

152 KB

mfcc.png

124 KB

paper.bib

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
@inproceedings{6639061,
  author    = {Zhao, Xiaojia and Wang, DeLiang},
  booktitle = {2013 IEEE International Conference on Acoustics, Speech and Signal Processing},
  title     = {Analyzing noise robustness of {MFCC} and {GFCC} features in speaker identification},
  year      = {2013},
  pages     = {7204-7208},
  doi       = {10.1109/ICASSP.2013.6639061}
}

@inbook{inbook,
  author = {Jeevan, Medikonda and Dhingra, Atul and Hanmandlu, M. and Panigrahi, Bijaya},
  year   = {2017},
  month  = {10},
  pages  = {85-91},
  title  = {Robust Speaker Verification Using {GFCC} Based i-Vectors},
  volume = {395},
  isbn   = {978-81-322-3590-3},
  doi    = {10.1007/978-81-322-3592-7_9}
}

@misc{opensource,
  author    = {Singh, Jyotika},
  title     = {An introduction to audio processing and machine learning using Python},
  year      = {2019},
  publisher = {Opensource.com},
  url       = {https://opensource.com/article/19/9/audio-processing-machine-learning-python}
}

paper.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
---
title: 'pyAudioProcessing: Audio Processing, Feature Extraction and Building Machine Learning Models from Audio Data'
tags:
  - Python
  - audio
  - audio processing
  - feature extraction
  - machine learning
  - gfcc
  - mfcc
  - cepstral coefficients
  - spectral coefficients
authors:
  - name: Jyotika Singh
    orcid: 0000-0002-5442-3004
date: 2 June 2021
bibliography: paper.bib
---
# Summary
pyAudioProcessing is a Python-based library for processing audio data, forming and extracting numerical features from audio, and building machine learning models. The library lets users extract features such as MFCC, GFCC, spectral features, chroma features, and other beat-based and cepstrum-based features from audio, to use either with their own classification backend or with popular scikit-learn classifiers.
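
As a rough illustration of this workflow, and not pyAudioProcessing's actual API, the sketch below pairs a hypothetical `extract_features` helper with a scikit-learn classifier; the file names, labels, and feature dimensionality are all assumptions.

```python
# Minimal sketch of the feature-extraction + classification workflow.
# `extract_features` is a hypothetical placeholder, not pyAudioProcessing's
# actual API; file names and the 26-dimensional features are assumptions.
import numpy as np
from sklearn.svm import SVC


def extract_features(wav_paths):
    """Hypothetical helper: return one fixed-length feature vector per file."""
    # In practice this would compute MFCC/GFCC/spectral features per file and
    # aggregate them over frames (e.g., by taking the mean) into one vector.
    rng = np.random.default_rng(0)
    return rng.random((len(wav_paths), 26))  # placeholder feature matrix


train_files = ["music_01.wav", "music_02.wav", "speech_01.wav", "speech_02.wav"]
train_labels = ["music", "music", "speech", "speech"]
test_files = ["unknown_clip.wav"]

clf = SVC(kernel="linear")  # any scikit-learn classifier can be swapped in here
clf.fit(extract_features(train_files), train_labels)
print(clf.predict(extract_features(test_files)))
```
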
# Statement of need
pyAudioProcessing is a Python-based library for processing audio data into features and building machine learning models. Audio processing and feature extraction research is well established in MATLAB, while comparatively fewer resources exist for audio processing and classification in Python. This tool implements a range of popular audio feature extraction methods that can be used in combination with most scikit-learn classifiers. Feature extractions such as Mel Frequency Cepstral Coefficients (MFCC), Gammatone Frequency Cepstral Coefficients (GFCC) [@inbook], spectral coefficients, chroma features, and others are available to extract and use with different backend classifiers. While MFCC features are used in the most commonly encountered audio processing tasks, such as audio type classification and speech classification, GFCC features have found application in speaker identification/diarization. Many such applications, comparisons, and uses can be found in @6639061.
# Audio features
Information about getting started with audio processing is described in @opensource.

Passing a spectrum through the Mel filter bank, followed by taking the log magnitude and a discrete cosine transform (DCT), produces the Mel cepstrum. The DCT extracts the signal's main information as peaks, which capture the gist of the audio; it is also widely used in JPEG and MPEG compression. Typically, the first 13 coefficients extracted from the Mel cepstrum are called the MFCCs. These hold very useful information about the audio and are often used to train machine learning models. This process is illustrated in \autoref{fig:mfcc}.
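
The chain above can be sketched with NumPy and SciPy. The snippet below only illustrates the Mel filter bank, log, and DCT steps for a single frame; the sampling rate, FFT size, and number of filters are assumed values, and this is not pyAudioProcessing's implementation.

```python
# Illustrative MFCC-style computation for one frame: power spectrum ->
# Mel filter bank -> log -> DCT -> keep the first 13 coefficients.
import numpy as np
from scipy.fftpack import dct


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel-spaced filters covering 0 Hz to sr/2."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


sr, n_fft = 16000, 512                               # assumed parameters
frame = np.random.randn(n_fft)                       # stand-in for one audio frame
spectrum = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum
mel_energies = mel_filterbank(26, n_fft, sr) @ spectrum
mfccs = dct(np.log(mel_energies + 1e-10), norm="ortho")[:13]  # first 13 = MFCCs
```
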
Another filter bank inspired by human hearing is the Gammatone filter bank, which is used as a front-end simulation of the cochlea. Because it aims to replicate how we hear, it has many applications in speech processing. GFCCs are formed by passing the spectrum through the Gammatone filter bank, followed by loudness compression and a DCT, as seen in \autoref{fig:gfcc}. The first (approximately) 22 coefficients are called the GFCCs. GFCCs have a number of applications in speech processing, such as speaker identification.
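
A simplified sketch of this chain is shown below. It uses SciPy's `gammatone` filter design (available in SciPy >= 1.6) with log-spaced center frequencies as a stand-in for ERB spacing, cube-root loudness compression, and a DCT; it is an illustration of the idea, not pyAudioProcessing's implementation.

```python
# Illustrative GFCC-style computation: Gammatone filter bank -> channel
# energies -> cube-root (loudness) compression -> DCT -> first ~22 coefficients.
import numpy as np
from scipy.fftpack import dct
from scipy.signal import gammatone, lfilter

sr = 16000
signal = np.random.randn(sr)                          # stand-in for 1 s of audio
center_freqs = np.geomspace(50.0, 0.9 * sr / 2, 64)   # assumed 64 channels,
                                                      # log-spaced (ERB in practice)

energies = []
for fc in center_freqs:
    b, a = gammatone(fc, "fir", fs=sr)                # per-channel gammatone filter
    channel = lfilter(b, a, signal)
    energies.append(np.mean(channel ** 2))            # energy in each channel

compressed = np.cbrt(np.asarray(energies))            # loudness compression
gfccs = dct(compressed, norm="ortho")[:22]            # keep the first ~22 = GFCCs
```
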
Other features useful in audio processing tasks (especially speech) include LPCC, BFCC, PNCC, and spectral features such as spectral flux, entropy, roll-off, centroid, spread, and energy entropy.
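
As an example of the spectral features in this list, the short sketch below computes the spectral centroid and an 85% roll-off point for a single frame using common textbook definitions, which may differ in detail from the exact formulas used by pyAudioProcessing.

```python
# Spectral centroid and roll-off for one frame, using common definitions.
import numpy as np

sr, n_fft = 16000, 512
frame = np.random.randn(n_fft)                        # stand-in for one audio frame
mag = np.abs(np.fft.rfft(frame))                      # magnitude spectrum
freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

# Centroid: the "center of mass" of the magnitude spectrum.
centroid = np.sum(freqs * mag) / np.sum(mag)

# Roll-off: the frequency below which 85% of the spectral energy lies.
cumulative = np.cumsum(mag ** 2)
rolloff = freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])]

print(f"centroid = {centroid:.1f} Hz, roll-off = {rolloff:.1f} Hz")
```
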
![MFCC from audio spectrum.\label{fig:mfcc}](mfcc.png)

![GFCC from audio spectrum.\label{fig:gfcc}](gfcc.png)
# References
