Commit bf3a6f6

Merge pull request #34 from jsingh811/paper-update
Update paper
2 parents c91fef0 + 6d695f3 commit bf3a6f6

5 files changed: 266 additions and 4 deletions.

README.md

Lines changed: 19 additions & 0 deletions
@@ -185,6 +185,25 @@ python pyAudioProcessing/extract_features.py -f "data_samples/testing" -feats "
 ```
 Features extracted get saved in `audio_features.json`.
 
+## Audio format conversion
+
+You can convert your audio in `.mp4`, `.mp3`, `.m4a` and `.aac` formats to `.wav`. This allows you to use the audio feature generation and classification functionalities.
+
+To convert your audio files, you can use the following code sample.
+
+```
+from pyAudioProcessing.convert_audio import convert_files_to_wav
+
+# dir_path is the path to the directory/folder on your machine containing audio files
+dir_path = "data/mp4_files"
+
+# simply change audio_format to "mp3", "m4a" or "aac" depending on the format
+# of audio that you are trying to convert to wav
+convert_files_to_wav(dir_path, audio_format="mp4")
+
+# the converted wav files will be saved in the same dir_path location.
+
+```
 
 ## Author
 
paper/paper.bib

Lines changed: 36 additions & 0 deletions
@@ -64,3 +64,39 @@ @article{giannakopoulos2015pyaudioanalysis
   year={2015},
   publisher={Public Library of Science}
 }
+
+@phdthesis{phdthesis,
+  author = {Dinger, Vincent},
+  year = {2021},
+  month = {03},
+  pages = {},
+  title = {Master Thesis KI Methodiken für die Verarbeitung akustischer Signale AI Usage for Processing Acoustic Signals},
+  doi = {10.13140/RG.2.2.15872.97287}
+}
+
+@inbook{packt,
+  author = {Ben Auffarth},
+  year = {2020},
+  month = {10},
+  title = {Artificial Intelligence with Python Cookbook},
+  isbn = {9781789133967},
+}
+
+@misc{tzanetakis_essl_cook_2001,
+  author = "Tzanetakis, George and Essl, Georg and Cook, Perry",
+  title = "Automatic Musical Genre Classification Of Audio Signals",
+  url = "http://ismir2001.ismir.net/pdf/tzanetakis.pdf",
+  publisher = "The International Society for Music Information Retrieval",
+  year = "2001"
+}
+
+@misc{nlp,
+  doi = {10.5281/ZENODO.4915746},
+  url = {https://zenodo.org/record/4915746},
+  author = {Singh, Jyotika},
+  keywords = {YouTube, NER, NLP},
+  title = {jsingh811/pyYouTubeAnalysis: pyYouTubeAnalysis: YouTube data requests and NER on text},
+  publisher = {Zenodo},
+  year = {2021},
+  copyright = {Open Access}
+}

paper/paper.md

Lines changed: 73 additions & 3 deletions
@@ -1,5 +1,5 @@
 ---
-title: 'pyAudioProcessing: Audio Processing, Feature Extraction and building Machine Learning Models from Audio Data'
+title: 'pyAudioProcessing: Audio Processing, Feature Extraction and Building Machine Learning Models from Audio Data'
 tags:
 - Python
 - audio
@@ -24,17 +24,87 @@ bibliography: paper.bib
 
 # Summary
 
-PyAudioProcessing is a Python based library for processing audio data, forming and extracting numerical features from audio and further building and testing machine learning models. This library allows you to extract features such as MFCC, GFCC, spectral features, chroma features and other beat based and cepstrum based features from audio to use with one's own classification backend or popular scikit-learn classifiers.
+PyAudioProcessing is a Python-based library for processing audio data, constructing and extracting numerical features from audio, building and testing machine learning models, and classifying data with existing pre-trained audio classification models or custom user-built models. PyAudioProcessing provides five core functionalities covering different stages of audio signal processing.
+
+1. Converts audio files to ".wav" format, so that users can start from different types of audio files and still work with code and processes that expect ".wav" audio.
+
+2. Builds numerical features from audio that can be used to train machine learning models. The set of supported features evolves with time as research informs new and improved algorithms.
+
+3. Exports the features built with this library for use with any custom machine learning backend of the user's choosing.
+
+4. Trains scikit-learn classifiers directly from raw data using features of the user's choosing. In this mode, the library
+
+    a. runs automatic hyper-parameter tuning
+    b. returns the training model metrics along with a cross-validation confusion matrix for model evaluation
+    c. allows users to test the created classifier with the same features used for training
+
+5. Includes pre-trained models to provide users with baseline audio classifiers.
+
+It is an end-to-end solution for converting between audio file formats, building features from raw audio samples and training a machine learning model that can then be used to classify unseen raw audio samples. This library allows the user to extract features such as MFCC, GFCC, spectral features, chroma features and other beat-based and cepstrum-based features from audio to use with one's own classification backend or with popular scikit-learn classifiers that have been built into pyAudioProcessing.
+
+MATLAB is the language of choice for a vast amount of research in the audio and speech processing domain. In contrast, Python remains the language of choice for the vast majority of machine learning research and functionality. This library contains features converted to Python that were originally built in MATLAB as part of published research. This software contributes to the available open-source software by enabling users to combine a Python-based machine learning backend with highly researched audio features such as GFCC and others that are actively used in many audio classification applications but are not readily available in Python, because much of that research is carried out in MATLAB.
+
+This software aims to provide machine learning engineers, data scientists, researchers and students with a set of baseline models to classify audio, the ability to build features on custom training data, the ability to automatically train a scikit-learn classifier with hyper-parameter tuning, the ability to export the built features for integration with any machine learning backend, and the ability to classify audio files. This software further aims to aid users in pursuing research with GFCC and other evolving, actively researched audio features in Python.
+
 
 # Statement of need
 
+The motivation behind this software is the popularity of Python for machine learning and the lack of solutions for computing complex audio features in Python. This implies a need not only for resources that guide audio processing solutions, but also for Python guides and implementations that solve audio and speech classification tasks. The classifier implementation examples that are part of this software and the README aim to give users a sample solution to audio classification problems and help build the foundation to tackle new and unseen problems.
+
+Different data processing techniques work well for different types of data. For example, word vector formations work well for text data [@nlp]. However, passing numerical data, an audio signal or an image through word vector formation is not likely to produce a meaningful numerical representation that can be used to train machine learning models. Different data types call for feature formation techniques specific to their domain rather than a "one size fits all" approach.
+
 PyAudioProcessing is a Python based library for processing audio data into features and building Machine Learning models. Audio processing and feature extraction research is popular in MATLAB. There are comparatively fewer resources for audio processing and classification in Python. This tool contains implementation of popular and different audio feature extraction that can be used in combination with most scikit-learn classifiers. Audio data can be trained, tested and classified using pyAudioProcessing. The output consists of cross validation scores and results of testing on custom audio files.
 
 The library lets the user extract aggregated data features calculated per audio file. Unique feature extractions such as Mel Frequency Cepstral Coefficients (MFCC) [@6921394], Gammatone Frequency Cepstral Coefficients (GFCC) [@inbook], spectral coefficients, chroma features and others are available to extract and use in combination with different backend classifiers. While MFCC features find use in most commonly encountered audio processing tasks such as audio type classification, speech classification, GFCC features have been found to have application in speaker identification or speaker diarization. Many such applications, comparisons and uses can be found in this IEEE paper [@6639061]. All these features are also helpful for a variety of other audio classification tasks.
 
 Some other popular libraries for the domain of audio processing include librosa [@mcfee2015librosa] and pyAudioAnalysis [@giannakopoulos2015pyaudioanalysis]. Librosa is a Python package for music and audio analysis. It provides the building blocks necessary to create music information retrieval systems. PyAudioAnalysis is a Python library for audio feature extraction, classification, segmentation and applications. It allows the user to train scikit-learn models for MFCC, spectral and chroma features.
 
-PyAudioProcessing adds multiple additional features. The library includes the implementation of GFCC features converted from MATLAB code to allow users to leverage features for speech classification and speaker identification tasks in addition to MFCC and spectral features that are useful for music and other audio classification tasks. It allows the user to choose from the different feature options and use single or combinations of different audio features. The features can be run through a variety of scikit-learn models including a grid search for best model and Hyperparameters, along with a final confusion matrix and cross validation performance statistics. It further allows for saving and exporting the different audio features per audio file for the user to be able to leverage those while using a different custom classifier backend that is not a part of scikit-learn's models.
+PyAudioProcessing adds multiple additional features. The library includes the implementation of GFCC features converted from MATLAB-based research to allow users to leverage Python with features for speech classification and speaker identification tasks, in addition to MFCC and spectral features that are useful for music and other audio classification tasks. It allows the user to choose from the different feature options and use single features or combinations of different audio features. The features can be run through a variety of scikit-learn models, including a grid search for the best model and hyperparameters, along with a final confusion matrix and cross-validation performance statistics. It further allows for saving and exporting the different audio features per audio file, so that the user can leverage them with a custom classifier backend that is not a part of scikit-learn's models.
+
+The library further provides some pre-built audio classification models, such as the `speechVSmusic` and `speechVSmusicVSbirds` sound classifiers and the `music genre` classifier, to give users a baseline of pre-trained models for common audio classification tasks. The user can use the library to build custom classifiers with the help of the instructions in the README.
+
+An additional functionality allows users to convert their audio files to "wav" format so that analysis and feature extraction can be run on them.
+
+The use of this software in the community today inspires its continued growth. It is referenced in a textbook titled `Artificial Intelligence with Python Cookbook`, published by Packt Publishing in October 2020 [@packt]. Additionally, pyAudioProcessing is part of the specific admissions requirements for a funded PhD project at University of Portsmouth <sup id="portsmouth">[1](#footnote_portsmouth)</sup>. It is further referenced in a thesis titled "Master Thesis AI Methodologies for Processing Acoustic Signals AI Usage for Processing Acoustic Signals" [@phdthesis].
+
+<b id="footnote_portsmouth">1</b> https://www.port.ac.uk/study/postgraduate-research/research-degrees/phd/explore-our-projects/detection-of-emotional-states-from-speech-and-text [](#portsmouth)
+
+
+# Pre-trained models
+
+This software offers pre-trained models. This is an evolving feature as new datasets and classification problems gain prominence in research. Some of the pre-trained models include the following.
+
+1. Audio type classifier to determine speech versus music: Trained SVM classifier for classifying audio into two possible classes - music, speech. This classifier was trained using MFCC, spectral and chroma features. The confusion matrix has scores as follows.
+
+| | music | speech |
+| --- | --- | --- |
+| music | 48.80 | 1.20 |
+| speech | 0.60 | 49.40 |
+
+2. Audio type classifier to determine speech versus music versus bird sounds: Trained SVM classifier for classifying audio into three possible classes - music, speech and birds. This classifier was trained using MFCC, spectral and chroma features. The confusion matrix has scores as follows.
+
+| | music | speech | birds |
+| --- | --- | --- | --- |
+| music | 31.53 | 0.73 | 1.07 |
+| speech | 1.00 | 32.33 | 0.00 |
+| birds | 0.00 | 0.00 | 33.33 |
+
+3. Music genre classifier using the GTZAN [@tzanetakis_essl_cook_2001] dataset: Trained SVM classifier using GFCC, MFCC, spectral and chroma features to classify music into 10 genre classes - blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock. The confusion matrix has scores as follows.
+
+| | pop | met | dis | blu | reg | cla | rock | hip | cou | jazz |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| pop | 7.25 | 0.00 | 0.74 | 0.38 | 0.09 | 0.09 | 0.33 | 0.60 | 0.50 | 0.04 |
+| met | 0.03 | 8.74 | 0.66 | 0.09 | 0.00 | 0.00 | 0.45 | 0.00 | 0.04 | 0.00 |
+| dis | 0.69 | 0.08 | 6.29 | 0.00 | 0.74 | 0.11 | 0.90 | 0.51 | 0.69 | 0.00 |
+| blu | 0.00 | 0.20 | 0.00 | 8.31 | 0.25 | 0.08 | 0.44 | 0.09 | 0.30 | 0.34 |
+| reg | 0.11 | 0.00 | 0.26 | 0.58 | 7.99 | 0.00 | 0.28 | 0.59 | 0.09 | 0.11 |
+| cla | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.07 | 0.23 | 0.00 | 0.23 | 0.48 |
+| rock | 0.14 | 0.90 | 1.10 | 0.80 | 0.35 | 0.29 | 5.31 | 0.01 | 1.09 | 0.01 |
+| hip | 0.71 | 0.14 | 0.56 | 0.18 | 1.96 | 0.00 | 0.19 | 6.10 | 0.03 | 0.14 |
+| cou | 0.25 | 0.15 | 0.84 | 0.64 | 0.08 | 0.10 | 1.87 | 0.00 | 5.84 | 0.24 |
+| jazz | 0.04 | 0.01 | 0.13 | 0.41 | 0.00 | 0.76 | 0.31 | 0.00 | 0.53 | 7.81 |
+
+These baseline models aim to demonstrate the capability of audio feature generation algorithms to extract meaningful numeric patterns from audio data. Users can train their own classifiers using similar features and different machine learning backends to research and explore improvements.
 
 # Audio features
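
Reviewer note on the confusion matrices added to `paper/paper.md` in this hunk: the cells appear to be percentages of all evaluated samples, since each table sums to roughly 100. Under that assumption, overall accuracy is the diagonal sum divided by the table total. The sketch below is not part of the commit; the function names are hypothetical, and the per-class figure additionally assumes that rows are true classes and columns are predictions, which the paper does not state.

```python
"""Summarise a confusion matrix whose cells are percentages of all samples."""


def overall_accuracy(matrix):
    """Share of the table mass that lies on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix, labels):
    """Diagonal cell over its row sum (assumes rows are the true classes)."""
    return {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }


# Values copied from the speech versus music table.
speech_music = [
    [48.80, 1.20],
    [0.60, 49.40],
]

# Values copied from the speech versus music versus birds table.
speech_music_birds = [
    [31.53, 0.73, 1.07],
    [1.00, 32.33, 0.00],
    [0.00, 0.00, 33.33],
]

if __name__ == "__main__":
    print(round(overall_accuracy(speech_music), 4))
    print(round(overall_accuracy(speech_music_birds), 4))
    print(per_class_recall(speech_music, ["music", "speech"]))
```

For the two-class table this works out to 98.20 / 100.00 = 0.982. The overall figure does not depend on the row/column orientation, while the per-class values do.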

pyAudioProcessing/convert_audio.py

Lines changed: 137 additions & 0 deletions
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Thu Jun 10 15:23:55 2021
+
+@author: jsingh
+"""
+# Imports
+
+import os
+import glob
+
+from pydub import AudioSegment
+
+
+# Functions
+
+def convert_from_m4a(file_path):
+    """
+    Converts m4a audio into wav format.
+    """
+    try:
+        track = AudioSegment.from_file(file_path, "m4a")
+        file_handle = track.export(
+            file_path.replace(".m4a", ".wav"), format='wav'
+        )
+    except FileNotFoundError:
+        print("{} does not appear to be valid. Please check.".format(file_path))
+    except Exception as e:
+        print(e)
+
+
+def convert_from_mp3(file_path):
+    """
+    Converts mp3 audio into wav format.
+    """
+    try:
+        track = AudioSegment.from_file(file_path, "mp3")
+        file_handle = track.export(
+            file_path.replace(".mp3", ".wav"), format='wav'
+        )
+    except FileNotFoundError:
+        print("{} does not appear to be valid. Please check.".format(file_path))
+    except Exception as e:
+        print(e)
+
+
+def convert_from_mp4(file_path):
+    """
+    Converts mp4 audio into wav format.
+    """
+    try:
+        track = AudioSegment.from_file(file_path, "mp4")
+        file_handle = track.export(
+            file_path.replace(".mp4", ".wav"), format='wav'
+        )
+    except FileNotFoundError:
+        print("{} does not appear to be valid. Please check.".format(file_path))
+    except Exception as e:
+        print(e)
+
+
+def convert_from_aac(file_path):
+    """
+    Converts aac audio into wav format.
+    """
+    try:
+        track = AudioSegment.from_file(file_path, "aac")
+        file_handle = track.export(
+            file_path.replace(".aac", ".wav"), format='wav'
+        )
+    except FileNotFoundError:
+        print("{} does not appear to be valid. Please check.".format(file_path))
+    except Exception as e:
+        print(e)
+
+
+def convert_files_to_wav(dir_path, audio_format="m4a"):
+    """
+    Converts all the audio files in the input directory path
+    with the extension specified by audio_format input
+    into .wav audio files, and saves them in the same directory.
+    """
+    # Read contents of dir
+    # Only select files with the mentioned extension
+    files = glob.glob(os.path.join(dir_path, "*." + audio_format))
+
+    # Convert to wav
+    # The wav files save in the same dir as specified by dir_path
+    if audio_format == "m4a":
+        cntr = 0
+        for aud_file in files:
+            convert_from_m4a(aud_file)
+            cntr += 1
+        print(
+            "{} {} files converted to .wav and saved in {}".format(
+                cntr, audio_format, dir_path
+            )
+        )
+    elif audio_format == "mp3":
+        cntr = 0
+        for aud_file in files:
+            convert_from_mp3(aud_file)
+            cntr += 1
+        print(
+            "{} {} files converted to .wav and saved in {}".format(
+                cntr, audio_format, dir_path
+            )
+        )
+    elif audio_format == "aac":
+        cntr = 0
+        for aud_file in files:
+            convert_from_aac(aud_file)
+            cntr += 1
+        print(
+            "{} {} files converted to .wav and saved in {}".format(
+                cntr, audio_format, dir_path
+            )
+        )
+    elif audio_format == "mp4":
+        cntr = 0
+        for aud_file in files:
+            convert_from_mp4(aud_file)
+            cntr += 1
+        print(
+            "{} {} files converted to .wav and saved in {}".format(
+                cntr, audio_format, dir_path
+            )
+        )
+    else:
+        print(
+            "File format {} is not in supported types (mp3, mp4, m4a, aac)".format(
+                audio_format
+            )
+        )
+
+
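
Reviewer note on `convert_audio.py`: the four `convert_from_*` functions differ only in the format string, and the per-format branches in `convert_files_to_wav` repeat the same loop. A possible consolidation is sketched below. It is an illustration rather than the committed code: `wav_path_for`, `pydub_convert`, `convert_directory` and the `convert_one` parameter are hypothetical names, and the only pydub calls used are `AudioSegment.from_file` and `export`, which already appear in the commit. Two behaviours change by design: the output path swaps only the final extension (`str.replace` rewrites every occurrence of ".m4a" in the path, including directory names), and a file is counted only when its conversion does not raise.

```python
from pathlib import Path

SUPPORTED_FORMATS = ("mp3", "mp4", "m4a", "aac")


def wav_path_for(file_path):
    """Return the sibling .wav path, changing only the final extension."""
    return str(Path(file_path).with_suffix(".wav"))


def pydub_convert(file_path, audio_format):
    """Decode one file with pydub and export it as wav (needs pydub and ffmpeg)."""
    # Imported lazily so the helpers in this sketch work without pydub installed.
    from pydub import AudioSegment

    track = AudioSegment.from_file(file_path, audio_format)
    track.export(wav_path_for(file_path), format="wav")


def convert_directory(dir_path, audio_format="m4a", convert_one=pydub_convert):
    """Convert every *.<audio_format> file in dir_path; return the number converted."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(
            "File format {} is not in supported types {}".format(
                audio_format, SUPPORTED_FORMATS
            )
        )
    converted = 0
    for file_path in sorted(Path(dir_path).glob("*." + audio_format)):
        try:
            convert_one(str(file_path), audio_format)
            converted += 1  # counted only when the conversion did not raise
        except Exception as error:
            print("Could not convert {}: {}".format(file_path, error))
    return converted
```

Passing the converter in as `convert_one` keeps the directory logic testable without audio files or ffmpeg; calling `convert_directory("data/mp4_files", audio_format="mp4")` would use pydub as the commit does.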

pyAudioProcessing/run_classification.py

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ def train_and_classify(
     Train on the data under folder_path or classify the data in folder path
     using features specified by feature_names and the specified classifier.
     """
-    # Get all direcotiers under folder_path
+    # Get all directories under folder_path
     data_dirs = [x[0] for x in os.walk(folder_path)][1:]
     print(
         "\n There are {} classes in the specified data folder\n".format(

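Reviewer note on the comment fixed in `run_classification.py`: `os.walk` descends recursively, so the `data_dirs` expression collects every directory below `folder_path`, not only the immediate sub-folders. The hunk is cut off before the value passed to `format`, so whether nested folders end up counted as classes is an assumption here. The sketch below only demonstrates what the expression returns; `class_dirs_top_level` is a hypothetical alternative and not part of the commit.

```python
import os
import tempfile


def class_dirs_recursive(folder_path):
    """Same expression as in train_and_classify: every directory below folder_path."""
    return [x[0] for x in os.walk(folder_path)][1:]


def class_dirs_top_level(folder_path):
    """Hypothetical alternative: only the immediate sub-directories."""
    return sorted(
        entry.path for entry in os.scandir(folder_path) if entry.is_dir()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "music", "live"))
        os.makedirs(os.path.join(root, "speech"))
        # Recursive walk sees music, music/live and speech.
        print(len(class_dirs_recursive(root)))
        # Top-level scan sees music and speech only.
        print(len(class_dirs_top_level(root)))
```

With one class per top-level folder and no nested folders, both functions return the same set of directories.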