This project provides a `pipeline` that `trains` end-to-end keyword spotting models on input audio files, `tracks` experiments by logging the model artifacts, parameters, and metrics, `builds` them into a web application, `dockerizes` it into a container, and deploys the application containing the trained model artifacts to a cloud server with `CI/CD` integration.
## Keyword Spotter in Heroku - Demo

||
|:--:|
| <b>Figure 1a: App demo - Audio input to app for predicting keyword from trained model artifact</b>|

||
|:--:|
| <b>Figure 1b: App demo - Predicted keyword with probability</b>|

_**Disclaimer:**_ <br>
_1. This app is just a demo and is not intended for real-time usage. The main objective is to take ML models into production in terms of deployment and CI/CD, following the MLOps paradigm._ <br>
_2. Additionally, due to technical issues in the Heroku backend, the app currently crashes, so the Heroku app link is not provided for now. It will be updated once the issues are resolved and the app is up and running._
## Motivation
Firstly, the audio has to be embedded into a vector space, which constitutes the features to learn. To facilitate that, [Mel-Frequency Cepstral Coefficients](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum) (MFCC) are the most widely used feature-extraction technique for audio data. MFCCs are derived using the `Fourier transform` and the `log-Mel spectrogram`; a more detailed mathematical explanation can be found [here](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum). To extract these features, `librosa` is used. [data.py](./src/data.py) contains the code for preprocessing the audio and extracting features from it: it reads each audio file, computes the MFCCs, and pads them into a fixed-size vector for all audio files, since the CNN cannot handle variable-length sequences. To avoid any hassle in loading and processing a plethora of audio files, it is good practice to dump them to `.npy` arrays, which makes further usage easier.
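The snippet below is only a minimal sketch of this preprocessing step, not the exact code in [data.py](./src/data.py); the sampling rate, number of MFCC coefficients, padding length, and file paths are assumptions for illustration.

```python
import librosa
import numpy as np

def extract_mfcc(audio_path, sr=16000, n_mfcc=13, max_frames=100):
    """Load one audio file and return a fixed-size MFCC feature matrix."""
    signal, sr = librosa.load(audio_path, sr=sr)                 # read and resample the audio
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    # Pad (or truncate) along the time axis so every file yields the same shape,
    # because the CNN expects fixed-size inputs.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])), mode="constant")
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc

# Dump the features once to .npy so training does not re-process raw audio on every run.
# features = np.array([extract_mfcc(p) for p in audio_paths])
# np.save("dataset/train/features.npy", features)
```
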
_**Note:** Due to the large file size, the training data (`.npy`) files are uploaded to a shared folder. Download them from [here](https://www.dropbox.com/sh/4wjo8e8h4cg4xlo/AAAC3yR_kj5oq-ZcJopBosYYa?dl=0) and make sure the downloaded files are placed in [this directory](./dataset/train/). The [test directory](./dataset/test/) contains some sample audio files for local inferencing._
### CNN-LSTM Model
Now that the model artifacts are ready and built into a web API, it is time to deploy and host the application. To take this a step further, `docker` is a great tool. [Docker](https://www.docker.com/) lets developers package applications or software so that they can easily be reproduced on another machine. It uses containers to pack an application with its dependencies so it can be deployed in another environment. Generally, it is not a mandatory tool or step for deployment, as deployment can also be done without Docker, but it serves many purposes such as portability, scalability, version control, and freedom from dependency hassles. Thus, Docker is a great tool in the deployment cycle.
The main idea of using Docker in this project is to package and build a `docker image` from the Flask application with the necessary files and containerize it into a `docker container`, which can be deployed on any server (in this case, the Heroku cloud server). The [Dockerfile](./Dockerfile) contains all the commands needed to build the image; the commands to install external packages on `Debian or Ubuntu` based systems are also included. Docker serves as a bridge in the `CI/CD` pipeline between the web app and the cloud server.
### GitHub Actions
[Heroku](https://www.heroku.com/) is a container-based cloud Platform as a Service (PaaS) used to deploy, manage, and scale modern apps. It accounts for the CD part of the pipeline: as a result of CI, when the docker container is built, CD deploys it to the `Heroku` cloud, which hosts the application so that it can be accessed via a `URL`. _In layman's terms, the application is on the internet, up and running, and can be accessed through a website or URL_. The command for Heroku is included in the [Dockerfile](./Dockerfile) itself.
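For reference, the manual Heroku-CLI equivalent of what this CD stage automates looks roughly like the sketch below; the app name is a placeholder, and in this project these steps are driven by the CI/CD pipeline rather than run by hand.

```bash
# Illustrative only -- the app name "keyword-spotter-demo" is a placeholder.
heroku login
heroku container:login                                    # authenticate with Heroku's container registry
heroku container:push web --app keyword-spotter-demo     # build the Docker image and push it
heroku container:release web --app keyword-spotter-demo  # release the pushed image to the app
```
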
As a result, the application will be deployed, and a snapshot of the application UI is depicted in the [Demo](#KeywordSpotterinHeroku-Demo) section.
## Run locally
Install dependencies

```bash
pip install -r requirements.txt
```
Download the `.npy` dataset from [here](https://www.dropbox.com/sh/4wjo8e8h4cg4xlo/AAAC3yR_kj5oq-ZcJopBosYYa?dl=0). Make sure to put the files in the [./dataset/train/](./dataset/train/) directory. If not, it is fine to use a different directory, but make sure to specify a valid directory name or path in the [config.yaml](./config_dir/config.yaml) file.
Train the model
```bash
python3 main.py
```

Use audio files from this [test directory](./dataset/test/) for local inferencing.
_**Note:** Assign the necessary parameter values and paths in [config.yaml](./config_dir/config.yaml). If the run throws any error, please ensure that valid `PATH_NAMES` and `parameter` values are used._
Additionally, to run locally via a docker container, build the image from the [Dockerfile](./Dockerfile) and run the container using the `docker build` and `docker run` commands. As this is not a Docker tutorial, there is no need to go more in-depth into Docker here.
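As a rough example (the image name, tag, and port below are placeholders, not values defined by this project):

```bash
# Build the image from the Dockerfile in the repository root (image name/tag are placeholders).
docker build -t keyword-spotter:latest .

# Run the container, mapping the Flask app's port to the host (port number is an assumption).
docker run -p 5000:5000 keyword-spotter:latest
```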