
Commit 217140e

Fix deployment error (#7)
1 parent 0de3819 commit 217140e

7 files changed: +1146, -426 lines

.github/docs/architecture-guide.md

Lines changed: 4 additions & 11 deletions
@@ -38,7 +38,9 @@ The following components are used as part of this design:
  - [Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro): managed and private Docker registry service based on the open-source Docker Registry.
  - [Azure Data Lake Gen 2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction): scalable solution optimized for storing massive amounts of unstructured data.
  - [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview): a comprehensive solution for collecting, analyzing, and acting on telemetry from your workloads.
- - [MLFlow](https://docs.microsoft.com/en-us/azure/databricks/applications/mlflow): open-source solution integrated within Databricks for managing the end-to-end machine learning life cycle.
+ - [MLflow](https://docs.microsoft.com/en-us/azure/databricks/applications/mlflow): open-source solution integrated within Databricks for managing the end-to-end machine learning life cycle.
+ - [Azure API Management](https://docs.microsoft.com/en-us/azure/api-management/api-management-key-concepts): a fully managed service that enables customers to publish, secure, transform, maintain, and monitor APIs.
+ - [Azure Application Gateway](https://docs.microsoft.com/en-us/azure/application-gateway/overview): a web traffic load balancer that enables you to manage traffic to your web applications.
  - [Azure DevOps](https://azure.microsoft.com/solutions/devops/) or [GitHub](https://azure.microsoft.com/products/github/): solutions for implementing DevOps practices to enforce automation and compliance with your workload development and deployment pipelines.

  > **NOTE:**
@@ -61,20 +63,11 @@ Before implementing this solution some factors you might want to consider, inclu

  All services deployed in this solution use a consumption-based pricing model. The [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator) can be used to estimate costs for a specific scenario. For other considerations, see [Cost Optimization](https://docs.microsoft.com/en-us/azure/architecture/framework/#cost-optimization) in the Well-Architected Framework.

- ## Deploy this scenario
-
- A proof-of-concept implementation of this scenario is available at the [MLOps Platform using Databricks and Kubernetes](https://github.com/nfmoore/databricks-kubernetes-mlops-poc) repository. This sample illustrates:
-
- - How an MLFlow model can be trained on Databricks.
- - How to package models as a web service using open-source tools.
- - How to deploy to Kubernetes via CI/CD.
- - How to monitor API performance and model data drift.
-
  ## Related resources

  You may also find these Architecture Center articles useful:

  - [Machine Learning Operations maturity model](https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model)
  - [Team Data Science Process for data scientists](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/overview)
  - [Modern analytics architecture with Azure Databricks](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-databricks-modern-analytics-architecture)
- - [Building A Clinical Data Drift Monitoring System With Azure DevOps, Azure Databricks, And MLflow](https://devblogs.microsoft.com/cse/2020/10/29/building-a-clinical-data-drift-monitoring-system-with-azure-devops-azure-databricks-and-mlflow/)
+ - [Building A Clinical Data Drift Monitoring System With Azure DevOps, Azure Databricks, And MLflow](https://devblogs.microsoft.com/cse/2020/10/29/building-a-clinical-data-drift-monitoring-system-with-azure-devops-azure-databricks-and-mlflow/)
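
Editor's note: the removed "Deploy this scenario" section summarized how an MLflow model is packaged as a web service before being deployed to Kubernetes. A minimal sketch of that packaging step, not part of this commit; the model URI, route, and port are illustrative assumptions, and the POC repository's actual service code may differ:

```python
# Minimal scoring service wrapping a registered MLflow model.
# The model URI "models:/employee-attrition/1" is a hypothetical example.
import mlflow.pyfunc
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = mlflow.pyfunc.load_model("models:/employee-attrition/1")

@app.route("/predict", methods=["POST"])
def predict():
    # Expects a JSON array of records matching the model's training schema.
    records = pd.DataFrame(request.get_json())
    return jsonify(model.predict(records).tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

In this design, such a service would be containerized, pushed to Azure Container Registry, and served from AKS behind Azure Application Gateway.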

.github/docs/implementation-guide.md

Lines changed: 4 additions & 10 deletions
@@ -9,7 +9,7 @@

  ### 1.1. Create repository

- Log in to your GitHub account, navigate to the [databricks-kubernetes-real-time-mlflow-model-deployment-poc](https://github.com/nfmoore/databricks-kubernetes-real-time-mlflow-model-deployment-poc) repository and click `use this template` to create a new repository from this template. Rename the template and leave it public. Use [these](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-from-a-template) instructions for more details about creating a repository from a template.
+ Log in to your GitHub account, navigate to the [databricks-kubernetes-mlops-poc](https://github.com/nfmoore/databricks-kubernetes-mlops-poc) repository and click `use this template` to create a new repository from this template. Rename the template and leave it public. Use [these](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-from-a-template) instructions for more details about creating a repository from a template.

  ### 1.2. Deploy resources

@@ -25,12 +25,6 @@ To deploy the resources for this proof-of-concept in your Azure environment clic

  After the resources have been successfully deployed some services need to be configured before you can train, register, deploy and monitor the machine learning models.

- #### Log Analytics Workspace
-
- For the Log Analytics workspace, Azure Monitor for Containers needs to be enabled. To enable this, click on an AKS cluster deployed as part of 1.2 above, click on the Logs tab in the monitoring section, then select your Log Analytics workspace and click enable. This process is shown in the image below. Ensure to repeat this process for the second AKS cluster in your resource group.
-
- ![1-2](.github/../images/implementation/1-2.png)
-
  #### Azure Databricks

  For Azure Databricks you need to enable the [Files in Repo](https://docs.microsoft.com/en-us/azure/databricks/repos#enable-support-for-arbitrary-files-in-databricks-repos) feature (which is not enabled by default at the time of developing this proof-of-concept), generate a new [Databricks Access Token](https://docs.microsoft.com/en-au/azure/databricks/dev-tools/api/latest/authentication), and create a [cluster with custom libraries](https://docs.microsoft.com/en-au/azure/databricks/libraries/cluster-libraries).
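
Editor's note: the cluster-library step above can also be scripted rather than configured through the UI. A minimal sketch against the Databricks Libraries REST API, not part of this commit; the `DATABRICKS_CLUSTER_ID` variable and the package list are illustrative assumptions:

```python
# Install PyPI libraries on an existing Databricks cluster via the REST API.
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]  # hypothetical: your cluster's ID

response = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": "mlflow"}}],  # illustrative package list
    },
    timeout=30,
)
response.raise_for_status()
```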
@@ -97,8 +91,8 @@ You need to create the following secrets:

  | Secret name | How to find secret value |
  |:------------|:-------------------------|
- | AZURE_CREDENTIALS | A JSON object with details of your Azure Service Principal. [This](https://github.com/marketplace/actions/azure-login#configure-deployment-credentials) document will help you configure a service principal with a secret. The value will look something like: ` { "clientId": "<GUID>", "clientSecret": "<GUID>", "subscriptionId": "<GUID>", "tenantId": "<GUID>", ... }`|
- | DATABRICKS_HOST | This is the `instance name` or `per-workspace URL` of your Azure Databricks service. Its value can be found from the Databricks service page on the Azure Portal under the `URL` parameter. For more information [this](https://docs.microsoft.com/en-us/azure/databricks/workspace/workspace-details#per-workspace-url) resource can be used. The value will look something like ` https://adb-5555555555555555.19.azuredatabricks.net`|
+ | AZURE_CREDENTIALS | A JSON object with details of your Azure Service Principal. [This](https://github.com/marketplace/actions/azure-login#configure-deployment-credentials) document will help you configure a service principal with a secret. The value will look something like: `{ "clientId": "<GUID>", "clientSecret": "<GUID>", "subscriptionId": "<GUID>", "tenantId": "<GUID>", ... }`|
+ | DATABRICKS_HOST | This is the `instance name` or `per-workspace URL` of your Azure Databricks service. Its value can be found from the Databricks service page on the Azure Portal under the `URL` parameter. For more information [this](https://docs.microsoft.com/en-us/azure/databricks/workspace/workspace-details#per-workspace-url) resource can be used. The value will look something like `https://adb-5555555555555555.19.azuredatabricks.net`|
  | DATABRICKS_TOKEN | This is the value of the `Access Token` you created in `1.3`. The value should look something like `dapi55555555555555555555555555555555-2`. |
  | CONTAINER_REGISTRY_NAME | The name of the ACR service deployed in template two. |
  | CONTAINER_REGISTRY_PASSWORD | This can be found in the access keys section of the ACR service page. The Admin Account section of [this](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli#admin-account) document contains more information. |
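
Editor's note: before storing the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` values as GitHub secrets, it is worth confirming they authenticate. A minimal sketch, not part of this commit, that calls a lightweight authenticated Databricks endpoint:

```python
# Sanity-check Databricks credentials before storing them as GitHub secrets.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-5555555555555555.19.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # e.g. dapi55555555555555555555555555555555-2

response = requests.get(
    f"{host}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
response.raise_for_status()  # a 403 here usually means a bad or expired token
clusters = response.json().get("clusters", [])
print(f"Token accepted; workspace has {len(clusters)} cluster(s)")
```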
@@ -259,4 +253,4 @@ Lower values for a feature indicate a greater likelihood of drift and values bel

  Custom charts can also be developed using this data by selecting the `Chart` tab and changing the values in the `Chart formatting` section.

- ![4-3](.github/../images/implementation/4-3.png)
+ ![4-3](.github/../images/implementation/4-3.png)

README.md

Lines changed: 5 additions & 12 deletions
@@ -2,13 +2,16 @@

  ## Overview

+ > For additional insights into applying this approach to operationalize your machine learning workloads refer to this article — [Machine Learning at Scale with Databricks and Kubernetes](https://medium.com/@nfmoore/machine-learning-at-scale-with-databricks-and-kubernetes-9fa59232bfa6)
  This repository contains resources for an end-to-end proof of concept which illustrates how an MLFlow model can be trained on Databricks, packaged as a web service, deployed to Kubernetes via CI/CD, and monitored within Microsoft Azure. A high-level solution design is shown below:

  ![workflow](.github/docs/images/workflow.png)

- For more information on a generic solution design see the [Architecture Guide](.github/docs/architecture-guide.md)
+ Within Azure Databricks, the `IBM HR Analytics Employee Attrition & Performance` [dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) available from Kaggle will be used to develop and register a machine learning model. This model will predict the likelihood of attrition for an employee along with metrics capturing data drift and outliers to assess the model's validity.

- > For additional insights into applying this approach to operationalize your machine learning workloads refer to this article — [Machine Learning at Scale with Databricks and Kubernetes](https://medium.com/@nfmoore/machine-learning-at-scale-with-databricks-and-kubernetes-9fa59232bfa6)
+ This model will then be deployed as an API for real-time inference using Azure Kubernetes Service. This API can be integrated with external applications used by HR teams to provide additional insights into the likelihood of attrition for a given employee within the organization. This information can be used to determine if a high-impact employee is likely to leave the organization and hence provide HR with the ability to proactively incentivize the employee to stay.
+
+ The design covered in this proof-of-concept can be generalized to many machine learning workloads. For more information on a generic solution design see the [Architecture Guide](.github/docs/architecture-guide.md).

  ## Getting Started

@@ -24,16 +27,6 @@ This repository contains detailed step-by-step instructions on how to implement
2427

2528
For detailed step-by-step instructions see the [Implementation Guide](.github/docs/implementation-guide.md).
2629

27-
## Scenario
28-
29-
This proof-of-concept will be based on a common problem in HR analytics - employee attrition. Employee Attrition refers to the process by which employees leave an organization – for example, through resignation for personal reasons or retirement – and are not immediately replaced.
30-
31-
Within this proof-of-concept, a machine learning model will be developed to predict the likelihood of attrition for an employee along with metrics capturing data drift and outliers to access the model's validity. This implementation uses the `IBM HR Analytics Employee Attrition & Performance` [dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) available from Kaggle.
32-
33-
The scenario in this repository will first develop a machine learning model which will then be deployed as an API for online inference. This API can be integrated with external applications used by HR teams to provide additional insights into the likelihood of attrition for a given employee within the organization.
34-
35-
The scenario in this repository will first develop a machine learning model which will then be deployed as an API for online inference. This API can be integrated with external applications used by HR teams to provide additional insights into the likelihood of attrition for a given employee within the organization. This information can be used to determine if a high-impact employee is likely to leave the organization and hence provide HR with the ability to proactively incentivize the employee to stay.
36-
3730
## License
3831

3932
Details on licensing for the project can be found in the [LICENSE](./LICENSE) file.
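
Editor's note: the train-and-register flow described in the new overview can be sketched in a few lines of notebook code. This is illustrative only and not part of this commit; the file path, feature handling, and registered model name are assumptions rather than the POC's actual notebooks:

```python
# Train an attrition classifier and register it with MLflow.
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("ibm-hr-analytics-attrition.csv")  # hypothetical local copy of the Kaggle dataset
X = pd.get_dummies(df.drop(columns=["Attrition"]))  # one-hot encode categorical features
y = (df["Attrition"] == "Yes").astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
    # Registration makes the model addressable from the CI/CD deployment pipeline.
    mlflow.sklearn.log_model(model, "model", registered_model_name="employee-attrition")
```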

infrastructure/main.bicep

Lines changed: 38 additions & 0 deletions
@@ -1,3 +1,32 @@
+ //********************************************************
+ // General Parameters
+ //********************************************************
+
+ @description('Resource Location')
+ param resourceLocation string = resourceGroup().location
+
+ @description('Virtual Network IP Address Prefixes')
+ param vNetIPAddressPrefixesForFirstDeployment array = [
+   '192.168.0.0/16'
+ ]
+
+ @description('AKS Subnet IP Address Prefix')
+ param subnetAksIpAddressPrefixForFirstDeployment string = '192.168.0.0/24'
+
+ @description('App Gateway IP Address Prefix')
+ param subnetAppGwIpAddressPrefixForFirstDeployment string = '192.168.1.0/24'
+
+ @description('Virtual Network IP Address Prefixes')
+ param vNetIPAddressPrefixesForSecondDeployment array = [
+   '192.167.0.0/16'
+ ]
+
+ @description('AKS Subnet IP Address Prefix')
+ param subnetAksIpAddressPrefixForSecondDeployment string = '192.167.0.0/24'
+
+ @description('App Gateway IP Address Prefix')
+ param subnetAppGwIpAddressPrefixForSecondDeployment string = '192.167.1.0/24'
+
  //********************************************************
  // Modules
  //********************************************************
@@ -6,20 +35,29 @@ module m_databricks './modules/databricks.bicep' = {
    name: 'm_databricks'
    params: {
      resourceInstance: '01'
+     location: resourceLocation
    }
  }

  module m_microservices_01 './modules/microservices.bicep' = {
    name: 'm_microservices_01'
    params: {
      resourceInstance: '01'
+     location: resourceLocation
+     vNetIPAddressPrefixes: vNetIPAddressPrefixesForFirstDeployment
+     subnetAksIpAddressPrefix: subnetAksIpAddressPrefixForFirstDeployment
+     subnetAppGwIpAddressPrefix: subnetAppGwIpAddressPrefixForFirstDeployment
    }
  }

  module m_microservices_02 './modules/microservices.bicep' = {
    name: 'm_microservices_02'
    params: {
      resourceInstance: '02'
+     location: resourceLocation
+     vNetIPAddressPrefixes: vNetIPAddressPrefixesForSecondDeployment
+     subnetAksIpAddressPrefix: subnetAksIpAddressPrefixForSecondDeployment
+     subnetAppGwIpAddressPrefix: subnetAppGwIpAddressPrefixForSecondDeployment
      useExistingContainerRegistry: true
      useExistingLogAnalyticsWorkspace: true
      containerRegistryName: m_microservices_01.outputs.containerRegistryName
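
Editor's note: the new parameters give each microservices deployment its own virtual network address space. A small standard-library check, illustrative and not part of the commit, confirms that the subnet prefixes nest inside their virtual networks and that the two address spaces stay disjoint:

```python
# Validate the CIDR layout defined by the new Bicep parameters.
import ipaddress

deployments = {
    "first": ("192.168.0.0/16", "192.168.0.0/24", "192.168.1.0/24"),
    "second": ("192.167.0.0/16", "192.167.0.0/24", "192.167.1.0/24"),
}

vnets = []
for name, (vnet, subnet_aks, subnet_appgw) in deployments.items():
    vnet_net = ipaddress.ip_network(vnet)
    aks_net = ipaddress.ip_network(subnet_aks)
    appgw_net = ipaddress.ip_network(subnet_appgw)
    # Each subnet must sit inside its virtual network's prefix...
    assert aks_net.subnet_of(vnet_net) and appgw_net.subnet_of(vnet_net), name
    # ...and the AKS and App Gateway subnets must not overlap each other.
    assert not aks_net.overlaps(appgw_net), name
    vnets.append(vnet_net)

# The two deployments' virtual networks use disjoint address spaces.
assert not vnets[0].overlaps(vnets[1])
print("Address plan is consistent")
```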
