**.github/docs/architecture-guide.md** (4 additions, 11 deletions)
The following components are used as part of this design:

- [Azure Container Registry](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-intro): a managed, private Docker registry service based on the open-source Docker Registry 2.0.
- [Azure Data Lake Gen 2](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction): a scalable storage solution optimized for massive amounts of unstructured data.
- [Azure Monitor](https://docs.microsoft.com/en-us/azure/azure-monitor/overview): a comprehensive solution for collecting, analyzing, and acting on telemetry from your workloads.
- [MLflow](https://docs.microsoft.com/en-us/azure/databricks/applications/mlflow): an open-source solution, integrated within Databricks, for managing the end-to-end machine learning life cycle.
- [Azure API Management](https://docs.microsoft.com/en-us/azure/api-management/api-management-key-concepts): a fully managed service that enables customers to publish, secure, transform, maintain, and monitor APIs.
- [Azure Application Gateway](https://docs.microsoft.com/en-us/azure/application-gateway/overview): a web traffic load balancer that enables you to manage traffic to your web applications.
- [Azure DevOps](https://azure.microsoft.com/solutions/devops/) or [GitHub](https://azure.microsoft.com/products/github/): solutions for implementing DevOps practices that enforce automation and compliance in your workload development and deployment pipelines.
> **NOTE:**
Before implementing this solution, some factors you might want to consider include the following.
All services deployed in this solution use a consumption-based pricing model. The [Azure pricing calculator](https://azure.microsoft.com/pricing/calculator) can be used to estimate costs for a specific scenario. For other considerations, see [Cost Optimization](https://docs.microsoft.com/en-us/azure/architecture/framework/#cost-optimization) in the Well-Architected Framework.
## Deploy this scenario
A proof-of-concept implementation of this scenario is available in the [MLOps Platform using Databricks and Kubernetes](https://github.com/nfmoore/databricks-kubernetes-mlops-poc) repository. This sample illustrates:
- How an MLflow model can be trained on Databricks.
- How to package models as a web service using open-source tools.
- How to deploy to Kubernetes via CI/CD.
- How to monitor API performance and model data drift.
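As a rough, framework-free illustration of the second bullet (packaging a model as a web service), the sketch below wraps a hypothetical `predict` function, a stand-in for the real MLflow model, in a minimal HTTP scoring endpoint. The route, feature name, and probabilities are all illustrative assumptions, not the PoC's actual implementation:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(record):
    # Stand-in for the real MLflow model: a made-up rule that flags
    # employees with long commutes as higher attrition risk.
    risky = record.get("DistanceFromHome", 0) > 20
    return {"attrition_probability": 0.7 if risky else 0.2}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/score":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        record = json.loads(self.rfile.read(length))
        body = json.dumps(predict(record)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


# To serve locally: HTTPServer(("", 8080), ScoreHandler).serve_forever()
```

In the actual sample this role is played by open-source serving tools and the service is containerized for AKS; the point here is only the shape of a scoring endpoint.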
## Related resources
You may also find these Architecture Center articles useful:
- [Team Data Science Process for data scientists](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/overview)
- [Modern analytics architecture with Azure Databricks](https://docs.microsoft.com/en-us/azure/architecture/solution-ideas/articles/azure-databricks-modern-analytics-architecture)
- [Building a Clinical Data Drift Monitoring System with Azure DevOps, Azure Databricks, and MLflow](https://devblogs.microsoft.com/cse/2020/10/29/building-a-clinical-data-drift-monitoring-system-with-azure-devops-azure-databricks-and-mlflow/)
**.github/docs/implementation-guide.md** (4 additions, 10 deletions)
### 1.1. Create repository
Log in to your GitHub account, navigate to the [databricks-kubernetes-mlops-poc](https://github.com/nfmoore/databricks-kubernetes-mlops-poc) repository, and click `Use this template` to create a new repository from this template. Rename the template and leave it public. See [these instructions](https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/creating-a-repository-from-a-template) for more details about creating a repository from a template.
### 1.2. Deploy resources
To deploy the resources for this proof-of-concept in your Azure environment, click …
After the resources have been successfully deployed, some services need to be configured before you can train, register, deploy, and monitor the machine learning models.
#### Log Analytics Workspace
For the Log Analytics workspace, Azure Monitor for Containers needs to be enabled. To enable it, open one of the AKS clusters deployed in section 1.2 above, select the Logs tab in the Monitoring section, then select your Log Analytics workspace and click Enable. This process is shown in the image below. Be sure to repeat this process for the second AKS cluster in your resource group.

#### Azure Databricks
For Azure Databricks you need to enable the [Files in Repo](https://docs.microsoft.com/en-us/azure/databricks/repos#enable-support-for-arbitrary-files-in-databricks-repos) feature (which is not enabled by default at the time of developing this proof-of-concept), generate a new [Databricks Access Token](https://docs.microsoft.com/en-au/azure/databricks/dev-tools/api/latest/authentication), and create a [cluster with custom libraries](https://docs.microsoft.com/en-au/azure/databricks/libraries/cluster-libraries).
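The access token generated here is what later automation uses to authenticate against the Databricks REST API as a Bearer token. As a small sketch with hypothetical placeholder values (the real host and token come from your workspace; `/clusters/list` is just one example endpoint):

```python
import urllib.request

# Hypothetical placeholder values -- substitute your own workspace URL and token.
DATABRICKS_HOST = "https://adb-5555555555555555.19.azuredatabricks.net"
DATABRICKS_TOKEN = "dapi55555555555555555555555555555555-2"


def databricks_request(path):
    # Build an authenticated request against the Databricks REST API.
    return urllib.request.Request(
        f"{DATABRICKS_HOST}/api/2.0{path}",
        headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    )


req = databricks_request("/clusters/list")
# urllib.request.urlopen(req) would return the workspace's clusters as JSON.
```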
You need to create the following secrets:
| Secret name | How to find secret value |
|:------------|:-------------------------|
| AZURE_CREDENTIALS | A JSON object with details of your Azure Service Principal. [This](https://github.com/marketplace/actions/azure-login#configure-deployment-credentials) document will help you configure a service principal with a secret. The value will look something like: `{ "clientId": "<GUID>", "clientSecret": "<GUID>", "subscriptionId": "<GUID>", "tenantId": "<GUID>", ... }`|
| DATABRICKS_HOST | This is the `instance name` or `per-workspace URL` of your Azure Databricks service. Its value can be found from the Databricks service page on the Azure Portal under the `URL` parameter. For more information [this](https://docs.microsoft.com/en-us/azure/databricks/workspace/workspace-details#per-workspace-url) resource can be used. The value will look something like `https://adb-5555555555555555.19.azuredatabricks.net`|
| DATABRICKS_TOKEN | This is the value of the `Access Token` you created in `1.3`. The value should look something like `dapi55555555555555555555555555555555-2`. |
| CONTAINER_REGISTRY_NAME | The name of the ACR service deployed in template two. |
| CONTAINER_REGISTRY_PASSWORD | This can be found in the access keys section of the ACR service page. The Admin Account section of [this](https://docs.microsoft.com/en-us/azure/container-registry/container-registry-authentication?tabs=azure-cli#admin-account) document contains more information. |
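A malformed `AZURE_CREDENTIALS` value is a common source of failed pipeline runs. The sketch below, using only the key names from the example value in the table above, shows one way you might sanity-check the secret before saving it (the `validate_azure_credentials` helper is illustrative, not part of the repository):

```python
import json

# Example AZURE_CREDENTIALS value with placeholder GUIDs, matching the
# shape shown in the table above.
azure_credentials = json.dumps({
    "clientId": "<GUID>",
    "clientSecret": "<GUID>",
    "subscriptionId": "<GUID>",
    "tenantId": "<GUID>",
})


def validate_azure_credentials(secret_value):
    # Check the secret parses as JSON and contains the expected keys.
    creds = json.loads(secret_value)
    required = {"clientId", "clientSecret", "subscriptionId", "tenantId"}
    missing = required - creds.keys()
    if missing:
        raise ValueError(f"AZURE_CREDENTIALS is missing keys: {sorted(missing)}")
    return creds


validate_azure_credentials(azure_credentials)
```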
Lower values for a feature indicate a greater likelihood of drift, and values below …
Custom charts can also be developed using this data by selecting the `Chart` tab and changing the values in the `Chart formatting` section.
**README.md** (5 additions, 12 deletions)
## Overview
> For additional insights into applying this approach to operationalize your machine learning workloads refer to this article — [Machine Learning at Scale with Databricks and Kubernetes](https://medium.com/@nfmoore/machine-learning-at-scale-with-databricks-and-kubernetes-9fa59232bfa6)
This repository contains resources for an end-to-end proof of concept that illustrates how an MLflow model can be trained on Databricks, packaged as a web service, deployed to Kubernetes via CI/CD, and monitored within Microsoft Azure. A high-level solution design is shown below:

Within Azure Databricks, the `IBM HR Analytics Employee Attrition & Performance` [dataset](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset) available from Kaggle will be used to develop and register a machine learning model. This model will predict the likelihood of attrition for an employee, along with metrics capturing data drift and outliers to assess the model's validity.
This model will then be deployed as an API for real-time inference using Azure Kubernetes Service. This API can be integrated with external applications used by HR teams to provide additional insights into the likelihood of attrition for a given employee within the organization. This information can be used to determine if a high-impact employee is likely to leave the organization and hence provide HR with the ability to proactively incentivize the employee to stay.
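To make the integration concrete, the sketch below builds the kind of scoring call an HR application might send to such an API. The feature names are drawn from the Kaggle dataset's columns, but the endpoint URL and request shape are hypothetical; the real schema comes from the trained model and the gateway deployed in this solution:

```python
import json

# Hypothetical example record using column names from the Kaggle dataset.
employee = {
    "Age": 33,
    "MonthlyIncome": 5000,
    "OverTime": "Yes",
    "YearsAtCompany": 2,
}


def build_scoring_request(record, endpoint="https://contoso.example/employee-attrition/score"):
    # Serialize one employee record into the HTTP call an HR system would make.
    return {
        "url": endpoint,
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(record),
    }


request = build_scoring_request(employee)
```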
The design covered in this proof-of-concept can be generalized to many machine learning workloads. For more information on a generic solution design see the [Architecture Guide](.github/docs/architecture-guide.md).
## Getting Started
For detailed step-by-step instructions see the [Implementation Guide](.github/docs/implementation-guide.md).
## License
Details on licensing for the project can be found in the [LICENSE](./LICENSE) file.