Commit 6215ff0

add AutoML Sweepable API
1 parent fc306b1 commit 6215ff0

1 file changed

+277
-0
lines changed

@@ -0,0 +1,277 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"## AutoML Sweepable API\n",
8+
"\n",
9+
"This notebook shows how to use the `Sweepable` API to fully customize the pipeline and search space of your AutoML task. In this notebook, you will learn how to\n",
10+
"- use `Auto().CreateSweepableEstimator` to create a `SweepableEstimator`.\n",
11+
"- create a `SweepablePipeline` with multiple trainer candidates.\n",
12+
"- use the built-in `SweepableEstimator` candidates to simplify your work."
13+
]
14+
},
15+
{
16+
"cell_type": "markdown",
17+
"metadata": {},
18+
"source": [
19+
"### Install NuGet packages and add using statements"
20+
]
21+
},
22+
{
23+
"cell_type": "code",
24+
"execution_count": null,
25+
"metadata": {
26+
"dotnet_interactive": {
27+
"language": "csharp"
28+
},
29+
"vscode": {
30+
"languageId": "dotnet-interactive.csharp"
31+
}
32+
},
33+
"outputs": [],
34+
"source": [
35+
"// using nightly-build\n",
36+
"#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
37+
"#r \"nuget: Plotly.NET.Interactive, 3.0.2\"\n",
38+
"#r \"nuget: Plotly.NET.CSharp, 0.0.1\"\n",
39+
"#r \"nuget: Microsoft.ML.AutoML, 0.20.0-preview.22470.1\"\n",
40+
"#r \"nuget: Microsoft.Data.Analysis, 0.20.0-preview.22470.1\""
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": null,
46+
"metadata": {
47+
"dotnet_interactive": {
48+
"language": "csharp"
49+
},
50+
"vscode": {
51+
"languageId": "dotnet-interactive.csharp"
52+
}
53+
},
54+
"outputs": [],
55+
"source": [
56+
"using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;\n",
57+
"using Microsoft.Data.Analysis;\n",
58+
"using System;\n",
59+
"using System.IO;\n",
60+
"using Microsoft.ML;\n",
61+
"using Microsoft.ML.AutoML;\n",
62+
"using Microsoft.ML.AutoML.CodeGen;\n",
63+
"using Microsoft.ML.Trainers.LightGbm;\n",
64+
"using Microsoft.ML.Data;\n",
65+
"using Plotly.NET;\n",
66+
"using Microsoft.ML.Transforms.TimeSeries;\n",
67+
"using Microsoft.ML.SearchSpace;\n",
68+
"using System.Diagnostics;"
69+
]
70+
},
71+
{
72+
"cell_type": "markdown",
73+
"metadata": {
74+
"dotnet_interactive": {
75+
"language": "csharp"
76+
}
77+
},
78+
"source": [
79+
"#### Use `Auto().CreateSweepableEstimator` to create `SweepableEstimator`\n",
80+
"\n",
81+
"A `SweepableEstimator` is simply a normal estimator combined with a `SearchSpace`. The following code shows how to create sweepable `LightGbm` and `SDCA` estimators.\n",
82+
"\n",
83+
"For simplicity, the built-in search spaces for `LightGbm` and `SDCA` are used, but you can customize the search space however you like. For more details, see [Parameter And SearchSpace](./Parameter%20and%20SearchSpace.ipynb)."
84+
]
85+
},
86+
{
87+
"cell_type": "code",
88+
"execution_count": null,
89+
"metadata": {
90+
"dotnet_interactive": {
91+
"language": "csharp"
92+
},
93+
"vscode": {
94+
"languageId": "dotnet-interactive.csharp"
95+
}
96+
},
97+
"outputs": [],
98+
"source": [
99+
"var context = new MLContext();\n",
100+
"var lgbmSearchSpace = new SearchSpace<LgbmOption>();\n",
101+
"var sweepableLgbm = context.Auto().CreateSweepableEstimator((context, param) => {\n",
102+
" var option = new LightGbmRegressionTrainer.Options()\n",
103+
" {\n",
104+
" NumberOfLeaves = param.NumberOfLeaves,\n",
105+
" NumberOfIterations = param.NumberOfTrees,\n",
106+
" MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,\n",
107+
" LearningRate = param.LearningRate,\n",
108+
" LabelColumnName = \"Label\",\n",
109+
" FeatureColumnName = \"Features\",\n",
110+
" Booster = new GradientBooster.Options()\n",
111+
" {\n",
112+
" SubsampleFraction = param.SubsampleFraction,\n",
113+
" FeatureFraction = param.FeatureFraction,\n",
114+
" L1Regularization = param.L1Regularization,\n",
115+
" L2Regularization = param.L2Regularization,\n",
116+
" },\n",
117+
" MaximumBinCountPerFeature = param.MaximumBinCountPerFeature,\n",
118+
" };\n",
119+
"\n",
120+
" return context.Regression.Trainers.LightGbm(option);\n",
121+
"}, lgbmSearchSpace);\n",
122+
"\n",
123+
"var sdcaSearchSpace = new SearchSpace<SdcaOption>();\n",
124+
"var sweepableSdca = context.Auto().CreateSweepableEstimator((context, param) => {\n",
125+
" return context.Regression.Trainers.Sdca(\"Label\", \"Features\", l1Regularization: param.L1Regularization, l2Regularization: param.L2Regularization);\n",
126+
"}, sdcaSearchSpace);"
127+
]
128+
},
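{
"cell_type": "markdown",
"metadata": {},
"source": [
"The built-in search space can also be adjusted before it is handed to `CreateSweepableEstimator`. The next cell is a minimal sketch rather than a recommended configuration. It assumes that a `SearchSpace<T>` can be indexed by option property name and that `UniformSingleOption` is available from `Microsoft.ML.SearchSpace.Option`; the names `customSdcaSearchSpace` and `customSweepableSdca` and the range values are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"using Microsoft.ML.SearchSpace.Option;\n",
"\n",
"// Start from the built-in SDCA search space.\n",
"var customSdcaSearchSpace = new SearchSpace<SdcaOption>();\n",
"\n",
"// Assumption: options are keyed by property name. The range below is illustrative only.\n",
"customSdcaSearchSpace[\"L1Regularization\"] = new UniformSingleOption(min: 0.01f, max: 2.0f, logBase: false, defaultValue: 0.01f);\n",
"\n",
"// Hypothetical second SDCA candidate that sweeps over the adjusted range.\n",
"var customSweepableSdca = context.Auto().CreateSweepableEstimator((context, param) => {\n",
"    return context.Regression.Trainers.Sdca(\"Label\", \"Features\", l1Regularization: param.L1Regularization, l2Regularization: param.L2Regularization);\n",
"}, customSdcaSearchSpace);"
]
},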
129+
{
130+
"cell_type": "markdown",
131+
"metadata": {},
132+
"source": [
133+
"#### Create `SweepablePipeline` with multiple trainer candidates.\n",
134+
"\n",
135+
"`SweepablePipeline` allows you to put multiple estimators as candidates at a given stage. During AutoML sweeping, these candidates are evaluated separately and the one with the best metric is picked. Note that a candidate doesn't need to be a trainer: it can be a trainer, a transformer, or even another `SweepablePipeline`, as long as all candidates at that stage have the same input and output schema.\n",
136+
"\n",
137+
"The following code shows how to create a `SweepablePipeline` from the `sweepableSdca` and `sweepableLgbm` estimators created above."
138+
]
139+
},
140+
{
141+
"cell_type": "code",
142+
"execution_count": null,
143+
"metadata": {
144+
"dotnet_interactive": {
145+
"language": "csharp"
146+
},
147+
"vscode": {
148+
"languageId": "dotnet-interactive.csharp"
149+
}
150+
},
151+
"outputs": [],
152+
"source": [
153+
"var sweepablePipeline = context.Transforms.Concatenate(\"Features\", \"X1\", \"X2\")\n",
154+
" .Append(sweepableSdca, sweepableLgbm);"
155+
]
156+
},
157+
{
158+
"cell_type": "markdown",
159+
"metadata": {},
160+
"source": [
161+
"#### Configure `AutoMLExperiment` using `sweepablePipeline`\n",
162+
"In the next step, we train `sweepablePipeline` on a generated non-linear dataset using `AutoMLExperiment`, which sweeps both `sdca` and `lightGbm` over their configured search spaces. Because `sdca` is a linear model and the label is the product of `X1` and `X2`, the winning model should be `lightGbm`."
163+
]
164+
},
165+
{
166+
"cell_type": "code",
167+
"execution_count": null,
168+
"metadata": {
169+
"dotnet_interactive": {
170+
"language": "csharp"
171+
},
172+
"vscode": {
173+
"languageId": "dotnet-interactive.csharp"
174+
}
175+
},
176+
"outputs": [],
177+
"source": [
178+
"var rand = new Random(0);\n",
179+
"var context = new MLContext(seed: 1);\n",
180+
"var x1 = Enumerable.Range(0, 1000).Select(_x => rand.NextSingle() * 100).ToArray();\n",
181+
"var x2 = x1.Select(_x => rand.NextSingle() * 100).ToArray();\n",
182+
"var y = Enumerable.Zip(x1, x2).Select(_x => _x.Second * _x.First + (rand.NextSingle() - 0.5f) * 10).ToArray();\n",
183+
"var df = new DataFrame();\n",
184+
"df[\"X1\"] = DataFrameColumn.Create(\"X1\", x1);\n",
185+
"df[\"X2\"] = DataFrameColumn.Create(\"X2\", x2);\n",
186+
"df[\"Label\"] = DataFrameColumn.Create(\"Label\", y);\n",
187+
"var trainTestSplit = context.Data.TrainTestSplit(df);\n",
188+
"df.Head(10)"
189+
]
190+
},
191+
{
192+
"cell_type": "code",
193+
"execution_count": null,
194+
"metadata": {
195+
"dotnet_interactive": {
196+
"language": "csharp"
197+
},
198+
"vscode": {
199+
"languageId": "dotnet-interactive.csharp"
200+
}
201+
},
202+
"outputs": [],
203+
"source": [
204+
"var monitor = new NotebookMonitor(sweepablePipeline);\n",
205+
"var experiment = context.Auto().CreateExperiment();\n",
206+
"experiment.SetDataset(df, 5)\n",
207+
" .SetPipeline(sweepablePipeline)\n",
208+
" .SetTrainingTimeInSeconds(50)\n",
209+
" .SetRegressionMetric(RegressionMetric.RootMeanSquaredError)\n",
210+
" .SetMonitor(monitor);\n",
211+
"\n",
212+
"// Configure visualizer\n",
213+
"monitor.SetUpdate(monitor.Display());\n",
214+
"\n",
215+
"var res = await experiment.RunAsync();\n",
216+
"\n",
217+
"// check the type of last trainer, which should be lightGbm\n",
218+
"(res.Model as TransformerChain<ITransformer>).Last().GetType()"
219+
]
220+
},
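{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond checking the trainer type, it can help to look at the numbers. The next cell is a sketch under assumptions: it reads the metric recorded on the returned trial result (`res.Metric`) and scores the returned model on `trainTestSplit.TestSet`. Because the experiment above ran on the full `df`, those test rows are not truly held out, so the second value is likely optimistic."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"// Metric recorded by AutoML for the best trial (RMSE, as configured above).\n",
"Console.WriteLine($\"Best trial RMSE: {res.Metric}\");\n",
"\n",
"// Caveat: these rows are part of df, which the experiment already used, so this is not a true hold-out score.\n",
"var predictions = res.Model.Transform(trainTestSplit.TestSet);\n",
"var metrics = context.Regression.Evaluate(predictions, labelColumnName: \"Label\");\n",
"metrics.RootMeanSquaredError"
]
},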
221+
{
222+
"cell_type": "markdown",
223+
"metadata": {},
224+
"source": [
225+
"#### Use built-in sweepable estimators\n",
226+
"\n",
227+
"`AutoML` provides built-in sweepable estimator candidates for binary classification, multiclass classification and regression. For those scenarios, you can use these candidates instead of creating a `SweepableEstimator` from scratch."
228+
]
229+
},
230+
{
231+
"cell_type": "code",
232+
"execution_count": null,
233+
"metadata": {
234+
"dotnet_interactive": {
235+
"language": "csharp"
236+
},
237+
"vscode": {
238+
"languageId": "dotnet-interactive.csharp"
239+
}
240+
},
241+
"outputs": [],
242+
"source": [
243+
"var regressionTrainerCandidates = context.Auto().Regression();\n",
244+
"var binaryClassificationTrainerCandidates = context.Auto().BinaryClassification();\n",
245+
"var multiclassClassificationTrainerCandidates = context.Auto().MultiClassification();"
246+
]
247+
},
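{
"cell_type": "markdown",
"metadata": {},
"source": [
"These candidates can be appended to a pipeline in the same way as the hand-written estimators. The next cell is a minimal sketch, assuming that `Regression()` accepts `labelColumnName` and `featureColumnName` arguments and that its result can be passed to `Append`; `builtInPipeline` is an illustrative name. The resulting pipeline could then be given to `SetPipeline` in place of `sweepablePipeline`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"dotnet_interactive": {
"language": "csharp"
},
"vscode": {
"languageId": "dotnet-interactive.csharp"
}
},
"outputs": [],
"source": [
"// Same featurization as before, followed by the built-in regression candidates.\n",
"var builtInPipeline = context.Transforms.Concatenate(\"Features\", \"X1\", \"X2\")\n",
"    .Append(context.Auto().Regression(labelColumnName: \"Label\", featureColumnName: \"Features\"));"
]
},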
248+
{
249+
"cell_type": "markdown",
250+
"metadata": {},
251+
"source": [
252+
"#### See also\n",
253+
"- [Training and AutoML](./03-Training%20and%20AutoML.ipynb)\n",
254+
"- [Regression with Taxi Dataset](./E2E-Regression%20with%20Taxi%20Dataset.ipynb)\n",
255+
"- [Classification with Iris Dataset](./E2E-Classification%20with%20Iris%20Dataset.ipynb)\n",
256+
"- [Kaggle with Titanic Dataset](./REF-Kaggle%20with%20Titanic%20Dataset.ipynb)"
257+
]
258+
}
259+
],
260+
"metadata": {
261+
"kernelspec": {
262+
"display_name": ".NET (C#)",
263+
"language": "C#",
264+
"name": ".net-csharp"
265+
},
266+
"language_info": {
267+
"file_extension": ".cs",
268+
"mimetype": "text/x-csharp",
269+
"name": "C#",
270+
"pygments_lexer": "csharp",
271+
"version": "9.0"
272+
},
273+
"orig_nbformat": 4
274+
},
275+
"nbformat": 4,
276+
"nbformat_minor": 2
277+
}
