Commit d589aa4

Merge pull request #69 from LittleLittleCloud/u/xiaoyun/sweepable
add AutoML Sweepable API notebook
2 parents ea8cc36 + 4b22fe3 commit d589aa4

1 file changed

+277
-0
lines changed

@@ -0,0 +1,277 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"## AutoML Sweepable API\n",
8+
"\n",
9+
"This notebook shows how to use the `Sweepable` API to fully customize the pipeline or search space in your AutoML task. In this notebook, you will learn how to:\n",
10+
"- use the built-in `SweepableEstimator` candidates to simplify your work.\n",
11+
"- use `context.Auto().CreateSweepableEstimator` to create a `SweepableEstimator`.\n",
12+
"- create a `SweepablePipeline` with multiple trainer candidates.\n"
13+
]
14+
},
15+
{
16+
"cell_type": "markdown",
17+
"metadata": {},
18+
"source": [
19+
"### Install NuGet packages and add using statements"
20+
]
21+
},
22+
{
23+
"cell_type": "code",
24+
"execution_count": null,
25+
"metadata": {
26+
"dotnet_interactive": {
27+
"language": "csharp"
28+
},
29+
"vscode": {
30+
"languageId": "dotnet-interactive.csharp"
31+
}
32+
},
33+
"outputs": [],
34+
"source": [
35+
"// using nightly-build\n",
36+
"#i \"nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json\"\n",
37+
"#r \"nuget: Plotly.NET.Interactive, 3.0.2\"\n",
38+
"#r \"nuget: Plotly.NET.CSharp, 0.0.1\"\n",
39+
"#r \"nuget: Microsoft.ML.AutoML, 0.20.0-preview.22470.1\"\n",
40+
"#r \"nuget: Microsoft.Data.Analysis, 0.20.0-preview.22470.1\""
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": null,
46+
"metadata": {
47+
"dotnet_interactive": {
48+
"language": "csharp"
49+
},
50+
"vscode": {
51+
"languageId": "dotnet-interactive.csharp"
52+
}
53+
},
54+
"outputs": [],
55+
"source": [
56+
"using static Microsoft.DotNet.Interactive.Formatting.PocketViewTags;\n",
57+
"using Microsoft.Data.Analysis;\n",
58+
"using System;\n",
59+
"using System.IO;\n",
60+
"using Microsoft.ML;\n",
61+
"using Microsoft.ML.AutoML;\n",
62+
"using Microsoft.ML.AutoML.CodeGen;\n",
63+
"using Microsoft.ML.Trainers.LightGbm;\n",
64+
"using Microsoft.ML.Data;\n",
65+
"using Plotly.NET;\n",
66+
"using Microsoft.ML.Transforms.TimeSeries;\n",
67+
"using Microsoft.ML.SearchSpace;\n",
68+
"using System.Diagnostics;"
69+
]
70+
},
71+
{
72+
"cell_type": "markdown",
73+
"metadata": {},
74+
"source": [
75+
"#### Use built-in sweepable estimators\n",
76+
"\n",
77+
"`AutoML` provides built-in sweepable estimator candidates for binary classification, multiclass classification, and regression. For those scenarios, you can use these candidates directly instead of creating a `SweepableEstimator` from scratch."
78+
]
79+
},
80+
{
81+
"cell_type": "code",
82+
"execution_count": null,
83+
"metadata": {
84+
"dotnet_interactive": {
85+
"language": "csharp"
86+
},
87+
"vscode": {
88+
"languageId": "dotnet-interactive.csharp"
89+
}
90+
},
91+
"outputs": [],
92+
"source": [
93+
"var context = new MLContext();\n",
"var regressionTrainerCandidates = context.Auto().Regression();\n",
94+
"var binaryClassificationTrainerCandidates = context.Auto().BinaryClassification();\n",
95+
"var multiclassClassificationTrainerCandidates = context.Auto().MultiClassification();"
96+
]
97+
},
98+
{
99+
"cell_type": "markdown",
100+
"metadata": {
101+
"dotnet_interactive": {
102+
"language": "csharp"
103+
}
104+
},
105+
"source": [
106+
"#### Use `context.Auto().CreateSweepableEstimator` to create a `SweepableEstimator`\n",
107+
"\n",
108+
"If the built-in `SweepableEstimator` candidates don't meet your requirements, you can call `CreateSweepableEstimator` to create a customized one. A `SweepableEstimator` is simply a normal estimator paired with a `SearchSpace`. The following code shows how to create sweepable `LightGbm` and `SDCA` estimators.\n",
109+
"\n",
110+
"For simplicity, the built-in search spaces for `LightGbm` and `SDCA` are used, but you can customize the search space however you want. For details, see [Parameter and SearchSpace](./Parameter%20and%20SearchSpace.ipynb)."
111+
]
112+
},
113+
{
114+
"cell_type": "code",
115+
"execution_count": null,
116+
"metadata": {
117+
"dotnet_interactive": {
118+
"language": "csharp"
119+
},
120+
"vscode": {
121+
"languageId": "dotnet-interactive.csharp"
122+
}
123+
},
124+
"outputs": [],
125+
"source": [
126+
"var context = new MLContext();\n",
127+
"var lgbmSearchSpace = new SearchSpace<LgbmOption>();\n",
128+
"var sweepableLgbm = context.Auto().CreateSweepableEstimator((context, param) => {\n",
129+
" var option = new LightGbmRegressionTrainer.Options()\n",
130+
" {\n",
131+
" NumberOfLeaves = param.NumberOfLeaves,\n",
132+
" NumberOfIterations = param.NumberOfTrees,\n",
133+
" MinimumExampleCountPerLeaf = param.MinimumExampleCountPerLeaf,\n",
134+
" LearningRate = param.LearningRate,\n",
135+
" LabelColumnName = \"Label\",\n",
136+
" FeatureColumnName = \"Features\",\n",
137+
" Booster = new GradientBooster.Options()\n",
138+
" {\n",
139+
" SubsampleFraction = param.SubsampleFraction,\n",
140+
" FeatureFraction = param.FeatureFraction,\n",
141+
" L1Regularization = param.L1Regularization,\n",
142+
" L2Regularization = param.L2Regularization,\n",
143+
" },\n",
144+
" MaximumBinCountPerFeature = param.MaximumBinCountPerFeature,\n",
145+
" };\n",
146+
"\n",
147+
" return context.Regression.Trainers.LightGbm(option);\n",
148+
"}, lgbmSearchSpace);\n",
149+
"\n",
150+
"var sdcaSearchSpace = new SearchSpace<SdcaOption>();\n",
151+
"var sweepableSdca = context.Auto().CreateSweepableEstimator((context, param) => {\n",
152+
" return context.Regression.Trainers.Sdca(\"Label\", \"Features\", l1Regularization: param.L1Regularization, l2Regularization: param.L2Regularization);\n",
153+
"}, sdcaSearchSpace);"
154+
]
155+
},
156+
{
157+
"cell_type": "markdown",
158+
"metadata": {},
159+
"source": [
160+
"#### Create a `SweepablePipeline` with multiple trainer candidates\n",
161+
"\n",
162+
"`SweepablePipeline` lets you supply multiple estimators as candidates for a given stage. During AutoML sweeping, these candidates are evaluated separately and the one with the best metric is picked. Note that a candidate doesn't need to be a trainer: it can be a trainer, a transformer, or even another `SweepablePipeline`, as long as all candidates share the same input and output schema.\n",
163+
"\n",
164+
"The following code shows how to create a `SweepablePipeline` from the `sweepableSdca` and `sweepableLgbm` estimators created above."
165+
]
166+
},
167+
{
168+
"cell_type": "code",
169+
"execution_count": null,
170+
"metadata": {
171+
"dotnet_interactive": {
172+
"language": "csharp"
173+
},
174+
"vscode": {
175+
"languageId": "dotnet-interactive.csharp"
176+
}
177+
},
178+
"outputs": [],
179+
"source": [
180+
"var sweepablePipeline = context.Transforms.Concatenate(\"Features\", \"X1\", \"X2\")\n",
181+
" .Append(sweepableSdca, sweepableLgbm);"
182+
]
183+
},
184+
{
185+
"cell_type": "markdown",
186+
"metadata": {},
187+
"source": [
188+
"#### Configure `AutoMLExperiment` using `sweepablePipeline`\n",
189+
"Next, we train `sweepablePipeline` on a generated non-linear dataset using `AutoMLExperiment`, which sweeps over both `sdca` and `lightGbm` within their configured search spaces. Because `sdca` is a linear model and the label is the product of the two features, the winning model should be `lightGbm`."
190+
]
191+
},
192+
{
193+
"cell_type": "code",
194+
"execution_count": null,
195+
"metadata": {
196+
"dotnet_interactive": {
197+
"language": "csharp"
198+
},
199+
"vscode": {
200+
"languageId": "dotnet-interactive.csharp"
201+
}
202+
},
203+
"outputs": [],
204+
"source": [
205+
"var rand = new Random(0);\n",
206+
"var context = new MLContext(seed: 1);\n",
207+
"var x1 = Enumerable.Range(0, 1000).Select(_x => rand.NextSingle() * 100).ToArray();\n",
208+
"var x2 = x1.Select(_x => rand.NextSingle() * 100).ToArray();\n",
209+
"var y = Enumerable.Zip(x1, x2).Select(_x => _x.Second * _x.First + (rand.NextSingle() - 0.5f) * 10).ToArray();\n",
210+
"var df = new DataFrame();\n",
211+
"df[\"X1\"] = DataFrameColumn.Create(\"X1\", x1);\n",
212+
"df[\"X2\"] = DataFrameColumn.Create(\"X2\", x2);\n",
213+
"df[\"Label\"] = DataFrameColumn.Create(\"Label\", y);\n",
214+
"var trainTestSplit = context.Data.TrainTestSplit(df);\n",
215+
"df.Head(10)"
216+
]
217+
},
218+
{
219+
"cell_type": "code",
220+
"execution_count": null,
221+
"metadata": {
222+
"dotnet_interactive": {
223+
"language": "csharp"
224+
},
225+
"vscode": {
226+
"languageId": "dotnet-interactive.csharp"
227+
}
228+
},
229+
"outputs": [],
230+
"source": [
231+
"var monitor = new NotebookMonitor(sweepablePipeline);\n",
232+
"var experiment = context.Auto().CreateExperiment();\n",
233+
"experiment.SetDataset(df, 5)\n",
234+
" .SetPipeline(sweepablePipeline)\n",
235+
" .SetTrainingTimeInSeconds(50)\n",
236+
" .SetRegressionMetric(RegressionMetric.RootMeanSquaredError)\n",
237+
" .SetMonitor(monitor);\n",
238+
"\n",
239+
"// Configure Visualizer\n",
240+
"monitor.SetUpdate(monitor.Display());\n",
241+
"\n",
242+
"var res = await experiment.RunAsync();\n",
243+
"\n",
244+
"// Check the type of the last transformer in the winning model; it should come from lightGbm\n",
245+
"(res.Model as TransformerChain<ITransformer>).Last().GetType()"
246+
]
247+
},
248+
{
249+
"cell_type": "markdown",
250+
"metadata": {},
251+
"source": [
252+
"#### See also\n",
253+
"- [Training and AutoML](./03-Training%20and%20AutoML.ipynb)\n",
254+
"- [Regression with Taxi Dataset](./E2E-Regression%20with%20Taxi%20Dataset.ipynb)\n",
255+
"- [Classification with Iris Dataset](./E2E-Classification%20with%20Iris%20Dataset.ipynb)\n",
256+
"- [Kaggle with Titanic Dataset](./REF-Kaggle%20with%20Titanic%20Dataset.ipynb)"
257+
]
258+
}
259+
],
260+
"metadata": {
261+
"kernelspec": {
262+
"display_name": ".NET (C#)",
263+
"language": "C#",
264+
"name": ".net-csharp"
265+
},
266+
"language_info": {
267+
"file_extension": ".cs",
268+
"mimetype": "text/x-csharp",
269+
"name": "C#",
270+
"pygments_lexer": "csharp",
271+
"version": "9.0"
272+
},
273+
"orig_nbformat": 4
274+
},
275+
"nbformat": 4,
276+
"nbformat_minor": 2
277+
}
