|
8 | 8 | } |
9 | 9 | }, |
10 | 10 | "source": [ |
11 | | - "You can create your own search space in AutoML.Net. There're two ways to define a search space: via scratch and via attribution. Via scratch gives you more flexibility while via attribution is more readable. And both ways are equivalant.\n", |
| 11 | + "In AutoML.Net, a `SearchSpace` defines the hyper-parameter search range of a sweepable pipeline or sweepable estimator, and a `Parameter` is a sample drawn from a `SearchSpace`, which can be used to restore an estimator or a pipeline.\n", |
| 12 | + "\n", |
| 13 | + "The `SearchSpace` has a great impact on AutoML performance. In theory, it sets the upper bound on how good an AutoML model can be, and a larger `SearchSpace` usually means a greater chance of finding a better model. In practice there is a trade-off: a larger `SearchSpace` is not always better, because it increases search complexity and training cost.\n", |
| 14 | + "\n", |
| 15 | + "AutoML.Net provides a default `SearchSpace` for almost all available ML.Net trainers. However, it also lets you provide your own search space if the default one doesn't meet your needs.\n", |
12 | 16 | "\n", |
13 | 17 | "In this notebook, we will go through a series of topics on search space\n", |
14 | | - "- available options in search space\n", |
15 | | - "- create search space from scratch && it's equivalant way via attribution api.\n", |
16 | | - "- default search space for mlnet trainers." |
| 18 | + "- default search space for mlnet trainers.\n", |
| 19 | + "- how to customize search space" |
| 20 | + ] |
| 21 | + }, |
| 22 | + { |
| 23 | + "cell_type": "markdown", |
| 24 | + "metadata": {}, |
| 25 | + "source": [ |
| 26 | + "# Install nuget dependency" |
17 | 27 | ] |
18 | 28 | }, |
19 | 29 | { |
|
40 | 50 | { |
41 | 51 | "data": { |
42 | 52 | "text/markdown": [ |
43 | | - "Loading extensions from `Microsoft.Data.Analysis.Interactive.dll`" |
| 53 | + "Loading extensions from `Microsoft.ML.AutoML.Interactive.dll`" |
44 | 54 | ] |
45 | 55 | }, |
46 | 56 | "metadata": {}, |
|
49 | 59 | { |
50 | 60 | "data": { |
51 | 61 | "text/markdown": [ |
52 | | - "Loading extensions from `Plotly.NET.Interactive.dll`" |
| 62 | + "Loading extensions from `Microsoft.Data.Analysis.Interactive.dll`" |
53 | 63 | ] |
54 | 64 | }, |
55 | 65 | "metadata": {}, |
|
58 | 68 | { |
59 | 69 | "data": { |
60 | 70 | "text/markdown": [ |
61 | | - "Loading extensions from `Microsoft.ML.AutoML.Interactive.dll`" |
| 71 | + "Loading extensions from `Plotly.NET.Interactive.dll`" |
62 | 72 | ] |
63 | 73 | }, |
64 | 74 | "metadata": {}, |
|
97 | 107 | "using Microsoft.ML.AutoML;\n", |
98 | 108 | "using Microsoft.ML.Data;\n", |
99 | 109 | "using Microsoft.ML.SearchSpace;\n", |
100 | | - "using Newtonsoft.Json;" |
| 110 | + "using Newtonsoft.Json;\n", |
| 111 | + "using Microsoft.ML.SearchSpace.Option;" |
101 | 112 | ] |
102 | 113 | }, |
103 | 114 | { |
104 | 115 | "cell_type": "markdown", |
| 116 | + "metadata": {}, |
| 117 | + "source": [ |
| 118 | + "# Default search space for ml.net trainer\n", |
| 119 | + "AutoML.Net comes with default search spaces for most ML.Net trainers. You can find them under the `Microsoft.ML.AutoML.CodeGen` namespace. The following code shows the default search space for LightGbm.\n", |
| 120 | + "\n", |
| 121 | + "The default search space is used by the `AutoFeaturizer` and `AutoTrainer` APIs, which provide an easy way to create sweepable estimators." |
| 122 | + ] |
| 123 | + }, |
| 124 | + { |
| 125 | + "cell_type": "code", |
| 126 | + "execution_count": null, |
105 | 127 | "metadata": { |
106 | 128 | "dotnet_interactive": { |
107 | 129 | "language": "csharp" |
| 130 | + }, |
| 131 | + "vscode": { |
| 132 | + "languageId": "dotnet-interactive.csharp" |
108 | 133 | } |
109 | 134 | }, |
| 135 | + "outputs": [ |
| 136 | + { |
| 137 | + "data": { |
| 138 | + "text/plain": [ |
| 139 | + "{\r\n", |
| 140 | + " \"NumberOfLeaves\": {\r\n", |
| 141 | + " \"Min\": 4.0,\r\n", |
| 142 | + " \"Max\": 1024.0,\r\n", |
| 143 | + " \"LogBase\": true,\r\n", |
| 144 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 145 | + " \"Step\": [\r\n", |
| 146 | + " null\r\n", |
| 147 | + " ],\r\n", |
| 148 | + " \"Default\": [\r\n", |
| 149 | + " 0.0\r\n", |
| 150 | + " ]\r\n", |
| 151 | + " },\r\n", |
| 152 | + " \"MinimumExampleCountPerLeaf\": {\r\n", |
| 153 | + " \"Min\": 20.0,\r\n", |
| 154 | + " \"Max\": 1024.0,\r\n", |
| 155 | + " \"LogBase\": true,\r\n", |
| 156 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 157 | + " \"Step\": [\r\n", |
| 158 | + " null\r\n", |
| 159 | + " ],\r\n", |
| 160 | + " \"Default\": [\r\n", |
| 161 | + " 0.0\r\n", |
| 162 | + " ]\r\n", |
| 163 | + " },\r\n", |
| 164 | + " \"LearningRate\": {\r\n", |
| 165 | + " \"Min\": 2E-10,\r\n", |
| 166 | + " \"Max\": 1.0,\r\n", |
| 167 | + " \"LogBase\": true,\r\n", |
| 168 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 169 | + " \"Step\": [\r\n", |
| 170 | + " null\r\n", |
| 171 | + " ],\r\n", |
| 172 | + " \"Default\": [\r\n", |
| 173 | + " 1.0\r\n", |
| 174 | + " ]\r\n", |
| 175 | + " },\r\n", |
| 176 | + " \"NumberOfTrees\": {\r\n", |
| 177 | + " \"Min\": 4.0,\r\n", |
| 178 | + " \"Max\": 32768.0,\r\n", |
| 179 | + " \"LogBase\": true,\r\n", |
| 180 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 181 | + " \"Step\": [\r\n", |
| 182 | + " null\r\n", |
| 183 | + " ],\r\n", |
| 184 | + " \"Default\": [\r\n", |
| 185 | + " 0.0\r\n", |
| 186 | + " ]\r\n", |
| 187 | + " },\r\n", |
| 188 | + " \"SubsampleFraction\": {\r\n", |
| 189 | + " \"Min\": 2E-10,\r\n", |
| 190 | + " \"Max\": 1.0,\r\n", |
| 191 | + " \"LogBase\": true,\r\n", |
| 192 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 193 | + " \"Step\": [\r\n", |
| 194 | + " null\r\n", |
| 195 | + " ],\r\n", |
| 196 | + " \"Default\": [\r\n", |
| 197 | + " 1.0\r\n", |
| 198 | + " ]\r\n", |
| 199 | + " },\r\n", |
| 200 | + " \"MaximumBinCountPerFeature\": {\r\n", |
| 201 | + " \"Min\": 8.0,\r\n", |
| 202 | + " \"Max\": 1024.0,\r\n", |
| 203 | + " \"LogBase\": true,\r\n", |
| 204 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 205 | + " \"Step\": [\r\n", |
| 206 | + " null\r\n", |
| 207 | + " ],\r\n", |
| 208 | + " \"Default\": [\r\n", |
| 209 | + " 0.7142857142857141\r\n", |
| 210 | + " ]\r\n", |
| 211 | + " },\r\n", |
| 212 | + " \"FeatureFraction\": {\r\n", |
| 213 | + " \"Min\": 2E-10,\r\n", |
| 214 | + " \"Max\": 1.0,\r\n", |
| 215 | + " \"LogBase\": false,\r\n", |
| 216 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 217 | + " \"Step\": [\r\n", |
| 218 | + " null\r\n", |
| 219 | + " ],\r\n", |
| 220 | + " \"Default\": [\r\n", |
| 221 | + " 1.0\r\n", |
| 222 | + " ]\r\n", |
| 223 | + " },\r\n", |
| 224 | + " \"L1Regularization\": {\r\n", |
| 225 | + " \"Min\": 2E-10,\r\n", |
| 226 | + " \"Max\": 1.0,\r\n", |
| 227 | + " \"LogBase\": true,\r\n", |
| 228 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 229 | + " \"Step\": [\r\n", |
| 230 | + " null\r\n", |
| 231 | + " ],\r\n", |
| 232 | + " \"Default\": [\r\n", |
| 233 | + " 0.0\r\n", |
| 234 | + " ]\r\n", |
| 235 | + " },\r\n", |
| 236 | + " \"L2Regularization\": {\r\n", |
| 237 | + " \"Min\": 2E-10,\r\n", |
| 238 | + " \"Max\": 1.0,\r\n", |
| 239 | + " \"LogBase\": true,\r\n", |
| 240 | + " \"FeatureSpaceDim\": 1,\r\n", |
| 241 | + " \"Step\": [\r\n", |
| 242 | + " null\r\n", |
| 243 | + " ],\r\n", |
| 244 | + " \"Default\": [\r\n", |
| 245 | + " 1.0\r\n", |
| 246 | + " ]\r\n", |
| 247 | + " }\r\n", |
| 248 | + "}" |
| 249 | + ] |
| 250 | + }, |
| 251 | + "metadata": {}, |
| 252 | + "output_type": "display_data" |
| 253 | + } |
| 254 | + ], |
110 | 255 | "source": [ |
111 | | - "# Available options in search space.\n", |
112 | | - "AutoML.Net search space supports multiple options which should be enough to cover most of usage cases. In summary, it supports\n", |
113 | | - "- numeric option - an option that is numeric type, like float, double, int...\n", |
114 | | - "- choice option - an option that is descrete, like string or boolean.\n", |
115 | | - "- nested option - an option that itself is also a search space.\n", |
| 256 | + "using Microsoft.ML.AutoML.CodeGen;\n", |
| 257 | + "var lgbmSearchSpace = new SearchSpace<LgbmOption>();\n", |
| 258 | + "\n", |
| 259 | + "// refine search space if necessary\n", |
| 260 | + "lgbmSearchSpace[\"NumberOfLeaves\"] = new UniformIntOption(4, 1024, true, 4);\n", |
116 | 261 | "\n", |
117 | | - "Underlying, a search space is no more than a json object, where key is option name and value is its value as another json object. This is also how search space supports nested search space. In general, there's no strong limitation on the type of option as long as it can be saved as json, but in practice, it's better to use primitive type since it's well tested.\n", |
| 262 | + "// pass lgbmSearchSpace to AutoTrainer\n", |
| 263 | + "var context = new MLContext();\n", |
| 264 | + "var pipeline = context.Auto().BinaryClassification(lgbmSearchSpace: lgbmSearchSpace);\n", |
118 | 265 | "\n", |
119 | | - "Once after you create a search space, a n-dimension linear space will be associated with that search space where `n` depends on the # of options and dimension of that option. During hpo, tuner will sample on that n-dimension linear space instead of original options. This feature makes option being transparent to tuner and greatly simplify the implementation of tuner." |
| 266 | + "JsonConvert.SerializeObject(lgbmSearchSpace, Formatting.Indented)" |
120 | 267 | ] |
121 | 268 | }, |
122 | 269 | { |
123 | 270 | "cell_type": "markdown", |
124 | | - "metadata": {}, |
| 271 | + "metadata": { |
| 272 | + "dotnet_interactive": { |
| 273 | + "language": "csharp" |
| 274 | + } |
| 275 | + }, |
125 | 276 | "source": [ |
126 | | - "# create search space from scratch\n", |
127 | | - "The following code shows how to create a search space from scratch and print it out as a json string." |
| 277 | + "# Create `SearchSpace` from scratch.\n", |
| 278 | + "The following code shows how to create a `SearchSpace` that contains numeric, choice and nested options, where\n", |
| 279 | + "- a numeric option is an option of a numeric type, such as float, double or int.\n", |
| 280 | + "- a choice option is an option that is discrete, such as a string or boolean.\n", |
| 281 | + "- a nested option is an option that is itself a search space.\n" |
128 | 282 | ] |
129 | 283 | }, |
130 | 284 | { |
|
139 | 293 | } |
140 | 294 | }, |
141 | 295 | "outputs": [ |
142 | | - { |
143 | | - "name": "stdout", |
144 | | - "output_type": "stream", |
145 | | - "text": [ |
146 | | - "search space dimension: 6\r\n" |
147 | | - ] |
148 | | - }, |
149 | 296 | { |
150 | 297 | "data": { |
151 | 298 | "text/plain": [ |
|
253 | 400 | "nestedSearchSpace[\"IntOption\"] = new UniformIntOption(-10, 10, false, 0);\n", |
254 | 401 | "searchSpace[\"Nest\"] = nestedSearchSpace;\n", |
255 | 402 | "\n", |
256 | | - "// check out search space's dimension\n", |
257 | | - "Console.WriteLine(\"search space dimension: \" + searchSpace.FeatureSpaceDim);\n", |
258 | 403 | "// pretty print search space\n", |
259 | 404 | "JsonConvert.SerializeObject(searchSpace, Formatting.Indented)" |
260 | 405 | ] |
|
263 | 408 | "cell_type": "markdown", |
264 | 409 | "metadata": {}, |
265 | 410 | "source": [ |
266 | | - "# create search space from attribution\n", |
267 | | - "AutoML allows you to use attribution on property to avoid creating search space from scratch. The following code shows how to create an identical search space from above except using attribution API" |
| 411 | + "# Create `SearchSpace<T>`, which casts the sampled parameter to a concrete type\n", |
| 412 | + "AutoML lets you annotate properties with attributes to avoid creating a search space from scratch. The following code creates a search space identical to the one above, this time using the attribute API.\n", |
| 413 | + "\n", |
| 414 | + "Compared with `SearchSpace`, `SearchSpace<T>` also casts the sampled parameter to a concrete type `T` instead of `Parameter`, which saves the effort of extracting specific hyper-parameter values from a `Parameter`." |
268 | 415 | ] |
269 | 416 | }, |
270 | 417 | { |
|
279 | 426 | } |
280 | 427 | }, |
281 | 428 | "outputs": [ |
282 | | - { |
283 | | - "name": "stdout", |
284 | | - "output_type": "stream", |
285 | | - "text": [ |
286 | | - "search space dimension: 6\r\n" |
287 | | - ] |
288 | | - }, |
289 | 429 | { |
290 | 430 | "data": { |
291 | 431 | "text/plain": [ |
|
404 | 544 | "\n", |
405 | 545 | "var searchSpace = new SearchSpace<MyParameter>();\n", |
406 | 546 | "\n", |
407 | | - "// check out search space's dimension\n", |
408 | | - "Console.WriteLine(\"search space dimension: \" + searchSpace.FeatureSpaceDim);\n", |
409 | 547 | "// pretty print search space\n", |
410 | 548 | "JsonConvert.SerializeObject(searchSpace, Formatting.Indented)" |
411 | 549 | ] |
|
414 | 552 | "cell_type": "markdown", |
415 | 553 | "metadata": {}, |
416 | 554 | "source": [ |
417 | | - "# Sampling from search space\n", |
418 | | - "In HPO, what tuner does is basically sampling from search space, and pass the sampling result, a.k.a `parameter`, to trial runner. The way of how parameter sampled is what make tuners different from each other and is critial to the final optimizing result. The common tunning algorithems are random search, grid search, smac, eci-cfo and many others.\n", |
| 555 | + "# Sampling `Parameter` from `SearchSpace`\n", |
| 556 | + "In AutoML.Net, a `SearchSpace` is associated with an `n`-dimensional vector space called the `feature space`. Instead of sampling the original options in a `SearchSpace` directly, the tuner samples from that `feature space`, and the `SearchSpace` then maps the sampled vector back to its options.\n", |
419 | 557 | "\n", |
420 | | - "The following example shows how to sample from the given search space using random search, which sampling from the linear space and mapping it back to the original options." |
| 558 | + "The following code demonstrates this sampling process: it creates a random `n`-dimensional vector and uses it to sample from the search space. Note that because the search space is of type `SearchSpace`, the sampled result is a `Parameter` rather than a concrete type." |
421 | 559 | ] |
422 | 560 | }, |
423 | 561 | { |
|
436 | 574 | "name": "stdout", |
437 | 575 | "output_type": "stream", |
438 | 576 | "text": [ |
439 | | - "0 - {\"IntOption\":9,\"SingleOption\":7.392224,\"DoubleOption\":0.0,\"BoolOption\":true,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":3}}\r\n", |
440 | | - "0 - {\"IntOption\":9,\"SingleOption\":7.392224,\"DoubleOption\":0.0,\"BoolOption\":false,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":3}}\r\n", |
441 | | - "1 - {\"IntOption\":-6,\"SingleOption\":3.4464028,\"DoubleOption\":-7.0,\"BoolOption\":false,\"StrOption\":\"a\",\"Nest\":{\"IntOption\":-5}}\r\n", |
442 | | - "1 - {\"IntOption\":-6,\"SingleOption\":3.4464028,\"DoubleOption\":-7.0,\"BoolOption\":false,\"StrOption\":\"a\",\"Nest\":{\"IntOption\":-5}}\r\n", |
443 | | - "2 - {\"IntOption\":-9,\"SingleOption\":8.262646,\"DoubleOption\":-7.0,\"BoolOption\":true,\"StrOption\":\"c\",\"Nest\":{\"IntOption\":2}}\r\n", |
444 | | - "2 - {\"IntOption\":-9,\"SingleOption\":8.262646,\"DoubleOption\":-7.0,\"BoolOption\":false,\"StrOption\":\"c\",\"Nest\":{\"IntOption\":2}}\r\n" |
| 577 | + "0 - {\"IntOption\":5,\"SingleOption\":1.3184657,\"DoubleOption\":2.0,\"BoolOption\":true,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":4}}\r\n", |
| 578 | + "0 - {\"IntOption\":5,\"SingleOption\":1.3184657,\"DoubleOption\":2.0,\"BoolOption\":false,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":4}}\r\n", |
| 579 | + "1 - {\"IntOption\":4,\"SingleOption\":1.8871154,\"DoubleOption\":-7.0,\"BoolOption\":true,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":1}}\r\n", |
| 580 | + "1 - {\"IntOption\":4,\"SingleOption\":1.8871154,\"DoubleOption\":-7.0,\"BoolOption\":false,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":1}}\r\n", |
| 581 | + "2 - {\"IntOption\":-2,\"SingleOption\":5.2511787,\"DoubleOption\":5.0,\"BoolOption\":false,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":6}}\r\n", |
| 582 | + "2 - {\"IntOption\":-2,\"SingleOption\":5.2511787,\"DoubleOption\":5.0,\"BoolOption\":false,\"StrOption\":\"b\",\"Nest\":{\"IntOption\":6}}\r\n" |
445 | 583 | ] |
446 | 584 | } |
447 | 585 | ], |
|
482 | 620 | "}" |
483 | 621 | ] |
484 | 622 | }, |
485 | | - { |
486 | | - "cell_type": "markdown", |
487 | | - "metadata": {}, |
488 | | - "source": [ |
489 | | - "# Default search space for ml.net trainer\n", |
490 | | - "AutoML.Net comes with a series of default search space for most of ml.net trainers. You can check it under `Microsoft.ML.AutoML.CodeGen` namespace. The following code shows the default search space for LightGbm." |
491 | | - ] |
492 | | - }, |
493 | | - { |
494 | | - "cell_type": "code", |
495 | | - "execution_count": null, |
496 | | - "metadata": { |
497 | | - "dotnet_interactive": { |
498 | | - "language": "csharp" |
499 | | - }, |
500 | | - "vscode": { |
501 | | - "languageId": "dotnet-interactive.csharp" |
502 | | - } |
503 | | - }, |
504 | | - "outputs": [], |
505 | | - "source": [ |
506 | | - "using Microsoft.ML.AutoML.CodeGen;\n", |
507 | | - "var lgbmSearchSpace = new SearchSpace<LgbmOption>();\n", |
508 | | - "JsonConvert.SerializeObject(lgbmSearchSpace, Formatting.Indented)" |
509 | | - ] |
510 | | - }, |
511 | 623 | { |
512 | 624 | "cell_type": "markdown", |
513 | 625 | "metadata": {}, |
|