|
42 | 42 | "ML.NET provides a variety of trainers. You can find most of them under the [StandardTrainersCatalog](https://docs.microsoft.com/dotnet/api/microsoft.ml.standardtrainerscatalog?view=ml-dotnet). Examples of trainers include linear trainers like `SDCA`, `Lbfgs`, `LinearSvm` and tree-based non-linear trainers like `FastTree`, `RandomForest` and `LightGbm`. Generally, each trainer's capability is different. Non-linear models sometimes have better training performance (lower loss) than linear ones, but it doesn't always mean they are always the better choice. Picking the right trainer to build the best model for your data requires many attempts of trial and error.\n", |
43 | 43 | "\n", |
44 | 44 | "### Hyper-parameter optimization\n", |
45 | | - "Other than difference in trainers, different hyper-parameter in one trainer also have a huge impact over the final training performance, especially for tree-base trainers. This is because the capability of these trainers to fit a specific dataset is largly depends on their hyper parameters. For example, larger `numberOfLeaves` in `LightGbm` results to a larger model and usually enable it to fit on a more complex dataset, but it might have countereffect on small dataset and cause overfitting. On the contrary, if the dataset is complex but you set a small `numberOfLeaves`, it might impair `LightGbm`'s ability on fitting that dataset and cause underfit.\n", |
| 45 | + "Choosing the right trainer impacts your final training performance. Choosing the right hyper-parameters also has a huge impact over the final training performance, especially for tree-base trainers. A hyper-parameter is a parameter set prior to training to help guide the training process and assist the algorithm in estimating the function that best fits your data. Hyper-parameters are important because the ability of these trainers to fit a specific dataset is largely depends on their hyper parameters. For example, larger `numberOfLeaves` in `LightGbm` produces a larger model and usually enables it to fit on a more complex dataset, but it might have countereffect on small dataset and cause **overfitting**. Conversely, if the dataset is complex but you set a small `numberOfLeaves`, it might impair `LightGbm`'s ability on fitting that dataset and cause **underfit**.\n", |
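| | + "\n",
| | + "As an illustration (a hypothetical configuration using the `mlContext` assumed above; the leaf counts are arbitrary examples), `numberOfLeaves` can be passed directly when creating the trainer:\n",
| | + "\n",
| | + "```csharp\n",
| | + "// Few leaves: a simpler model, less prone to overfitting small datasets.\n",
| | + "var smallTrees = mlContext.Regression.Trainers.LightGbm(\n",
| | + "    labelColumnName: \"Label\", featureColumnName: \"Features\",\n",
| | + "    numberOfLeaves: 4);\n",
| | + "\n",
| | + "// Many leaves: a more expressive model that can fit complex data,\n",
| | + "// but may overfit when the dataset is small.\n",
| | + "var largeTrees = mlContext.Regression.Trainers.LightGbm(\n",
| | + "    labelColumnName: \"Label\", featureColumnName: \"Features\",\n",
| | + "    numberOfLeaves: 128);\n",
| | + "```\n",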
46 | 46 | "\n", |
47 | | - "In practice, it's usually tedious while necessary to try different set of hyper-parameters and find the best configuration for trainer, this process is called hyper-parameter optimization (HPO). Luckily, you can use the built-in `AutoML` to help you on hpo process.\n", |
| 47 | + "The process of finding the best configuration for your trainer is known as hyper-parameter optimization (HPO). Like the process of choosing your trainer it involves a lot of trial and error. The built-in Automated ML (AutoML) capabilities in ML.NET simplify the HPO process.\n", |
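| | + "\n",
| | + "A minimal sketch of the AutoML experiment API (assuming the `mlContext` and `trainData` from above and the `Microsoft.ML.AutoML` package; the time budget is an arbitrary example):\n",
| | + "\n",
| | + "```csharp\n",
| | + "using Microsoft.ML.AutoML;\n",
| | + "\n",
| | + "// Let AutoML sweep trainers and hyper-parameters within a time budget.\n",
| | + "var experiment = mlContext.Auto().CreateRegressionExperiment(\n",
| | + "    maxExperimentTimeInSeconds: 60);\n",
| | + "\n",
| | + "var result = experiment.Execute(trainData, labelColumnName: \"Label\");\n",
| | + "\n",
| | + "// Inspect the best pipeline found during the sweep.\n",
| | + "Console.WriteLine($\"Best trainer: {result.BestRun.TrainerName}\");\n",
| | + "var bestModel = result.BestRun.Model;\n",
| | + "```\n",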
48 | 48 | "\n", |
49 | | - "### OverFitting and UnderFitting\n", |
50 | | - "Overfitting and underfitting are the two most common problems we would see when training a model. UnderFitting means the selected trainer is not capable enough to fit training dataset and usually result in a high loss during training and low score/metric on test dataset. To resolve this we need either select a more powerful model, or do more feature engineering. And overfitting is just the opposite, which happens when model get overtrained and usually result in a decent low loss during training but low score on test dataset.\n", |
| 49 | + "### Overfitting and Underfitting\n", |
| 50 | + "Overfitting and underfitting are the two most common problems you encounter when training a model. Underfitting means the selected trainer is not capable enough to fit training dataset and usually result in a high loss during training and low score/metric on test dataset. To resolve this you need to either select a more powerful model or perform more feature engineering. Overfitting is the opposite, which happens when model learns the training data too well. This usually results in low loss metric during training but high loss on test dataset.\n". |
| 51 | + "\n",
| 52 | + "A good analogy for these concepts is studying for an exam. Let's say you knew the questions and answers ahead of time. After studying, you take the test and get a perfect score. Great news! However, when you're given the exam again with the questions rearranged and slightly different wording, you get a lower score. That suggests you memorized the answers rather than learning the concepts being tested. This is an example of overfitting. Underfitting is the opposite, where the study materials you were given don't accurately represent what the exam evaluates. As a result, you resort to guessing the answers since you don't have enough knowledge to answer correctly.\n",
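| | + "\n",
| | + "One practical way to spot both problems is to evaluate the same model on the training set and on a held-out test set and compare the metrics. A sketch, assuming the `mlContext` from above and an `IDataView` named `data` with `Label` and `Features` columns:\n",
| | + "\n",
| | + "```csharp\n",
| | + "// Hold out 20% of the data for testing.\n",
| | + "var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);\n",
| | + "\n",
| | + "var model = mlContext.Regression.Trainers.LightGbm(numberOfLeaves: 128)\n",
| | + "    .Fit(split.TrainSet);\n",
| | + "\n",
| | + "// Score both splits with the same model.\n",
| | + "var trainMetrics = mlContext.Regression.Evaluate(model.Transform(split.TrainSet));\n",
| | + "var testMetrics = mlContext.Regression.Evaluate(model.Transform(split.TestSet));\n",
| | + "\n",
| | + "// Much better training metrics than test metrics suggest overfitting;\n",
| | + "// poor metrics on both suggest underfitting.\n",
| | + "Console.WriteLine($\"Train RSquared: {trainMetrics.RSquared:F3}\");\n",
| | + "Console.WriteLine($\"Test RSquared:  {testMetrics.RSquared:F3}\");\n",
| | + "```\n",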
51 | 53 | "\n", |
52 | | - "In the next section, we will go through two examples. The first example performs regression training on a linear dataset using both simple, linear and more advanced, non-linear trainers. And is to illustrate the importance of selecting the __Right__ trainer instead of __Advanced__ trainer. The second example performs regression training, while on a non-linear dataset, using both `LightGbm` with difference hyper-parameters. This is to show the importance of hyper-parameter optimization during training a model." |
| 54 | + "In the next section, we will go through two examples. The first example trains a regression model on a linear dataset using both linear and more advanced non-linear trainers to highlight the importance of selecting the right trainer. The second example trains a regression model on a non-linear dataset using `LightGbm` with different hyper-parameters to show the importance of hyper-parameter optimization" |
53 | 55 | ] |
54 | 56 | }, |
55 | 57 | { |
|