|
1 | 1 | """ |
2 | 2 | ======================================================================================== |
3 | | -Tutorial for comparison between mean and median imputations with uniform hole generation |
| 3 | +Comparison of basic imputers |
4 | 4 | ======================================================================================== |
5 | 5 |
|
6 | 6 | In this tutorial, we show how to use the Qolmat comparator |
|
31 | 31 | # the 82nd column contains the critical temperature which is used as the |
32 | 32 | # target variable. |
33 | 33 | # The data does not contain missing values; so for the purpose of this notebook, |
34 | | -# we corrupt the data, with the ``qolmat.utils.data.add_holes`` function. |
| 34 | +# we corrupt the data, with the :func:`qolmat.utils.data.add_holes` function. |
35 | 35 | # In this way, each column has missing values. |
36 | 36 |
|
37 | | -df_data = data.get_data("Superconductor") |
38 | | -df = data.add_holes(df_data, ratio_masked=0.2, mean_size=120) |
| 37 | +df = data.add_holes(data.get_data("Superconductor"), ratio_masked=0.2, mean_size=120) |
39 | 38 |
|
40 | 39 | # %% |
41 | 40 | # The dataset contains 82 columns. For simplicity, |
|
76 | 75 | imputer_median = imputers.ImputerMedian() |
77 | 76 | dict_imputers = {"mean": imputer_mean, "median": imputer_median} |
78 | 77 |
|
79 | | -generator_holes = missing_patterns.UniformHoleGenerator( |
80 | | - n_splits=2, subset=cols_to_impute, ratio_masked=0.1 |
81 | | -) |
82 | | - |
83 | 78 | metrics = ["mae", "wmape", "KL_columnwise"] |
84 | 79 |
|
85 | 80 | # %% |
|
88 | 83 | # (those previously mentioned), |
89 | 84 | # a list with the columns names to impute, |
90 | 85 | # a generator of holes specifying the type of holes to create. |
91 | | -# Just a few words about hole generation. |
92 | 86 | # in this example, we have chosen the uniform hole generator. |
93 | | -# You can see what this looks like. |
94 | 87 | # For example, by imposing that 10% of missing data be created |
95 | 88 | # ``ratio_masked=0.1`` and creating missing values in columns |
96 | 89 | # ``subset=cols_to_impute``: |
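
For readers of this diff who have not used uniform hole generation, the behaviour described in the comment lines above (`ratio_masked=0.1`, `subset=cols_to_impute`) can be sketched in plain Python. This is only an illustration and not Qolmat's implementation: the helper `mask_uniform`, the toy table, and the choice to mask each column independently are all assumptions made for the example.

```python
import random


def mask_uniform(table, subset, ratio_masked, seed=0):
    """Return a copy of `table` with a fraction of the cells in `subset` set to None.

    Hypothetical helper for illustration only; it is not part of Qolmat.
    """
    rng = random.Random(seed)
    # Copy every column so the input table is left untouched.
    masked = {col: list(values) for col, values in table.items()}
    for col in subset:
        n_rows = len(masked[col])
        n_holes = round(ratio_masked * n_rows)
        # Pick row positions uniformly at random, without replacement.
        for row in rng.sample(range(n_rows), n_holes):
            masked[col][row] = None
    return masked


# Toy data: three columns of 20 rows, holes created only in "a" and "b".
table = {"a": list(range(20)), "b": list(range(20, 40)), "c": list(range(40, 60))}
masked = mask_uniform(table, subset=["a", "b"], ratio_masked=0.1)
n_holes = {col: sum(v is None for v in values) for col, values in masked.items()}
```

Columns outside `subset` keep all their values, which mirrors the intent of restricting hole creation to the columns that will be imputed.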
|