 Tutorial for testing the MCAR case
 ============================================

-In this tutorial, we show how to use the mcar test classe and it methods
+In this tutorial, we show how to use the mcar test class and its methods.

-Keep in my mind that, at this moment, the mcar tests are only handle tabular data.
+Keep in mind that, at the moment, the mcar tests only handle tabular data.
 """
 # %%
 # First import some libraries
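
The import cell of the tutorial is collapsed in this diff. Judging from the names used in the snippets below (np, pd, plt and random), it presumably contains at least the following:

    import random

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
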
...

 # missing patterns and won't be efficient to detect the heterogeneity of covariance between missing
 # patterns.
 #
+# The null hypothesis H0 is: "The data are MCAR". The alternative hypothesis is: "The data are
+# not MCAR; the means of the observed variables can vary across the missing patterns".
+#
+# We use the classic 5% significance threshold: if the p-value of the test is below this
+# threshold, we reject the null hypothesis.
+#
 # This notebook shows how the Little's test performs and its limitations.

 np.random.seed(11)
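
As a rough sketch of the decision rule described in the added comments, and assuming the MCAR test class the tutorial refers to is qolmat's LittleTest with a test method returning the p-value (the import cell is not part of this diff, so these names are an assumption):

    # Minimal sketch of the 5% decision rule; the class and module names below are
    # assumed, since the corresponding import is not shown in this diff.
    import numpy as np
    import pandas as pd

    from qolmat.analysis.holes_characterization import LittleTest

    rng = np.random.default_rng(42)
    df = pd.DataFrame(rng.normal(size=(200, 2)), columns=["var_1", "var_2"])
    df.iloc[rng.choice(200, size=20, replace=False), 1] = np.nan  # MCAR holes

    p_value = LittleTest().test(df)
    if p_value < 0.05:
        print("H0 rejected: the holes do not look MCAR")
    else:
        print("H0 not rejected: the data are compatible with MCAR")
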
 # Case 1 : Normal iid feature with MCAR holes
 # ===========================================

-matrix = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)
+matrix = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=200)
 matrix.ravel()[np.random.choice(matrix.size, size=20, replace=False)] = np.nan
 matrix_masked = matrix[np.argwhere(np.isnan(matrix))]
 df_1 = pd.DataFrame(matrix)

...

 plt.legend(
     (plt_1, plt_2),
-    ("observed_values", "masked_vlues"),
+    ("observed_values", "masked_values"),
     scatterpoints=1,
     loc="lower left",
     ncol=1,
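
In Case 1 the mask is drawn uniformly at random over the whole matrix, so the missingness does not depend on the values and Little's test is expected not to reject H0. A short check, reusing df_1 from the hunk above and the assumed LittleTest class from the earlier sketch:

    # Expected outcome for Case 1 (MCAR holes): a p-value above the 5% threshold,
    # so H0 ("the data are MCAR") is not rejected.
    p_value_1 = LittleTest().test(df_1)
    print(f"Case 1 p-value: {p_value_1:.3f} -> reject H0: {p_value_1 < 0.05}")
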
...

 # ==========================================
 np.random.seed(11)

-matrix = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)
+matrix = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=200)
 threshold = random.uniform(0, 1)
-matrix[np.argwhere(matrix[:, 0] > 1.96), 1] = np.nan
+matrix[np.argwhere(matrix[:, 0] >= 1.96), 1] = np.nan
 matrix_masked = matrix[np.argwhere(np.isnan(matrix))]
 df_2 = pd.DataFrame(matrix)
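
In Case 2 the second column is masked only where the first column is at least 1.96, so the missingness depends on an observed variable (MAR) and the mean of column 0 differs between complete and incomplete rows; the test is expected to reject H0. Same assumptions as in the earlier sketch:

    # Expected outcome for Case 2 (MAR holes driven by column 0): the pattern means
    # differ, so the p-value should fall below 5% and H0 is rejected.
    p_value_2 = LittleTest().test(df_2)
    print(f"Case 2 p-value: {p_value_2:.3f} -> reject H0: {p_value_2 < 0.05}")
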
...

 np.random.seed(11)

-matrix = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=100)
-matrix[np.argwhere(abs(matrix[:, 0]) >= 1.95), 1] = np.nan
+matrix = np.random.multivariate_normal(mean=[0, 0], cov=[[1, 0], [0, 1]], size=200)
+matrix[np.argwhere(abs(matrix[:, 0]) >= 1.96), 1] = np.nan
 matrix_masked = matrix[np.argwhere(np.isnan(matrix))]
 df_3 = pd.DataFrame(matrix)
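
Case 3 masks the second column where the absolute value of the first column is at least 1.96. The missingness is again MAR, but because the masking is symmetric around zero, the mean of column 0 is roughly the same in the complete and incomplete rows; only its spread changes, which the mean-based Little's test cannot detect. This is the limitation discussed above. Same assumptions as in the earlier sketches:

    # Expected outcome for Case 3 (symmetric MAR holes): the pattern means match, so
    # Little's test typically fails to reject H0 even though the data are not MCAR.
    p_value_3 = LittleTest().test(df_3)
    print(f"Case 3 p-value: {p_value_3:.3f} -> reject H0: {p_value_3 < 0.05}")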
|
|