@@ -1,22 +1,22 @@

 Analysis
 ========
-The analysis section gives a better understanding of the holes in a dataset.
+This section gives a better understanding of the holes in a dataset.

 1. General approach
 -------------------

 As described in section :ref:`hole_generator`, there are 3 main types of missing data mechanism: MCAR, MAR and MNAR.
-The analysis brick provides tools to charaterize the type of holes.
+The analysis module provides tools to characterize the type of holes.

-The MNAR case is the trickiest, the user must first consider whether or not his missing data mechanism is MNAR. In the meantime, we make the assumption that the missing-data mechanism is ignorable (ieis not MNAR). If the MNAR missing data mechanism is suspected, please see this article :ref:`An approach to test for MNAR [1]<Noonan-article>`.
+The MNAR case is the trickiest: the user must first consider whether their missing data mechanism is MNAR. In the meantime, we assume that the missing-data mechanism is ignorable (i.e., it is not MNAR). If an MNAR mechanism is suspected, please see this article :ref:`An approach to test for MNAR [1]<Noonan-article>` for relevant actions.

 Then Qolmat proposes a test to determine whether the missing data mechanism is MCAR or MAR.

-2. How to use the results ?
----------------------------
+2. How to use the results
+-------------------------

-At the end of the MCAR test, it can then be assumed whether the missing data mechanism is MCAR or not. This could be used for several things :
+At the end of the MCAR test, it can then be assumed whether the missing data mechanism is MCAR or not. This serves three different purposes:

 a. Diagnosis
 ^^^^^^^^^^^^
@@ -27,30 +27,30 @@ The test result can then be used for continuous data quality management.
 b. Estimation
 ^^^^^^^^^^^^^

-Some estimation methods are not suitable for the MAR case. For example, dropingn the nans introduces bias into the estimator, it is necessary to have validated that the missing-data mechanism is MCAR.
+Some estimation methods are not suitable for the MAR case. For example, dropping the NaNs introduces bias into the estimator under MAR, so it is necessary to first validate that the missing-data mechanism is MCAR.
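
The bias mentioned in the Estimation paragraph can be illustrated with a small simulation. This is only a sketch on synthetic data: the variables ``x`` and ``y``, the sample size, and the missingness rules are assumptions chosen for illustration and are not taken from Qolmat.

```python
import random
import statistics


def simulate(n=20000, seed=0):
    """Compare the mean of y after dropping missing rows under MCAR and MAR."""
    rng = random.Random(seed)
    # x is always observed; y depends on x plus independent noise.
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [x + rng.gauss(0.0, 0.5) for x in xs]

    # MCAR: each y is dropped with probability 0.5, independently of the data.
    mcar_observed = [y for y in ys if rng.random() >= 0.5]

    # MAR: y is missing whenever the observed variable x is positive.
    mar_observed = [y for x, y in zip(xs, ys) if x <= 0.0]

    return {
        "full": statistics.fmean(ys),
        "mcar": statistics.fmean(mcar_observed),
        "mar": statistics.fmean(mar_observed),
    }


means = simulate()
```

In this toy setting, the mean of ``y`` computed on the remaining rows stays close to the full-data mean under MCAR, while under MAR it is pulled toward the rows with small ``x``.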

 c. Imputation
 ^^^^^^^^^^^^^

-Qolmat allows model selection imputation algorithms. For each of the K folds, Qolmat artificially masks a set of observed values using a default or userspecified hole generator. It seems natural to create these masks according to the same missing-data mechanism as dtermined by the test. Here's the documentation on using Qolmat for imputation model selection. : `here<https://qolmat.readthedocs.io/en/latest/#:~:text=How%20does%20Qolmat%20work%20%3F>`_.
+Qolmat supports model selection for imputation algorithms. For each of the K folds, Qolmat artificially masks a set of observed values using a default or user-specified hole generator. It seems natural to create these masks according to the same missing-data mechanism as determined by the test. Here is the documentation on using Qolmat for imputation `model selection <https://qolmat.readthedocs.io/en/latest/#:~:text=How%20does%20Qolmat%20work%20%3F>`_.

 3. The MCAR Tests
 -----------------

-There exist several statistical tests to determine if the missing data mechanism is MCAR or MAR. Most tests are based on the notion of missing pattern.
-A missing pattern, also called pattern, is the structure of observed and missing values in a dataset. For example, for a dataset with 2 columns, the possible patterns are: (0, 0), (1, 0), (0, 1), (1, 1). The value 1 indicates that the value in the column is missing.
+There are several statistical tests to determine if the missing data mechanism is MCAR or MAR. Most tests are based on the notion of missing pattern.
+A missing pattern, also called a pattern, is the structure of observed and missing values in a dataset. For example, for a dataset with two columns, the possible patterns are: (0, 0), (1, 0), (0, 1), (1, 1). The value 1 indicates that the value in the column is missing.

 The MCAR missing-data mechanism means that there is independence between the presence of holes and the observed values. In other words, the data distribution is the same for all patterns.
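
The notion of missing pattern described above can be sketched in a few lines of plain Python. The toy rows and the helper name ``missing_pattern`` are hypothetical and only serve the illustration; they are not part of Qolmat.

```python
from collections import Counter
from itertools import product


def missing_pattern(row):
    """Return the pattern of a row: 1 where the value is missing, 0 otherwise."""
    return tuple(1 if value is None else 0 for value in row)


# Toy dataset with two columns; None marks a missing value.
rows = [
    (1.2, 3.4),
    (None, 2.0),
    (0.5, None),
    (None, None),
    (2.2, 1.1),
]

# All patterns that can occur with two columns.
possible_patterns = list(product((0, 1), repeat=2))

# Number of rows that fall into each pattern.
pattern_counts = Counter(missing_pattern(row) for row in rows)
```

Grouping rows by pattern in this way is the starting point of pattern-based MCAR tests, which then compare the observed data across the groups.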

 a. Little's Test
 ^^^^^^^^^^^^^^^^

-The best-known MCAR test is the :ref:`Little [2]<Little-article>` test. Keep in mind that the Little's test is designed to test the homogeneity of means accross the missing patterns and won't be efficient to detect the heterogeneity of covariance accross missing patterns.
+The best-known MCAR test is the :ref:`Little [2]<Little-article>` test, and it has been implemented in :class:`LittleTest`. Keep in mind that Little's test is designed to test the homogeneity of means across the missing patterns and is not effective at detecting heterogeneity of covariance across missing patterns.
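
For reference, the statistic usually associated with Little's test can be written as follows. This is a sketch of the general form from the statistical literature; the notation is an assumption introduced here, and it is not a description of how :class:`LittleTest` is implemented. With :math:`J` missing patterns, :math:`n_j` rows in pattern :math:`j`, :math:`\bar{y}_{\mathrm{obs},j}` the sample mean of the variables observed in pattern :math:`j`, and :math:`\hat{\mu}_{\mathrm{obs},j}`, :math:`\hat{\Sigma}_{\mathrm{obs},j}` the corresponding sub-vector and sub-matrix of the maximum-likelihood estimates of the global mean and covariance:

.. math::

    d^2 = \sum_{j=1}^{J} n_j \left(\bar{y}_{\mathrm{obs},j} - \hat{\mu}_{\mathrm{obs},j}\right)^{\top} \hat{\Sigma}_{\mathrm{obs},j}^{-1} \left(\bar{y}_{\mathrm{obs},j} - \hat{\mu}_{\mathrm{obs},j}\right)

Under the MCAR hypothesis, and assuming multivariate normal data, :math:`d^2` is asymptotically chi-squared distributed with :math:`\sum_{j=1}^{J} p_j - p` degrees of freedom, where :math:`p_j` is the number of variables observed in pattern :math:`j` and :math:`p` the total number of variables. Only the pattern means enter the statistic, which is why differences in covariance across patterns go undetected.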

 b. PKLM Test
 ^^^^^^^^^^^^

-The :ref:`PKLM [2]<PKLM-article>` (Projected Kullback-Leibler MCAR) test compares the distributions of different missing patterns on random projections in the variable space of the data. This recent test applies to mixed-type data.
+The :ref:`PKLM [2]<PKLM-article>` (Projected Kullback-Leibler MCAR) test compares the distributions of different missing patterns on random projections in the variable space of the data. This recent test applies to mixed-type data. It is not yet implemented in Qolmat.

 References
 ----------
@@ -61,7 +61,7 @@ References

 .. _Little-article:

-[2] Little. `A Test of Missing Completely at Random for Multivariate Data with Missing Values. <https://www.tandfonline.com/doi/abs/10.1080/01621459.1988.10478722>`_ Journal of the American Statistical Association, Volume 83, 1988 - Issue 404.
+[2] Little, R. J. A. `A Test of Missing Completely at Random for Multivariate Data with Missing Values. <https://www.tandfonline.com/doi/abs/10.1080/01621459.1988.10478722>`_ Journal of the American Statistical Association, Volume 83, 1988 - Issue 404.