Merge pull request #114 from Quantmetry/doc_cosmetic_changes

JulienRoussel77 · web-flow · commit f0a9239f22d4 · 2023-10-31T12:12:21.000+01:00
Doc cosmetic changes
diff --git a/README.rst b/README.rst
@@ -47,7 +47,7 @@ Qolmat can be installed in different ways:
 .. code:: sh
 
     $ pip install qolmat  # installation via `pip`
-    $ pip install qolmat[pytorch] # if you need pytorch
+    $ pip install qolmat[pytorch] # if you need ImputerDiffusion relying on pytorch
     $ pip install git+https://github.com/Quantmetry/qolmat  # or directly from the github repository
 
 ⚡️ Quickstart
@@ -105,8 +105,8 @@ The full documentation can be found `on this link <https://qolmat.readthedocs.io
 
 **How does Qolmat work ?**
 
-Qolmat allows model selection for scikit-learn compatible imputation algorithms, by performing three steps pictured below:
-1) For each of the K folds, Qolmat artificially masks a set of observed values using a default or user specified `hole generator <explanation.html#hole-generator>`_,
+| Qolmat allows model selection for scikit-learn compatible imputation algorithms, by performing three steps pictured below:
+1) For each of the K folds, Qolmat artificially masks a set of observed values using a default or user specified `hole generator <explanation.html#hole-generator>`_.
 2) For each fold and each compared `imputation method <imputers.html>`_, Qolmat fills both the missing and the masked values, then computes each of the default or user specified `performance metrics <explanation.html#metrics>`_.
 3) For each compared imputer, Qolmat pools the computed metrics from the K folds into a single value.
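
The three steps above can be sketched with NumPy alone. This is a minimal illustration, not Qolmat's implementation: ``compare_imputers``, ``impute_mean`` and ``impute_median`` are hypothetical names, the masking is uniform at random, and the only metric is the mean absolute error.

```python
import numpy as np


def compare_imputers(X, imputers, n_folds=3, ratio_masked=0.1, seed=0):
    """Return one pooled mean absolute error per imputer (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    scores = {name: [] for name in imputers}
    for _ in range(n_folds):
        # Step 1: artificially mask a random subset of the observed values.
        mask = observed & (rng.random(X.shape) < ratio_masked)
        X_masked = np.where(mask, np.nan, X)
        for name, impute in imputers.items():
            # Step 2: fill both missing and masked values,
            # then score the imputation on the masked entries only.
            X_imputed = impute(X_masked)
            scores[name].append(np.mean(np.abs(X_imputed[mask] - X[mask])))
    # Step 3: pool the per-fold metrics into a single value per imputer.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


def impute_mean(X):
    # Replace each NaN with the mean of its column.
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def impute_median(X):
    # Replace each NaN with the median of its column.
    return np.where(np.isnan(X), np.nanmedian(X, axis=0), X)


# Toy data: Gaussian columns with about 5% of genuinely missing values.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.05] = np.nan

results = compare_imputers(X, {"mean": impute_mean, "median": impute_median})
print(results)
```

Scoring only the masked entries is what makes the comparison possible: their true values are known, unlike those of the originally missing entries.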
 
@@ -117,7 +117,7 @@ This is very similar in spirit to the `cross_val_score <https://scikit-learn.org
 
 **Imputation methods**
 
-The following table contains the available imputation methods. We distinguish single imputation methods (aiming for pointwise accuracy, mostly deterministic) from multiple imputation methods (aiming for distribution similarity, mostly stochastic).
+The following table contains the available imputation methods. We distinguish single imputation methods (aiming for pointwise accuracy, mostly deterministic) from multiple imputation methods (aiming for distribution similarity, mostly stochastic). For further details regarding the distinction between single and multiple imputation, you can refer to the `Imputation article <https://en.wikipedia.org/wiki/Imputation_(statistics)>`_ on Wikipedia.
 
 .. list-table::
    :widths: 25 70 15 15
diff --git a/docs/api.rst b/docs/api.rst
@@ -103,4 +103,14 @@ Diffusion engine
     
     imputations.imputers_pytorch.ImputerDiffusion
     imputations.diffusions.ddpms.TabDDPM
-    imputations.diffusions.ddpms.TsDDPM
+    imputations.diffusions.ddpms.TsDDPM
+
+
+Utils
+================
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+    
+    utils.data.add_holes
diff --git a/docs/explanation.rst b/docs/explanation.rst
@@ -99,7 +99,7 @@ We compute the associated complete dataset :math:`\hat{X}^{(k)}` for the partial
 -----------------
 
 Evaluating the imputers requires generating holes that are representative of the holes at hand.
-The missingness mechanisms have been classified by Rubin [1] into MCAR, MAR and MNAR.
+The missingness mechanisms have been classified by :ref:`Rubin [1]<rubin-article>` into MCAR, MAR and MNAR.
 
 Suppose we have :math:`X_{obs}`, a subset of a complete data model :math:`X = (X_{obs}, X_{mis})`, which is not fully observable (:math:`X_{mis}` is the missing part).
 We define the matrix :math:`M` such that :math:`M_{ij}=1` if :math:`X_{ij}` is missing, and 0 otherwise, and we assume the distribution of :math:`M` is parametrised by :math:`\psi`.
@@ -108,14 +108,14 @@ The observations are said to be Missing Completely at Random (MCAR) if the proba
 Formally,
 
 .. math::
-    P(M | X_{obs}, X_{mis}, \psi) = P(M, \psi), \quad \forall \psi.
+    P(M | X_{obs}, X_{mis}, \psi) = P(M | \psi), \quad \forall \psi.
 
 The observations are said to be Missing at Random (MAR) if the probability that an observation is missing depends only on the observed values. Formally,
 
 .. math::
     P(M | X_{obs}, X_{mis}, \psi) = P(M | X_{obs}, \psi), \quad \forall \psi, X_{mis}.
 
-Finally, the observations are said to be Missing Not at Random (MNAR) in all other cases, i.e. if P(M | X_{obs}, X_{mis}, \psi) does not simplify.
+Finally, the observations are said to be Missing Not at Random (MNAR) in all other cases, i.e. if :math:`P(M | X_{obs}, X_{mis}, \psi)` does not simplify.
 
 Qolmat can generate new missing values in an existing dataset, but only in the MCAR case.
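
As an illustration of the MCAR case, each entry of the mask :math:`M` can be drawn independently of the data, so that :math:`P(M | X_{obs}, X_{mis}, \psi) = P(M | \psi)` holds by construction. The sketch below is an assumption-laden example, not Qolmat's hole generator: ``add_mcar_holes`` is a hypothetical name, and :math:`\psi` reduces to a single Bernoulli probability ``ratio_masked``.

```python
import numpy as np


def add_mcar_holes(X, ratio_masked, seed=None):
    """Mask entries of a float array completely at random (hypothetical helper).

    Each entry is masked with probability ``ratio_masked``,
    independently of every value in X.
    """
    rng = np.random.default_rng(seed)
    # M[i, j] is True when X[i, j] is masked; it never looks at X.
    M = rng.random(X.shape) < ratio_masked
    X_holes = np.where(M, np.nan, X)
    return X_holes, M


X = np.random.default_rng(0).normal(size=(1000, 4))
X_holes, M = add_mcar_holes(X, ratio_masked=0.2, seed=1)

# The empirical missing rate should be close to ratio_masked.
print(M.mean())
```

A MAR or MNAR generator would instead make the masking probability depend on the observed or on the missing values, respectively.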
 
@@ -140,4 +140,7 @@ Qolmat can be used to search for hyperparameters in imputation functions. Let sa
 
 References
 ----------
-[1] Rubin, Donald B. `Inference and missing data. <https://www.math.wsu.edu/faculty/xchen/stat115/lectureNotes3/Rubin%20Inference%20and%20Missing%20Data.pdf>`_ Biometrika 63.3 (1976): 581-592.
+
+.. _rubin-article:
+
+[1] Rubin, Donald B. `Inference and missing data. <https://www.math.wsu.edu/faculty/xchen/stat115/lectureNotes3/Rubin%20Inference%20and%20Missing%20Data.pdf>`_ Biometrika 63.3 (1976): 581-592.
diff --git a/examples/tutorials/plot_tuto_diffusion_models.py b/examples/tutorials/plot_tuto_diffusion_models.py
@@ -7,7 +7,6 @@
 and :class:`~qolmat.imputations.diffusions.ddpms.TsDDPM` classes.
 """
 
-# %%
 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
diff --git a/examples/tutorials/plot_tuto_mean_median.py b/examples/tutorials/plot_tuto_mean_median.py
@@ -1,6 +1,6 @@
 """
 ========================================================================================
-Tutorial for comparison between mean and median imputations with uniform hole generation
+Comparison of basic imputers
 ========================================================================================
 
 In this tutorial, we show how to use the Qolmat comparator
@@ -31,11 +31,10 @@
 # the 82nd column contains the critical temperature which is used as the
 # target variable.
 # The data does not contain missing values; so for the purpose of this notebook,
-# we corrupt the data, with the ``qolmat.utils.data.add_holes`` function.
+# we corrupt the data, with the :func:`qolmat.utils.data.add_holes` function.
 # In this way, each column has missing values.
 
-df_data = data.get_data("Superconductor")
-df = data.add_holes(df_data, ratio_masked=0.2, mean_size=120)
+df = data.add_holes(data.get_data("Superconductor"), ratio_masked=0.2, mean_size=120)
 
 # %%
 # The dataset contains 82 columns. For simplicity,
@@ -76,10 +75,6 @@
 imputer_median = imputers.ImputerMedian()
 dict_imputers = {"mean": imputer_mean, "median": imputer_median}
 
-generator_holes = missing_patterns.UniformHoleGenerator(
-    n_splits=2, subset=cols_to_impute, ratio_masked=0.1
-)
-
 metrics = ["mae", "wmape", "KL_columnwise"]
 
 # %%
@@ -88,9 +83,7 @@
 # (those previously mentioned),
 # a list with the columns names to impute,
 # a generator of holes specifying the type of holes to create.
-# Just a few words about hole generation.
 # In this example, we have chosen the uniform hole generator.
-# You can see what this looks like.
 # For example, by imposing that 10% of missing data be created
 # ``ratio_masked=0.1`` and creating missing values in columns
 # ``subset=cols_to_impute``: