Changes from all commits
80 commits
All commits by jsoref, Nov 16, 2025:
e5cdd9a  spelling: , and
531a128  spelling: ; otherwise,
6790cad  spelling: a
5843aa3  spelling: access
3620bc9  spelling: across
3906516  spelling: additional
184d0fc  spelling: address
1842187  spelling: alternative
084a365  spelling: an
e955c4d  spelling: approaches
de1f225  spelling: are
2842132  spelling: array-like
45909a3  spelling: at
293f0f7  spelling: augmented
998fe9e  spelling: between
75e111c  spelling: bias
eb8f22b  spelling: building
4d4d47e  spelling: class
23db951  spelling: columns
c129699  spelling: compute
82a15ea  spelling: conditional
544dcb8  spelling: conditionally
a45cb6d  spelling: conditioning
d613f6a  spelling: conjugate
83672b9  spelling: consistent
fe72ca2  spelling: criteria
6d2c67c  spelling: dataframes
fbb2048  spelling: dataset
1a4522b  spelling: datetime
419f792  spelling: default
e907399  spelling: dictionary
b58d27b  spelling: different
9b669c8  spelling: distribution
ea57562  spelling: element-wise
5894c63  spelling: estimated
79009af  spelling: explanation
0cb129c  spelling: function
da9ee52  spelling: globally
7995000  spelling: hyperparameters
09c5b5c  spelling: id
33efcac  spelling: implementation
e03c1a9  spelling: imputation
7847127  spelling: independent
dac6805  spelling: kullback
b209d8f  spelling: libraries
fd85f13  spelling: matrix
b7192d5  spelling: method
3f06262  spelling: multi
181697c  spelling: original
469733d  spelling: percentage
7aa1049  spelling: performance
6b0e5ba  spelling: perturbation
1c6d12a  spelling: perturbed
afb87da  spelling: practice
4d13296  spelling: pressure
e418a1e  spelling: pretreated
e608b94  spelling: probability
7ffc8e6  spelling: recommended
e7e2083  spelling: refactor
9cdd5af  spelling: reproducibility
89b3300  spelling: results
4a5e402  spelling: returned
a8bb53b  spelling: returns
68062e5  spelling: seasonal
54654d4  spelling: series
67d6ac4  spelling: shrunk
dc81648  spelling: split
b97baa7  spelling: stopping
a3192c3  spelling: supported
0bc1c81  spelling: temporal
7fcd3cc  spelling: the
9e480d5  spelling: transformers
8789a6a  spelling: transition
323e454  spelling: tutorial
a452c93  spelling: update
cbafb26  spelling: useful
c517964  spelling: variables
050f8d2  spelling: whether or not
5d881eb  spelling: while
2939549  link: scikit-learn API
2 changes: 1 addition & 1 deletion CONTRIBUTING.rst
@@ -46,7 +46,7 @@ Documenting your change
-----------------------

If you're adding a class or a function, then you'll need to add a docstring with a doctest. We follow the `numpy docstring convention <https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html>`_, so please do too.
Any estimator should follow the [scikit-learn API](https://scikit-learn.org/stable/developers/develop.html), so please follow these guidelines.
Any estimator should follow the `scikit-learn API <https://scikit-learn.org/stable/developers/develop.html>`_, so please follow these guidelines.
Comment from the PR author (jsoref):
I saw this while I was creating the PR -- the old notation is Markdown, but this file is RST.


Updating changelog
------------------
10 changes: 5 additions & 5 deletions HISTORY.rst
@@ -5,7 +5,7 @@ History
0.1.10 (2024-??-??)
------------------
* Long EM and RPCA operations wrapped with tqdm progress bars
* Readme code sample updated, and results table made consistant
* Readme code sample updated, and results table made consistent

0.1.9 (2024-08-29)
------------------
@@ -41,7 +41,7 @@ History
* RPCA algorithms now start with a normalizing scaler
* The EM algorithms now include a gradient projection step to be more robust to colinearity
* The EM algorithm based on the Gaussian model is now initialized using a robust estimation of the covariance matrix
* A bug in the EM algorithm has been patched: the normalizing matrix gamma was creating a sampling biais
* A bug in the EM algorithm has been patched: the normalizing matrix gamma was creating a sampling bias
* Speed up of the EM algorithm likelihood maximization, using the conjugate gradient method
* The ImputeRegressor class now handles the nans by `row` by default
* The metric `frechet` was not correctly called and has been patched
@@ -67,9 +67,9 @@ History
-------------------

* VAR(p) EM sampler implemented, founding on a VAR(p) modelization such as the one described in `Lütkepohl (2005) New Introduction to Multiple Time Series Analysis`
* EM and RPCA matrices transposed in the low-level impelmentation, however the API remains unchanged
* EM and RPCA matrices transposed in the low-level implementation, however the API remains unchanged
* Sparse matrices introduced in the RPCA implementation so as to speed up the execution
* Implementation of SoftImpute, which provides a fast but less robust alterantive to RPCA
* Implementation of SoftImpute, which provides a fast but less robust alternative to RPCA
* Implementation of TabDDPM and TsDDPM, which are diffusion-based models for tabular data and time-series data, based on Denoising Diffusion Probabilistic Models. Their implementations follow the work of Tashiro et al., (2021) and Kotelnikov et al., (2023).
* ImputerDiffusion is an imputer-wrapper of these two models TabDDPM and TsDDPM.
* Docstrings and tests improved for the EM sampler
@@ -100,7 +100,7 @@ been changed into tuple attributes so that all are not immutable
0.0.13 (2023-06-07)
-------------------

* Refacto cross validation
* Refactor cross validation
* Fix Readme
* Add test utils.plot

4 changes: 2 additions & 2 deletions docs/analysis.rst
@@ -16,7 +16,7 @@ Then Qolmat proposes two tests to determine whether the missing data mechanism i
2. How to use the results
-------------------------

At the end of the MCAR test, it can then be assumed whether the missing data mechanism is MCAR or not. This serves three differents purposes:
At the end of the MCAR test, it can then be assumed whether or not the missing data mechanism is MCAR. This serves three different purposes:

a. Diagnosis
^^^^^^^^^^^^
@@ -45,7 +45,7 @@ The MCAR missing-data mechanism means that there is independence between the pre
a. Little's Test
^^^^^^^^^^^^^^^^

The best-known MCAR test is the :ref:`Little [1]<Little-article>` test, and it has been implemented in :class:`LittleTest`. Keep in mind that the Little's test is designed to test the homogeneity of means across the missing patterns and won't be efficient to detect the heterogeneity of covariance accross missing patterns.
The best-known MCAR test is the :ref:`Little [1]<Little-article>` test, and it has been implemented in :class:`LittleTest`. Keep in mind that the Little's test is designed to test the homogeneity of means across the missing patterns and won't be efficient to detect the heterogeneity of covariance across missing patterns.
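A minimal, hedged sketch of how this test might be applied; the import path, constructor arguments, and `test` method below are assumptions about the Qolmat API, used only to illustrate the workflow.

```python
# Hedged sketch: import path, constructor arguments and the `test` method are
# assumed, not verified against the current Qolmat API.
import numpy as np
import pandas as pd
from qolmat.analysis.holes_characterization import LittleTest  # path assumed

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
df.loc[df.sample(frac=0.2, random_state=0).index, "b"] = np.nan  # MCAR holes

mcar_test = LittleTest(random_state=42)  # arguments assumed
p_value = mcar_test.test(df)             # small p-value -> reject the MCAR hypothesis
print(p_value)
```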

b. PKLM Test
^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion docs/explanation.rst
@@ -117,7 +117,7 @@ The observations are said to be Missing at Random (MAR) if the probability of an

Finally, the observations are said to be Missing Not at Random (MNAR) in all other cases, i.e. if :math:`P(M | X_{obs}, X_{mis}, \psi)` does not simplify.

Qolmat allows to generate new missing values on a an existing dataset, but only in the MCAR case.
Qolmat allows to generate new missing values on an existing dataset, but only in the MCAR case.

Here are the different classes to generate missing data. We recommend the last 3 for time series.

10 changes: 5 additions & 5 deletions docs/imputers.rst
@@ -42,7 +42,7 @@ See the :class:`~qolmat.imputations.imputers.ImputerRpcaPcp` class for implement

**Noisy RPCA** [2, 3, 4]

The class :class:`RpcaNoisy` implements an recommanded improved version, which relies on a decomposition :math:`\mathbf{D} = \mathbf{M} + \mathbf{A} + \mathbf{E}`. The additionnal term encodes a Gaussian noise and makes the numerical convergence more reliable. This class also implements a time-consistency penalization for time series, parametrized by the :math:`\eta_k`and :math:`H_k`. By defining :math:`\Vert \mathbf{MH_k} \Vert_p` is either :math:`\Vert \mathbf{MH_k} \Vert_1` or :math:`\Vert \mathbf{MH_k} \Vert_F^2`, the optimisation problem is the following
The class :class:`RpcaNoisy` implements a recommended improved version, which relies on a decomposition :math:`\mathbf{D} = \mathbf{M} + \mathbf{A} + \mathbf{E}`. The additional term encodes a Gaussian noise and makes the numerical convergence more reliable. This class also implements a time-consistency penalization for time series, parametrized by the :math:`\eta_k`and :math:`H_k`. By defining :math:`\Vert \mathbf{MH_k} \Vert_p` is either :math:`\Vert \mathbf{MH_k} \Vert_1` or :math:`\Vert \mathbf{MH_k} \Vert_F^2`, the optimisation problem is the following

.. math::
\text{min}_{\mathbf{M, A} \in \mathbb{R}^{m \times n}} \quad \frac 1 2 \Vert P_{\Omega} (\mathbf{D}-\mathbf{M}-\mathbf{A}) \Vert_F^2 + \tau \Vert \mathbf{M} \Vert_* + \lambda \Vert \mathbf{A} \Vert_1 + \sum_{k=1}^K \eta_k \Vert \mathbf{M H_k} \Vert_p
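To make the two penalties concrete, here is a small self-contained NumPy sketch (not Qolmat's implementation) of the proximal operators they induce: singular-value thresholding for the nuclear norm on M and element-wise soft-thresholding for the l1 norm on A. The data and thresholds are toy values.

```python
import numpy as np

def soft_threshold(x: np.ndarray, thr: float) -> np.ndarray:
    """Element-wise soft-thresholding, the proximal operator of thr * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def svd_threshold(x: np.ndarray, thr: float) -> np.ndarray:
    """Singular-value thresholding, the proximal operator of thr * ||.||_* (nuclear norm)."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u * soft_threshold(s, thr)) @ vt

# Toy data: a low-rank signal plus a few large sparse anomalies.
rng = np.random.default_rng(0)
d = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 20))
d[rng.integers(0, 50, 10), rng.integers(0, 20, 10)] += 10.0
m_hat = svd_threshold(d, thr=1.0)            # low-rank part M
a_hat = soft_threshold(d - m_hat, thr=0.5)   # sparse anomaly part A
```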
@@ -71,15 +71,15 @@ Suppose the data :math:`\mathbf{X}` has a density :math:`p_\theta` parametrized

**Expectation**

Draw samples of :math:`\mathbf{X}` assuming a fixed :math:`\theta`, conditionnaly on the values of :math:`\mathbf{X}_\mathrm{obs}`. This is done by MCMC using a projected Langevin algorithm.
Draw samples of :math:`\mathbf{X}` assuming a fixed :math:`\theta`, conditionally on the values of :math:`\mathbf{X}_\mathrm{obs}`. This is done by MCMC using a projected Langevin algorithm.
This process is characterized by a time step :math:`h`. Given an initial station :math:`X_0`, one can update the state at iteration *t* as

.. math::
\widetilde X_n = X_{n-1} + \Gamma \nabla L_X(X_{n-1}, \theta_n) (X_{n-1} - \mu) h + (2 h \Gamma)^{1/2} Z_n,

where :math:`Z_n` is a vector of independant standard normal random variables and :math:`L` is the log-likelihood.
where :math:`Z_n` is a vector of independent standard normal random variables and :math:`L` is the log-likelihood.
The sampled distribution tends to the target one in the limit :math:`h \rightarrow 0` and the number of iterations :math:`n \rightarrow \infty`.
Sampling from the conditionnal distribution :math:`p(\mathbf{X}_{mis} \vert \mathbf{X}_{obs} ; \theta^{(n)})` (see MCEM [6]) is achieved by projecting the samples at each step.
Sampling from the conditional distribution :math:`p(\mathbf{X}_{mis} \vert \mathbf{X}_{obs} ; \theta^{(n)})` (see MCEM [6]) is achieved by projecting the samples at each step.

.. math::
X_n = Proj_{obs} \left( \widetilde X_n \right),
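A minimal NumPy sketch of one such projected update, assuming the matrix Gamma is the identity and the caller supplies the log-likelihood gradient; this is an illustration only, not Qolmat's EM sampler.

```python
import numpy as np

def projected_langevin_step(x, x_obs, mask_obs, grad_log_lik, h=1e-2, rng=None):
    """Drift along the log-likelihood gradient, add Gaussian noise,
    then project by resetting the observed entries to their known values."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(x.shape)
    x_tilde = x + h * grad_log_lik(x) + np.sqrt(2.0 * h) * z
    x_tilde[mask_obs] = x_obs[mask_obs]  # Proj_obs: observed entries stay fixed
    return x_tilde

# Toy usage with a standard normal model, for which grad log p(x) = -x.
x_obs = np.array([[1.0, np.nan], [0.5, 2.0]])
mask_obs = ~np.isnan(x_obs)
x = np.where(mask_obs, x_obs, 0.0)  # initialize the missing entries
for _ in range(200):
    x = projected_langevin_step(x, x_obs, mask_obs, lambda v: -v)
```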
@@ -113,7 +113,7 @@ Two parametric distributions are implemented:

:class:`~qolmat.imputations.diffusions.ddpms.TabDDPM` is a deep learning imputer based on Denoising Diffusion Probabilistic Models (DDPMs) [8] for handling multivariate tabular data. Our implementation mainly follows the works of [8, 9]. Diffusion models focus on modeling the process of data transitions from noisy and incomplete observations to the underlying true data. They include two main processes:

* Forward process perturbs observed data to noise until all the original data structures are lost. The pertubation is done over a series of steps. Let :math:`X_{obs}` be observed data, :math:`T` be the number of steps that noises :math:`\epsilon \sim N(0,I)` are added into the observed data. Therefore, :math:`X_{obs}^t = \bar{\alpha}_t \times X_{obs} + \sqrt{1-\bar{\alpha}_t} \times \epsilon` where :math:`\bar{\alpha}_t` controls the right amount of noise.
* Forward process perturbs observed data to noise until all the original data structures are lost. The perturbation is done over a series of steps. Let :math:`X_{obs}` be observed data, :math:`T` be the number of steps that noises :math:`\epsilon \sim N(0,I)` are added into the observed data. Therefore, :math:`X_{obs}^t = \bar{\alpha}_t \times X_{obs} + \sqrt{1-\bar{\alpha}_t} \times \epsilon` where :math:`\bar{\alpha}_t` controls the right amount of noise.
* Reverse process removes noise and reconstructs the observed data. At each step :math:`t`, we train an autoencoder :math:`\epsilon_\theta` based on ResNet [10] to predict the added noise :math:`\epsilon_t` based on the rest of the observed data. The objective function is the error between the noise added in the forward process and the noise predicted by :math:`\epsilon_\theta`.

In training phase, we use the self-supervised learning method of [9] to train incomplete data. In detail, our model randomly masks a part of observed data and computes loss from these masked data. Moving on to the inference phase, (1) missing data are replaced by Gaussian noises :math:`\epsilon \sim N(0,I)`, (2) at each noise step from :math:`T` to 0, our model denoises these missing data based on :math:`\epsilon_\theta`.
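As a rough illustration of the forward process, here is a toy NumPy version of a single noising step, written with the standard DDPM parametrization (square root applied to the cumulative alpha-bar); the noise-schedule values are made up for the example and are not Qolmat's defaults.

```python
import numpy as np

rng = np.random.default_rng(0)
x_obs = rng.normal(size=(8, 4))          # a small batch of observed rows
betas = np.linspace(1e-4, 0.02, 100)     # illustrative linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative product, one value per step

t = 50
eps = rng.standard_normal(x_obs.shape)
x_noisy = np.sqrt(alpha_bar[t]) * x_obs + np.sqrt(1.0 - alpha_bar[t]) * eps
```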
12 changes: 6 additions & 6 deletions examples/benchmark.md
@@ -17,7 +17,7 @@ jupyter:
In Qolmat, a few data imputation methods are implemented as well as a way to evaluate their performance.**


First, import some useful librairies
First, import some useful libraries

```python tags=[]
import warnings
@@ -54,7 +54,7 @@ from qolmat.utils import data, utils, plot


The dataset `Beijing` is the Beijing Multi-Site Air-Quality Data Set. It consists in hourly air pollutants data from 12 chinese nationally-controlled air-quality monitoring sites and is available at https://archive.ics.uci.edu/ml/machine-learning-databases/00501/.
This dataset only contains numerical vairables.
This dataset only contains numerical variables.

```python tags=[]
df_data = data.get_data_corrupted("Beijing", ratio_masked=.2, mean_size=120)
@@ -98,11 +98,11 @@ plt.show()
This part is devoted to the imputation methods. The idea is to try different algorithms and compare them.

<u>**Methods**</u>:
All presented methods are group-wise: here each station is imputed independently. For example ImputerMean computes the mean of each variable in each station and uses the result for imputation; ImputerInterpolation interpolates termporal signals corresponding to each variable on each station.
All presented methods are group-wise: here each station is imputed independently. For example ImputerMean computes the mean of each variable in each station and uses the result for imputation; ImputerInterpolation interpolates temporal signals corresponding to each variable on each station.
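For instance, a group-wise imputer could be set up along the following lines; the class names come from the text above, while the `groups` and `method` keyword arguments are assumptions about how per-station imputation is requested.

```python
# Hedged sketch: `groups` and `method` keyword names are assumed.
from qolmat.imputations import imputers

dict_imputers = {
    "mean": imputers.ImputerMean(groups=("station",)),
    "interpolation": imputers.ImputerInterpolation(method="linear", groups=("station",)),
}
```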

<u>**Hyperparameters' search**</u>:
Some methods require hyperparameters. The user can directly specify them, or rather determine them through an optimization step using the `search_params` dictionary. The keys are the imputation method's name and the values are a dictionary specifying the minimum, maximum or list of categories and type of values (Integer, Real, Category or a dictionary indexed by the variable names) to search.
In pratice, we rely on a cross validation to find the best hyperparams values minimizing an error reconstruction.
In practice, we rely on a cross validation to find the best hyperparams values minimizing an error reconstruction.
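Following that description, a search space might look like the sketch below; the imputer names, parameter names, key names and bounds are purely illustrative.

```python
# Illustrative only: keys are imputer names, values describe the search space
# (minimum, maximum and type) for each hyperparameter, as described above.
search_params = {
    "RPCA": {
        "tau": {"min": 0.5, "max": 5.0, "type": "Real"},
        "lam": {"min": 0.1, "max": 1.0, "type": "Real"},
    },
    "KNN": {
        "n_neighbors": {"min": 2, "max": 10, "type": "Integer"},
    },
}
```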

```python tags=[]
ratio_masked = 0.1
@@ -476,7 +476,7 @@ plt.show()


We first check the covariance. We simply plot one variable versus one another.
One observes the methods provide similar visual resuls: it's difficult to compare them based on this criterion.
One observes the methods provide similar visual results: it's difficult to compare them based on this criterion.

```python
fig = plt.figure(figsize=(6 * n_imputers, 6 * n_columns))
@@ -494,7 +494,7 @@ plt.show()
## Auto-correlation


We are now interested in the auto-correlation function (ACF). As seen before, time series display seaonal patterns.
We are now interested in the auto-correlation function (ACF). As seen before, time series display seasonal patterns.
[Autocorrelation](https://en.wikipedia.org/wiki/Autocorrelation) is the correlation of a signal with a delayed copy of itself as a function of delay. It measures the similarity between observations of a random variable as a function of the time lag between them. The objective is to have an ACF to be similar between the original dataset and the imputed one.

```python
12 changes: 6 additions & 6 deletions examples/tutorials/plot_tuto_benchmark_TS.py
@@ -41,7 +41,7 @@
# For the purpose of this notebook,
# we corrupt the data, with the ``qolmat.utils.data.add_holes`` function
# on three variables: "TEMP", "PRES" and "WSPM"
# and the imputation methods will have acces to two additional features:
# and the imputation methods will have access to two additional features:
# "DEWP" and "RAIN".

df_data = data.get_data("Beijing")
@@ -51,7 +51,7 @@
df = data.add_holes(df_data, ratio_masked=0.15, mean_size=50)
df[["DEWP", "RAIN"]] = df_data[["DEWP", "RAIN"]]
# %%
# Let's take a look a one station, for instance "Aotizhongxin"
# Let's take a look at one station, for instance "Aotizhongxin"

station = "Aotizhongxin"
fig, ax = plt.subplots(len(cols_to_impute), 1, figsize=(13, 8))
@@ -68,7 +68,7 @@
# ---------------------------------------------------------------
# All presented methods are group-wise: here each station is imputed independently.
# For example ImputerMean computes the mean of each variable in each station and uses
# the result for imputation; ImputerInterpolation interpolates termporal
# the result for imputation; ImputerInterpolation interpolates temporal
# signals corresponding to each variable on each station.
# We consider five imputation methods:
# ``median`` for a baseline imputation;
@@ -181,10 +181,10 @@

# %%
# We can also check the covariance. We simply plot one variable versus one another.
# One observes the methods provide similar visual resuls: it's difficult to compare
# One observes the methods provide similar visual results: it's difficult to compare
# them based on this criterion, except the median imputation that greatly differs.
# Black points and ellipses are original datafames
# whiel colored ones are imputed dataframes.
# Black points and ellipses are original dataframes
# while colored ones are imputed dataframes.

n_columns = len(dfs_imputed_station)
fig = plt.figure(figsize=(10, 10))
6 changes: 3 additions & 3 deletions examples/tutorials/plot_tuto_categorical.py
@@ -57,7 +57,7 @@
# %%
# The third approach uses ImputerRegressor which imputes iteratively each column using the other
# ones. The function make_robust_MixteHGB provides an underlying model able to:
# - adress both numerical targets (regression) and categorical targets (classification)
# - address both numerical targets (regression) and categorical targets (classification)
# - manage categorical features though one hot encoding
# - manage missing features (native to the HistGradientBoosting)
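A hedged sketch of how these pieces might be wired together; the import paths and keyword arguments are assumptions rather than the verified API, and `df` stands for the mixed-type DataFrame loaded elsewhere in the tutorial.

```python
# Assumed import paths and arguments; illustrates the iterative, column-by-column
# imputation with the HGB-based model described above.
from qolmat.imputations.imputers import ImputerRegressor
from qolmat.imputations.preprocessing import make_robust_MixteHGB  # path assumed

model = make_robust_MixteHGB()
imputer_hgb = ImputerRegressor(estimator=model)  # keyword name assumed
df_imputed = imputer_hgb.fit_transform(df)       # df: mixed-type DataFrame from the tutorial
```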

@@ -68,7 +68,7 @@
# %%
# 3. Mixed type model selection
# ---------------------------------------------------------------
# Let us now compare these three aproaches by measuring their ability to impute uniformly
# Let us now compare these three approaches by measuring their ability to impute uniformly
# distributed holes.

dict_imputers = {
@@ -101,5 +101,5 @@
results.loc["rmse"].style.highlight_min(color="lightgreen", axis=1)

# %%
# The HGB imputation methods globaly reaches a better accuracy on the categorical data.
# The HGB imputation methods globally reaches a better accuracy on the categorical data.
results.loc["accuracy"].style.highlight_max(color="lightgreen", axis=1)
10 changes: 5 additions & 5 deletions examples/tutorials/plot_tuto_diffusion_models.py
@@ -54,12 +54,12 @@
#
# * ``cols_imputed``: list of columns that need to be imputed. Recall that we train the model on
# incomplete data by using the self-supervised learning method. We can set which columns to be
# masked during training. Its defaut value is ``None``.
# masked during training. Its default value is ``None``.
#
# * ``epochs`` : a number of iterations, its defaut value ``epochs=10``. In practice, we should
# * ``epochs`` : a number of iterations, its default value ``epochs=10``. In practice, we should
# set a larger number of epochs e.g., ``epochs=100``.
#
# * ``batch_size`` : a size of batch, its defaut value ``batch_size=100``.
# * ``batch_size`` : a size of batch, its default value ``batch_size=100``.
#
# The following hyperparams are for validation:
#
@@ -198,11 +198,11 @@
#
# For TsDDPM, we have two options for splitting data:
#
# * ``is_rolling=False`` (default value): the data is splited by using
# * ``is_rolling=False`` (default value): the data is split by using
# pandas.DataFrame.resample(rule=freq_str). There is no duplication of row between chunks,
# leading a smaller number of chunks than the number of rows in the original data.
#
# * ``is_rolling=True``: the data is splited by using pandas.DataFrame.rolling(window=freq_str).
# * ``is_rolling=True``: the data is split by using pandas.DataFrame.rolling(window=freq_str).
# The number of chunks is also the number of rows in the original data.
# Note that setting ``is_rolling=True`` always produces better quality of imputations
# but requires a longer training/inference time.
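To see why `is_rolling=True` is slower, here is a small self-contained comparison of the two chunking strategies named above on a toy hourly series; `"1D"` is just an example value for `freq_str`.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2024-01-01", periods=72, freq="h")
df = pd.DataFrame({"x": np.arange(72.0)}, index=index)

# is_rolling=False: non-overlapping chunks, one per resampling bin (3 days here).
chunks_resample = [group for _, group in df.resample("1D")]

# is_rolling=True: overlapping windows, one per row (72 here), hence more chunks
# and a longer training/inference time.
chunks_rolling = [window for window in df.rolling("1D")]

print(len(chunks_resample), len(chunks_rolling))  # 3 72
```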