Ordinal patterns: better documentation and add tests again (#243)

kahaaga · Datseris · web-flow · commit dbe947ec6e1d · 2023-01-09T21:50:18.000+02:00
* Add tests again. This is part of the public API now.

* Update docstring

* Correct imports

* Fix tests

* Comment binning-stuff. Should we introduce this later?

* Fix tests

* Update changelog and version

* Update docs

* Update tests

* Remove complicated use case

* Actually update docstring

* Small mention of `permutation_to_integer`

* Even more improvement of docstring

Co-authored-by: George Datseris <datseris.george@gmail.com>
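
The changelog entry in this commit distinguishes permutation patterns from rank orderings. A minimal sketch of that difference, using only Base Julia (`sortperm`, `invperm`); the variable names are illustrative and not taken from the package:

```julia
# Illustrative only: Base Julia, no ComplexityMeasures.jl required.
χ = [9.0, 1.0, 4.0]

# Ordinal (permutation) pattern: the indices that sort `χ` in ascending order.
pattern = sortperm(χ)      # [2, 3, 1]

# Rank ordering: the rank of each element within `χ` (the inverse permutation).
ranks = invperm(pattern)   # [3, 1, 2]

println("pattern = ", pattern, ", ranks = ", ranks)
```

The two coincide only when the pattern is its own inverse, as for `[4.0, 1.0, 9.0]` in the doctest, whose pattern `[2, 1, 3]` is self-inverse.
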
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,13 @@
 
 Changelog is kept with respect to version 0.11 of Entropies.jl. From version v2.0 onwards, this package has been renamed to ComplexityMeasures.jl.
 
+## 2.2
+
+- Corrected documentation for `SymbolicPermutation`, `SymbolicAmplitudeAwarePermutation`,
+    and `SymbolicWeightedPermutation`, indicating that the outcome space is the set of
+    `factorial(m)` *permutations* of the integers `1:m`, not the rank orderings,
+    as was stated before.
+
 ## 2.1
 
 - Added `Gao` estimator for differential Shannon entropy.
diff --git a/Project.toml b/Project.toml
@@ -2,7 +2,7 @@ name = "ComplexityMeasures"
 uuid = "ab4b797d-85ee-42ba-b621-05d793b346a2"
 authors = "Kristian Agasøster Haaga <kahaaga@gmail.com>, George Datseries <datseris.george@gmail.com>"
 repo = "https://github.com/juliadynamics/ComplexityMeasures.jl.git"
-version = "2.1.0"
+version = "2.2.0"
 
 [deps]
 Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"
diff --git a/src/encoding_implementations/ordinal_pattern.jl b/src/encoding_implementations/ordinal_pattern.jl
@@ -12,6 +12,9 @@ their permutation/ordinal patterns and then into the integers based on the Lehme
 code. It is used by [`SymbolicPermutation`](@ref) and similar estimators, see that for
 a description of the outcome space.
 
+The ordinal/permutation pattern of a vector `χ` is simply `sortperm(χ)`, which gives the
+indices that would sort `χ` in ascending order.
+
 ## Description
 
 The Lehmer code, as implemented here, is a bijection between the set of `factorial(m)`
@@ -25,20 +28,23 @@ The decoding step is much slower due to missing optimizations (pull requests wel
 ```jldoctest
 julia> using ComplexityMeasures
 
-julia> x = [4.0, 1.0, 9.0];
+julia> χ = [4.0, 1.0, 9.0];
 
 julia> c = OrdinalPatternEncoding(3);
 
-julia> encode(c, x)
+julia> i = encode(c, χ)
 3
 
-julia> decode(c, 1)
+julia> decode(c, i)
 3-element SVector{3, Int64} with indices SOneTo(3):
  2
  1
  3
 ```
 
+If you want to encode something that is already a permutation pattern, then you
+can use the non-exported `permutation_to_integer` function.
+
 [^Berger2019]:
     Berger et al. "Teaching Ordinal Patterns to a Computer: Efficient
     Encoding Algorithms Based on the Lehmer Code." Entropy 21.10 (2019): 1023.
@@ -58,15 +64,20 @@ end
 total_outcomes(::OrdinalPatternEncoding{m}) where {m} = factorial(m)
 outcome_space(::OrdinalPatternEncoding{m}) where {m} = permutations(1:m) |> collect
 
-
 # Notice that `χ` is an element of a `Dataset`, so most definitely a static vector in
-# our code. However we allow `AbstractVector` if a user wanna use `encode` directly
+# our code. However we allow `AbstractVector` if a user wants to use `encode` directly.
 function encode(encoding::OrdinalPatternEncoding{m}, χ::AbstractVector) where {m}
     if m != length(χ)
         throw(ArgumentError("Permutation order and length of input must match!"))
     end
-    perm = sortperm!(encoding.perm, χ; lt = encoding.lt)
-    # Begin Lehmer code
+    perm = sortperm!(encoding.perm, χ)
+    return permutation_to_integer(perm)
+end
+
+# The algorithm from Berger (2019). Use this directly if encoding *permutations* instead
+# of input vectors that are to be permuted.
+function permutation_to_integer(perm)
+    m = length(perm)
     n = 0
     for i = 1:m-1
         for j = i+1:m
@@ -85,7 +96,7 @@ end
 function decode(::OrdinalPatternEncoding{m}, s::Int) where {m}
     # Convert integer to its factorial number representation. Each factorial number
     # corresponds to a unique permutation of the numbers `1, 2, ..., m`.
-    f::SVector{m, Int} = base10_to_factorial(s - 1, m) # subtract 1 because we add 1 in `encode`
+    f::SVector{m, Int} = base10_to_factorial(s - 1, m) # subtract 1 because we add 1 above
 
     # Reconstruct the permutation from the factorial representation
     xs = 1:m |> collect
diff --git a/src/probabilities_estimators/symbolic_permutation.jl b/src/probabilities_estimators/symbolic_permutation.jl
@@ -34,58 +34,59 @@ When passed to [`probabilities`](@ref) the output depends on the input data type
     by using [`CountOccurrences`](@ref). When giving the resulting probabilities to
     [`entropy`](@ref), the original permutation entropy is computed [^BandtPompe2002].
 - **Multivariate data**. If applied to a `D`-dimensional `Dataset`,
-    then no embedding is constructed, and we each vector ``\\bf{x}_i`` of the dataset
-    directly to its permutation pattern ``\\pi_{i}``, ``\\pi_{i}`` by comparing the
+    then no embedding is constructed, `m` must be equal to `D` and `τ` is ignored.
+    Each vector ``\\bf{x}_i`` of the dataset is mapped
+    directly to its permutation pattern ``\\pi_{i}`` by comparing the
     relative magnitudes of the elements of ``\\bf{x}_i``.
     Like above, probabilities are estimated as the frequencies of the permutation symbols.
-    In this case, `m` is ignored,
-    but `m` must still match the dimension of the dataset for optimization.
     The resulting probabilities can be used to compute multivariate permutation
     entropy[^He2016], although here we don't perform any further subdivision
     of the permutation patterns (as in Figure 3 of[^He2016]).
 
 Internally, [`SymbolicPermutation`](@ref) uses the [`OrdinalPatternEncoding`](@ref)
 to represent ordinal patterns as integers for efficient computations.
 
+See [`SymbolicWeightedPermutation`](@ref) and [`SymbolicAmplitudeAwarePermutation`](@ref)
+for estimators that not only consider ordinal (sorting) patterns, but also incorporate
+information about within-state-vector amplitudes.
+For a version of this estimator that can be used on spatial data, see
+[`SpatialSymbolicPermutation`](@ref).
+
+!!! note "Handling equal values in ordinal patterns"
+    In Bandt & Pompe (2002), equal values are ordered after their order of appearance, but
+    this can lead to erroneous temporal correlations, especially for data with
+    low amplitude resolution [^Zunino2017]. Here, by default, if two values are equal,
+    then one of them is randomly assigned as "the largest", using
+    `lt = ComplexityMeasures.isless_rand`.
+    To get the behaviour from Bandt and Pompe (2002), use `lt = Base.isless`.
+
 ## Outcome space
 
 The outcome space `Ω` for `SymbolicPermutation` is the set of length-`m` ordinal
-patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`,
-ordered lexicographically. There are `factorial(m)` such patterns.
+patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`.
+There are `factorial(m)` such patterns.
 
-For example, the outcome `[3, 1, 2]` corresponds to the ordinal pattern of having
-first the largest value, then the lowest value, and then the value in between.
+For example, the outcome `[2, 3, 1]` corresponds to the ordinal pattern of having
+the smallest value in the second position, the next smallest value in the third
+position, and the largest value in the first position.
+See also [`OrdinalPatternEncoding`](@ref).
 
 ## In-place symbolization
 
 `SymbolicPermutation` also implements the in-place [`probabilities!`](@ref)
-for `Dataset` input (or embedded vector input).
-The length of the pre-allocated symbol vector must match the length of the dataset.
+for `Dataset` input (or embedded vector input), which reduces allocations when called repeatedly in a loop.
+The length of the pre-allocated symbol vector must be the length of the dataset.
 For example
 
 ```julia
-using DelayEmbeddings, ComplexityMeasures
+using ComplexityMeasures
 m, N = 2, 100
 est = SymbolicPermutation(; m) # `τ` is ignored for `Dataset` input
-x = Dataset(rand(N, m) # timeseries example
+x = Dataset(rand(N, m)) # some input dataset
 πs_ts = zeros(Int, N) # length must match length of `x`
 p = probabilities!(πs_ts, est, x)
 ```
 
-See [`SymbolicWeightedPermutation`](@ref) and [`SymbolicAmplitudeAwarePermutation`](@ref)
-for estimators that not only consider ordinal (sorting) patterns, but also incorporate
-information about within-state-vector amplitudes.
-For a version of this estimator that can be used on high-dimensional arrays, see
-[`SpatialSymbolicPermutation`](@ref).
-
-!!! note "Handling equal values in ordinal patterns"
-    In Bandt & Pompe (2002), equal values are ordered after their order of appearance, but
-    this can lead to erroneous temporal correlations, especially for data with
-    low amplitude resolution [^Zunino2017]. Here, by default, if two values are equal,
-    then one of the is randomly assigned as "the largest", using
-    `lt = ComplexityMeasures.isless_rand`. To get the behaviour from Bandt and Pompe (2002), use
-    `lt = Base.isless`).
-
 [^BandtPompe2002]: Bandt, Christoph, and Bernd Pompe. "Permutation entropy: a natural
     complexity measure for timeseries." Physical review letters 88.17 (2002): 174102.
 [^Zunino2017]: Zunino, L., Olivares, F., Scholkmann, F., & Rosso, O. A. (2017).
diff --git a/test/runtests.jl b/test/runtests.jl
@@ -12,7 +12,7 @@ testfile(file, testname=defaultname(file)) = @testset "$testname" begin; include
 
     # Various
     testfile("utils/fasthist.jl")
-    # testfile("utils/encoding.jl")
+    testfile("utils/encoding.jl")
     testfile("convenience.jl")
     testfile("deprecations.jl")
 end
diff --git a/test/utils/encoding.jl b/test/utils/encoding.jl
diff --git a/test/utils/utils.jl b/test/utils/utils.jl
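
The hunk above shows only the opening lines of `permutation_to_integer`. The following is a rough sketch of a Lehmer-code style mapping from a permutation to a positive integer, written as an assumption based on the visible loop header and the `s - 1` shift in `decode`. `lehmer_index` is a hypothetical name, and the body may differ from the package's actual implementation:

```julia
# Hypothetical sketch of a Lehmer-code encoding; not the package's verbatim code.
# Maps a permutation of 1:m to an integer in 1:factorial(m).
function lehmer_index(perm::AbstractVector{<:Integer})
    m = length(perm)
    n = 0
    for i in 1:m-1
        # Count later entries that are smaller than perm[i] (inversions at i).
        for j in i+1:m
            n += perm[i] > perm[j] ? 1 : 0
        end
        # Accumulate in the factorial number system.
        n *= (m - i)
    end
    return n + 1  # shift so that the encoded symbols start at 1
end

# Pattern of χ = [4.0, 1.0, 9.0] from the doctest in this diff.
println(lehmer_index(sortperm([4.0, 1.0, 9.0])))  # prints 3
```

Under this sketch the six permutations of `1:3` map to `1:6` in lexicographic order, which is consistent with the doctest value `3` for the pattern `[2, 1, 3]`.
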