Commit dbe947e

kahaaga and Datseris authored
Ordinal patterns: better documentation and add tests again (#243)
* Add tests again. This is part of the public API now.
* Update docstring
* Correct imports
* Fix tests
* Comment binning-stuff. Should we introduce this later?
* Fix tests
* Update changelog and version
* Update docs
* Update tests
* Remove complicated use case
* Actually update docstring
* Small mention of `permutation_to_integer`
* even more improvement of docsring

Co-authored-by: George Datseris <datseris.george@gmail.com>
1 parent 93362bf commit dbe947e

7 files changed, with 228 additions and 221 deletions.

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -2,6 +2,13 @@

 Changelog is kept with respect to version 0.11 of Entropies.jl. From version v2.0 onwards, this package has been renamed to ComplexityMeasures.jl.

+## 2.2
+
+- Corrected documentation for `SymbolicPermutation`, `SymbolicAmplitudeAwarePermutation`,
+and `SymbolicWeightedPermutation`, indicating that the outcome space is the set of
+`factorial(m)` *permutations* of the integers `1:m`, not the rank orderings,
+as was stated before.
+
 ## 2.1

 - Added `Gao` estimator for differential Shannon entropy.

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ name = "ComplexityMeasures"
 uuid = "ab4b797d-85ee-42ba-b621-05d793b346a2"
 authors = "Kristian Agasøster Haaga <kahaaga@gmail.com>, George Datseries <datseris.george@gmail.com>"
 repo = "https://github.com/juliadynamics/ComplexityMeasures.jl.git"
-version = "2.1.0"
+version = "2.2.0"

 [deps]
 Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"

src/encoding_implementations/ordinal_pattern.jl

Lines changed: 19 additions & 8 deletions
@@ -12,6 +12,9 @@ their permutation/ordinal patterns and then into the integers based on the Lehme
 code. It is used by [`SymbolicPermutation`](@ref) and similar estimators, see that for
 a description of the outcome space.

+The ordinal/permutation pattern of a vector `χ` is simply `sortperm(χ)`, which gives the
+indices that would sort `χ` in ascending order.
+
 ## Description

 The Lehmer code, as implemented here, is a bijection between the set of `factorial(m)`
@@ -25,20 +28,23 @@ The decoding step is much slower due to missing optimizations (pull requests wel
 ```jldoctest
 julia> using ComplexityMeasures

-julia> x = [4.0, 1.0, 9.0];
+julia> χ = [4.0, 1.0, 9.0];

 julia> c = OrdinalPatternEncoding(3);

-julia> encode(c, x)
+julia> i = encode(c, χ)
 3

-julia> decode(c, 1)
+julia> decode(c, i)
 3-element SVector{3, Int64} with indices SOneTo(3):
 2
 1
 3
 ```

+If you want to encode something that is already a permutation pattern, then you
+can use the non-exported `permutation_to_integer` function.
+
 [^Berger2019]:
 Berger et al. "Teaching Ordinal Patterns to a Computer: Efficient
 Encoding Algorithms Based on the Lehmer Code." Entropy 21.10 (2019): 1023.
@@ -58,15 +64,20 @@ end
 total_outcomes(::OrdinalPatternEncoding{m}) where {m} = factorial(m)
 outcome_space(::OrdinalPatternEncoding{m}) where {m} = permutations(1:m) |> collect

-
 # Notice that `χ` is an element of a `Dataset`, so most definitely a static vector in
-# our code. However we allow `AbstractVector` if a user wanna use `encode` directly
+# our code. However we allow `AbstractVector` if a user wanna use `encode` directly.
 function encode(encoding::OrdinalPatternEncoding{m}, χ::AbstractVector) where {m}
     if m != length(χ)
         throw(ArgumentError("Permutation order and length of input must match!"))
     end
-    perm = sortperm!(encoding.perm, χ; lt = encoding.lt)
-    # Begin Lehmer code
+    perm = sortperm!(encoding.perm, χ)
+    return permutation_to_integer(perm)
+end
+
+# The algorithm from Berger (2019). Use this directly if encoding *permutations* instead
+# of input vectors that are to be permuted.
+function permutation_to_integer(perm)
+    m = length(perm)
     n = 0
     for i = 1:m-1
         for j = i+1:m
@@ -85,7 +96,7 @@ end
 function decode(::OrdinalPatternEncoding{m}, s::Int) where {m}
     # Convert integer to its factorial number representation. Each factorial number
     # corresponds to a unique permutation of the numbers `1, 2, ..., m`.
-    f::SVector{m, Int} = base10_to_factorial(s - 1, m) # subtract 1 because we add 1 in `encode`
+    f::SVector{m, Int} = base10_to_factorial(s - 1, m) # subtract 1 because we add 1 above

     # Reconstruct the permutation from the factorial representation
     xs = 1:m |> collect

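As a reading aid for the `ordinal_pattern.jl` changes above, here is a minimal, self-contained sketch of a Lehmer-code-style mapping from a permutation to an integer. The helper name `lehmer_index` is hypothetical, and the inner loop body is an assumption, because the diff hunk cuts off before showing the body of `permutation_to_integer`. It should not be read as the package's implementation; it only reproduces the value shown in the docstring example.

```julia
# Hypothetical helper (not the package function): map a permutation of 1:m
# to an integer in 1:factorial(m) by counting inversions, Lehmer-code style.
function lehmer_index(perm::AbstractVector{<:Integer})
    m = length(perm)
    n = 0
    for i in 1:m-1
        # Count entries to the right of position i that are smaller than perm[i].
        for j in i+1:m
            n += perm[i] > perm[j] ? 1 : 0
        end
        # Shift the running total into the next factorial-base digit.
        n *= (m - i)
    end
    return n + 1  # shift by one so the identity permutation maps to 1, not 0
end

χ = [4.0, 1.0, 9.0]
perm = sortperm(χ)       # indices that sort χ in ascending order: [2, 1, 3]
i = lehmer_index(perm)   # 3, matching the `encode(c, χ)` output in the docstring
```

Under these assumptions, the six permutations of `1:3` map one-to-one onto `1:6`, which is the bijection property the docstring describes.
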
src/probabilities_estimators/symbolic_permutation.jl

Lines changed: 27 additions & 26 deletions
@@ -34,58 +34,59 @@ When passed to [`probabilities`](@ref) the output depends on the input data type
 by using [`CountOccurrences`](@ref). When giving the resulting probabilities to
 [`entropy`](@ref), the original permutation entropy is computed [^BandtPompe2002].
 - **Multivariate data**. If applied to a an `D`-dimensional `Dataset`,
-then no embedding is constructed, and we each vector ``\\bf{x}_i`` of the dataset
-directly to its permutation pattern ``\\pi_{i}``, ``\\pi_{i}`` by comparing the
+then no embedding is constructed, `m` must be equal to `D` and `τ` is ignored.
+Each vector ``\\bf{x}_i`` of the dataset is mapped
+directly to its permutation pattern ``\\pi_{i}`` by comparing the
 relative magnitudes of the elements of ``\\bf{x}_i``.
 Like above, probabilities are estimated as the frequencies of the permutation symbols.
-In this case, `m` is ignored,
-but `m` must still match the dimension of the dataset for optimization.
 The resulting probabilities can be used to compute multivariate permutation
 entropy[^He2016], although here we don't perform any further subdivision
 of the permutation patterns (as in Figure 3 of[^He2016]).

 Internally, [`SymbolicPermutation`](@ref) uses the [`OrdinalPatternEncoding`](@ref)
 to represent ordinal patterns as integers for efficient computations.

+See [`SymbolicWeightedPermutation`](@ref) and [`SymbolicAmplitudeAwarePermutation`](@ref)
+for estimators that not only consider ordinal (sorting) patterns, but also incorporate
+information about within-state-vector amplitudes.
+For a version of this estimator that can be used on spatial data, see
+[`SpatialSymbolicPermutation`](@ref).
+
+!!! note "Handling equal values in ordinal patterns"
+    In Bandt & Pompe (2002), equal values are ordered after their order of appearance, but
+    this can lead to erroneous temporal correlations, especially for data with
+    low amplitude resolution [^Zunino2017]. Here, by default, if two values are equal,
+    then one of the is randomly assigned as "the largest", using
+    `lt = ComplexityMeasures.isless_rand`.
+    To get the behaviour from Bandt and Pompe (2002), use `lt = Base.isless`.
+
 ## Outcome space

 The outcome space `Ω` for `SymbolicPermutation` is the set of length-`m` ordinal
-patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`,
-ordered lexicographically. There are `factorial(m)` such patterns.
+patterns (i.e. permutations) that can be formed by the integers `1, 2, …, m`.
+There are `factorial(m)` such patterns.

-For example, the outcome `[3, 1, 2]` corresponds to the ordinal pattern of having
-first the largest value, then the lowest value, and then the value in between.
+For example, the outcome `[2, 3, 1]` corresponds to the ordinal pattern of having
+the smallest value in the second position, the next smallest value in the third
+position, and the next smallest, i.e. the largest value in the first position.
+See also [`OrdinalPatternEncoding`(@ref).

 ## In-place symbolization

 `SymbolicPermutation` also implements the in-place [`probabilities!`](@ref)
-for `Dataset` input (or embedded vector input).
-The length of the pre-allocated symbol vector must match the length of the dataset.
+for `Dataset` input (or embedded vector input) for reducing allocations in looping scenarios.
+The length of the pre-allocated symbol vector must be the length of the dataset.
 For example

 ```julia
-using DelayEmbeddings, ComplexityMeasures
+using ComplexityMeasures
 m, N = 2, 100
 est = SymbolicPermutation(; m, τ)
-x = Dataset(rand(N, m) # timeseries example
+x = Dataset(rand(N, m)) # some input dataset
 πs_ts = zeros(Int, N) # length must match length of `x`
 p = probabilities!(πs_ts, est, x)
 ```

-See [`SymbolicWeightedPermutation`](@ref) and [`SymbolicAmplitudeAwarePermutation`](@ref)
-for estimators that not only consider ordinal (sorting) patterns, but also incorporate
-information about within-state-vector amplitudes.
-For a version of this estimator that can be used on high-dimensional arrays, see
-[`SpatialSymbolicPermutation`](@ref).
-
-!!! note "Handling equal values in ordinal patterns"
-    In Bandt & Pompe (2002), equal values are ordered after their order of appearance, but
-    this can lead to erroneous temporal correlations, especially for data with
-    low amplitude resolution [^Zunino2017]. Here, by default, if two values are equal,
-    then one of the is randomly assigned as "the largest", using
-    `lt = ComplexityMeasures.isless_rand`. To get the behaviour from Bandt and Pompe (2002), use
-    `lt = Base.isless`).
-
 [^BandtPompe2002]: Bandt, Christoph, and Bernd Pompe. "Permutation entropy: a natural
 complexity measure for timeseries." Physical review letters 88.17 (2002): 174102.
 [^Zunino2017]: Zunino, L., Olivares, F., Scholkmann, F., & Rosso, O. A. (2017).

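The changelog entry and the rewritten "Outcome space" paragraph distinguish permutations (the output of `sortperm`) from rank orderings. The short sketch below shows that difference using only Base functions. The input vector is a made-up example, chosen so that its pattern is the `[2, 3, 1]` outcome from the new docstring and its rank ordering is the `[3, 1, 2]` from the old one.

```julia
χ = [9.0, 1.0, 4.0]        # largest value first, smallest second, middle value third

pattern = sortperm(χ)      # [2, 3, 1]: positions of the values in ascending order
ranks = invperm(pattern)   # [3, 1, 2]: rank of each element, i.e. the rank ordering

# Indexing with the pattern sorts the vector.
@assert χ[pattern] == sort(χ)
```

The two representations are inverse permutations of each other, so they coincide only for patterns that are their own inverse, such as `[2, 1, 3]`.
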
test/runtests.jl

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ testfile(file, testname=defaultname(file)) = @testset "$testname" begin; include

     # Various
     testfile("utils/fasthist.jl")
-    # testfile("utils/encoding.jl")
+    testfile("utils/encoding.jl")
     testfile("convenience.jl")
     testfile("deprecations.jl")
 end
