
Commit f5e389d

New DiffEntropyEst interface (#239)
* Change definition of `entropy(diff_ent_est, ...)`
* move Kraskov to new interface
* port Kozachenko to new interface
* port Zhu to new interface
* move ZhuSingh to new interface
* move Lord to new interface
* move Gao and Goria to new interface
* move all order statistics estimators to new interface
* [WIP] fixing tests
* add internal function for check with given entropy def
* port all estimator tests to new interface
* check if base is same in compatibility
* remove unnecessary test
* discuss difference of estimator and definition
* add discrete entropy estimator type
* add it to normalized/maximum as well
* completely remove possibility of `entropy(def, differential_estimator)`
* rename to MLEntropy
* docstring of `entropy` should use estimator
* add discrete entropy estimator to docs
* remove testing of old interface
* correct Alizadeh error
* resolve all outstanding issues
* fix broken normalized entropy code
* done
* up version
1 parent dbe947e commit f5e389d

38 files changed: +317 −321 lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,11 @@
 
 Changelog is kept with respect to version 0.11 of Entropies.jl. From version v2.0 onwards, this package has been renamed to ComplexityMeasures.jl.
 
+## 2.3
+- Like differential entropies, discrete entropies now also have their own estimator type.
+- The approach of giving both an entropy definition and an entropy estimator to `entropy` has been dropped. Now the entropy estimators know which definitions they apply to. This change is a deprecation, i.e., it is backwards compatible.
+- Added the `MLEntropy` discrete entropy estimator.
+
 ## 2.2
 
 - Corrected documentation for `SymbolicPermutation`, `SymbolicAmplitudeAwarePermutation`,
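For orientation, the interface change summarized in the changelog looks roughly as follows. This is a minimal sketch: `Kraskov` is one of the package's existing differential estimators, and the commented `MLEntropy` call is an assumption based on the changelog wording, not a confirmed signature.

```julia
using ComplexityMeasures

x = randn(10_000)

# Old (pre-2.3, now deprecated): an entropy definition and a differential
# estimator were passed together.
h_old = entropy(Shannon(), Kraskov(k = 3), x)

# New (2.3): the differential estimator alone; it knows which definition it estimates.
h_new = entropy(Kraskov(k = 3), x)

# Discrete entropies get their own estimator type as well. The exact constructor
# is an assumption here (MLEntropy wrapping a definition), so it is left commented:
# h_discrete = entropy(MLEntropy(Shannon()), ValueHistogram(RectangularBinning(0.1)), x)
```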

Project.toml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ name = "ComplexityMeasures"
 uuid = "ab4b797d-85ee-42ba-b621-05d793b346a2"
 authors = "Kristian Agasøster Haaga <kahaaga@gmail.com>, George Datseris <datseris.george@gmail.com>"
 repo = "https://github.com/juliadynamics/ComplexityMeasures.jl.git"
-version = "2.2.0"
+version = "2.3.0"
 
 [deps]
 Combinatorics = "861a8166-3701-5b0c-9a16-15d98fcdc6aa"

docs/src/entropies.md

Lines changed: 11 additions & 3 deletions
@@ -6,9 +6,10 @@ The entropies API is defined by
 
 - [`EntropyDefinition`](@ref)
 - [`entropy`](@ref)
+- [`DiscreteEntropyEstimator`](@ref)
 - [`DifferentialEntropyEstimator`](@ref)
 
-Please be sure you have read the [Terminology](@ref) section before going through the API here, to have a good idea of the different "flavors" of entropies and how they all come together over the common interface of the [`entropy`](@ref) function.
+Please be sure you have read the [Terminology](@ref terminology) section before going through the API here, to have a good idea of the different "flavors" of entropies and how they all come together over the common interface of the [`entropy`](@ref) function.
 
 ## Entropy definitions
 
@@ -30,13 +31,20 @@ entropy_maximum
 entropy_normalized
 ```
 
+### Discrete entropy estimators
+
+```@docs
+DiscreteEntropyEstimator
+MLEntropy
+```
+
 ## Differential entropy
 
 ```@docs
-entropy(::EntropyDefinition, ::DifferentialEntropyEstimator, ::Any)
+entropy(::DifferentialEntropyEstimator, ::Any)
 ```
 
-### Table of differential entropy estimators
+### [Table of differential entropy estimators](@id table_diff_ent_est)
 
 The following estimators are *differential* entropy estimators, and can also be used
 with [`entropy`](@ref).
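With the new signature documented above, a differential estimator is passed directly to `entropy` together with the data. A minimal sketch using two estimators from the referenced table (default keyword values assumed):

```julia
using ComplexityMeasures

x = randn(10_000)

# Differential Shannon entropy estimated two different ways; no separate
# `EntropyDefinition` argument is needed anymore.
h_kraskov = entropy(Kraskov(), x)
h_kl      = entropy(KozachenkoLeonenko(), x)
```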

docs/src/examples.md

Lines changed: 2 additions & 2 deletions
@@ -95,7 +95,7 @@ end
 # Plot results
 # -------------
 fig = Figure(resolution = (700, 11 * 200))
-labels_knn = ["KozachenkoLeonenko", "Kraskov", "Zhu", "ZhuSingh", "Gao (not corrected)", 
+labels_knn = ["KozachenkoLeonenko", "Kraskov", "Zhu", "ZhuSingh", "Gao (not corrected)",
     "Gao (corrected)", "Goria", "Lord"]
 labels_os = ["Vasicek", "Ebrahimi", "AlizadehArghami", "Correa"]
 
@@ -354,7 +354,7 @@ When comparing different signals or signals that have different length, it is be
 
 ```@example MAIN
 using DynamicalSystemsBase
-N1, N2, a = 101, 100001, 10
+N1, N2, a = 101, 10001, 10
 
 for N in (N1, N2)
     local t = LinRange(0, 2*a*π, N)

docs/src/index.md

Lines changed: 11 additions & 10 deletions
@@ -4,7 +4,7 @@
 ComplexityMeasures
 ```
 
-## Content and terminology
+## [Content and terminology](@id terminology)
 
 !!! note
     The documentation here follows (loosely) chapter 5 of
@@ -18,9 +18,8 @@ from input data.
 
 ### Probabilities
 
-Entropies and other complexity measures are typically computed based on *probability
-distributions* (or more precisely
-[*probability mass functions*](https://en.wikipedia.org/wiki/Probability_mass_function)),
+Entropies and other complexity measures are typically computed based on _probability
+distributions_,
 which we simply refer to as "probabilities".
 Probabilities can be obtained from input data in a plethora of different ways.
 The central API function that returns a probability distribution
@@ -36,14 +35,16 @@ even fundamentally, different quantities.
 In ComplexityMeasures.jl, we provide the generic
 function [`entropy`](@ref) that tries to both clarify disparate entropy concepts, while
 unifying them under a common interface that highlights the modular nature of the word
-"entropy". In summary, there are only two main types of entropy.
+"entropy".
 
-- *Discrete* entropies are functions of probabilities (specifically, probability mass functions). Computing a discrete entropy boils
+On the highest level, there are two main types of entropy.
+
+- *Discrete* entropies are functions of [probability mass functions](https://en.wikipedia.org/wiki/Probability_mass_function). Computing a discrete entropy boils
   down to two simple steps: first estimating a probability distribution, then plugging
-  the estimated probabilities into one of the so-called "generalized entropy" definitions.
+  the estimated probabilities into an estimator of a so-called "generalized entropy" definition.
   Internally, this is literally just a few lines of code where we first apply some
   [`ProbabilitiesEstimator`](@ref) to the input data, and feed the resulting
-  [`probabilities`](@ref) to [`entropy`](@ref) with some [`EntropyDefinition`](@ref).
+  [`probabilities`](@ref) to [`entropy`](@ref) with some [`DiscreteEntropyEstimator`](@ref).
 - *Differential/continuous* entropies are functions of
   [probability density functions](https://en.wikipedia.org/wiki/Probability_density_function),
   which are *integrals*. Computing differential entropies therefore rely on estimating
@@ -59,15 +60,15 @@ They are the good old discrete Shannon entropy ([`Shannon`](@ref)), but calculat
 *new probabilities estimators*.
 
 Even though the names of these methods (e.g. "wavelet entropy") sound like names for new
-entropies, they are *method* names. What these methods actually do is to devise novel
+entropies, what they actually do is to devise novel
 ways of calculating probabilities from data, and then plug those probabilities into formal
 discrete entropy formulas such as
 the Shannon entropy. These probabilities estimators are of course smartly created so that
 they elegantly highlight important complexity-related aspects of the data.
 
 Names for methods such as "permutation entropy" are commonplace, so in
 ComplexityMeasures.jl we provide convenience functions like [`entropy_permutation`](@ref).
-However, we emphasise that these functions really aren't anything more than
+However, we emphasize that these functions really aren't anything more than
 2-lines-of-code wrappers that call [`entropy`](@ref) with the appropriate
 [`ProbabilitiesEstimator`](@ref).
 
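The two-step recipe for discrete entropies described in these docs is, in code, just this (a minimal sketch; `SymbolicPermutation(m = 3)` is only one example of a `ProbabilitiesEstimator`):

```julia
using ComplexityMeasures

x = randn(1_000)

# Step 1: estimate a probability mass function from the data.
probs = probabilities(SymbolicPermutation(m = 3), x)

# Step 2: plug the estimated probabilities into a generalized entropy definition.
h = entropy(Shannon(base = 2), probs)
```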

src/deprecations.jl

Lines changed: 10 additions & 1 deletion
@@ -1,5 +1,14 @@
-@deprecate TimeScaleMODWT WaveletOverlap
+# from before https://github.com/JuliaDynamics/ComplexityMeasures.jl/pull/239
+function entropy(e::EntropyDefinition, est::DiffEntropyEst, x)
+    if e isa Shannon
+        return entropy(est, x)
+    else
+        error("only shannon entropy supports this deprecated interface")
+    end
+end
 
+# From before 2.0:
+@deprecate TimeScaleMODWT WaveletOverlap
 function probabilities(x::Vector_or_Dataset, ε::Union{Real, Vector{<:Real}})
     @warn """
     `probabilities(x::Vector_or_Dataset, ε::Real)`
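How the deprecation shim above behaves in practice (a sketch; `Kraskov()` and `Tsallis()` are existing types used only for illustration):

```julia
using ComplexityMeasures

x = randn(10_000)

# The old three-argument form keeps working for Shannon: the shim simply forwards
# to the new two-argument form, so both calls return the same value.
h_old = entropy(Shannon(), Kraskov(), x)
h_new = entropy(Kraskov(), x)

# Any other definition hits the error branch of the shim:
# entropy(Tsallis(), Kraskov(), x)  # errors: "only shannon entropy supports this deprecated interface"
```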

src/encoding_implementations/rectangular_binning.jl

Lines changed: 4 additions & 3 deletions
@@ -68,16 +68,17 @@ end
     RectangularBinEncoding(binning::RectangularBinning, x; n_eps = 2)
     RectangularBinEncoding(binning::FixedRectangularBinning; n_eps = 2)
 
-Struct used in [`outcomes`](@ref) to map points of `x` into their respective bins.
+An encoding scheme that [`encode`](@ref)s points `χ ∈ x` into their histogram bins.
 It finds the minima along each dimension, and computes appropriate
 edge lengths for each dimension of `x` given a rectangular binning.
 
 The second signature does not need `x` because (1) the binning is fixed, and the
 size of `x` doesn't matter, and (2) because the binning contains the dimensionality
 information as `ϵmin/max` is already an `NTuple`.
 
-Due to roundoff error when computing bin edges, a small tolerance `n_eps * eps()`
-is added to bin widths to ensure the correct number of bins is produced.
+Due to roundoff error when computing bin edges, the computed bin widths
+are increased to their `nextfloat` `n_eps` times
+to ensure the correct number of bins is produced.
 
 See also: [`RectangularBinning`](@ref), [`FixedRectangularBinning`](@ref).
 """

src/entropies_definitions/kaniadakis.jl

Lines changed: 4 additions & 3 deletions
@@ -8,12 +8,13 @@ The Kaniadakis entropy (Tsallis, 2009)[^Tsallis2009], used with [`entropy`](@ref
 compute
 
 ```math
-H_K(p) = -\\sum_{i=1}^N p_i\\log_\\kappa^K(p_i),
+H_K(p) = -\\sum_{i=1}^N p_i f_\\kappa(p_i),
 ```
 ```math
-\\log_\\kappa = \\dfrac{x^\\kappa - x^{-\\kappa}}{2\\kappa},
+f_\\kappa (x) = \\dfrac{x^\\kappa - x^{-\\kappa}}{2\\kappa},
 ```
-where if ``\\kappa = 0``, regular logarithm to the given `base` is used, and `log(0) = 0`.
+where if ``\\kappa = 0``, regular logarithm to the given `base` is used, and
+0 probabilities are skipped.
 
 [^Tsallis2009]:
     Tsallis, C. (2009). Introduction to nonextensive statistical mechanics: approaching a
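A self-contained sketch of the formula in the updated docstring, written as plain Julia rather than the package's internal implementation; the κ = 0 fallback and the skipping of zero probabilities follow the docstring text:

```julia
# H_K(p) = -∑ pᵢ f_κ(pᵢ), with f_κ(x) = (x^κ - x^(-κ)) / (2κ);
# for κ = 0 the ordinary logarithm in the given base is used, and zero
# probabilities are skipped.
function kaniadakis_entropy(p::AbstractVector{<:Real}; κ::Real = 1.0, base::Real = 2)
    fκ(x) = κ == 0 ? log(base, x) : (x^κ - x^(-κ)) / (2κ)
    return -sum(pᵢ * fκ(pᵢ) for pᵢ in p if !iszero(pᵢ))
end

kaniadakis_entropy([0.5, 0.25, 0.25, 0.0]; κ = 1.0)
```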

src/entropies_definitions/shannon.jl

Lines changed: 4 additions & 1 deletion
@@ -11,6 +11,9 @@ H(p) = - \\sum_i p[i] \\log(p[i])
 ```
 with the ``\\log`` at the given `base`.
 
+The maximum value of the Shannon entropy is ``\\log_{base}(L)``, which is the entropy of the
+uniform distribution with ``L`` the [`total_outcomes`](@ref).
+
 [^Shannon1948]: C. E. Shannon, Bell Systems Technical Journal **27**, pp 379 (1948)
 """
 Base.@kwdef struct Shannon{B} <: EntropyDefinition
@@ -19,7 +22,7 @@ end
 
 function entropy(e::Shannon, probs::Probabilities)
     base = e.base
-    non0_probs = Iterators.filter(!iszero, probs.p)
+    non0_probs = Iterators.filter(!iszero, vec(probs))
     logf = log_with_base(base)
     return -sum(x*logf(x) for x in non0_probs)
 end
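To illustrate the added docstring sentence about the maximum value, a quick check with a uniform distribution (a sketch; `Probabilities` normalizes its input vector):

```julia
using ComplexityMeasures

L = 8
uniform = Probabilities(fill(1 / L, L))

# The Shannon entropy of the uniform distribution over L outcomes attains
# the maximum log_base(L); for base 2 and L = 8 that is 3 bits.
h = entropy(Shannon(base = 2), uniform)   # ≈ log2(8) == 3.0
```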

src/entropies_definitions/tsallis.jl

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@ with `k` standing for the Boltzmann constant. It is defined as
 S_q(p) = \\frac{k}{q - 1}\\left(1 - \\sum_{i} p[i]^q\\right)
 ```
 
-If the probability estimator has known alphabet length ``L``, then the maximum
-value of the Tsallis entropy is ``k(L^{1 - q} - 1)/(1 - q)``.
+The maximum value of the Tsallis entropy is ``k(L^{1 - q} - 1)/(1 - q)``,
+with ``L`` the [`total_outcomes`](@ref).
 
 [^Tsallis1988]:
     Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics.
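The maximum-value expression in the updated docstring, spelled out as standalone arithmetic (a sketch for the q ≠ 1 case with k = 1):

```julia
# Maximum Tsallis entropy k*(L^(1-q) - 1)/(1-q) for L equiprobable outcomes;
# as q → 1 this recovers the Shannon maximum log(L).
tsallis_maximum(L::Int; q::Real = 2.0, k::Real = 1.0) = k * (L^(1 - q) - 1) / (1 - q)

tsallis_maximum(8; q = 2.0)   # 0.875, i.e. 1 - 1/8
```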
