In recent years, interest in **Topological Data Analysis** (TDA) has grown steadily in the field of data science. The core idea of TDA is to gain insights from data by using topological methods that are provably robust to noise and that behave well with respect to dimension. This Python package provides an implementation of the **Mapper Algorithm**, a well-known tool from TDA.

The Mapper Algorithm takes any dataset $X$ and returns a *shape-summary* in the form of a graph $G$, called the **Mapper Graph**. Under reasonable conditions, one can prove that $X$ and $G$ share the same number of connected components.
-## Basics
+For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).
+
+* Installation from package: TBD
+* Installation from sources: clone this repo and run `python -m pip install .`
+
+A worked-out example shows how to use this package. We perform some analysis on the well-known dataset of handwritten digits, consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.
+
+```python
+import numpy as np
+
+from sklearn.datasets import load_digits
+from sklearn.cluster import AgglomerativeClustering
+from sklearn.decomposition import PCA
+
+from tdamapper.core import MapperAlgorithm
+from tdamapper.cover import CubicalCover
+from tdamapper.clustering import PermissiveClustering
-Let $f$ be any chosen *lens*, i.e. a continuous map $f \colon X \to Y$, where $Y$ is any parameter space (*typically* low dimensional). To build the Mapper Graph, follow these steps:
-
-1. Build an *open cover* for $f(X)$, i.e. a collection of *open sets* whose union covers the whole image $f(X)$.
-
-2. Run clustering on the preimage of each open set. All these local clusters together make a *refined open cover* for $X$.
-
-3. Build the mapper graph $G$ by taking a node for each local cluster, and by drawing an edge between two nodes whenever their corresponding local clusters intersect.
+```
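The three steps listed above (cover, local clustering, graph construction) can be sketched in plain Python. This is only an illustrative sketch, not this package's API: the function names `interval_cover`, `radius_clusters`, and `mapper_graph` are hypothetical, the cover is restricted to overlapping open intervals on a one-dimensional lens, and clustering is replaced by connected components of a fixed-radius neighborhood graph. The example uses an X-shaped point cloud with the height function as lens.

```python
import math
from itertools import combinations


def interval_cover(values, n_intervals, overlap):
    """Step 1: cover the lens image with overlapping open intervals.

    Assumes overlap > 0 so that the extreme values fall inside an interval.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_intervals)]


def radius_clusters(points, indices, radius):
    """Step 2: cluster a preimage by linking points at distance <= radius."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen
                    if math.dist(points[i], points[j]) <= radius}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=3, overlap=0.5, radius=1.5):
    """Step 3: one node per local cluster, one edge per non-empty intersection."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in interval_cover(values, n_intervals, overlap):
        preimage = [i for i, v in enumerate(values) if a < v < b]
        nodes.extend(radius_clusters(points, preimage, radius))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# X-shaped point cloud in the plane; the lens is the height function (y-coordinate).
points = sorted({(t, s * t) for t in range(-5, 6) for s in (1, -1)})
nodes, edges = mapper_graph(points, lens=lambda p: p[1])
print(len(nodes), len(edges))  # 5 4
```

With these parameters the graph has one central node joined to four arms, so it is connected, just like the point cloud it summarizes.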
-To get an idea, in the following picture we have $X$ as an X-shaped point cloud in $\mathbb{R}^2$, with $f$ being the *height function*, i.e. the projection on the $y$-axis. In the leftmost part we cover the projection of $X$ with three open sets. Every open set is represented with a different color. Then we take the preimage of these sets, cluster them, and finally build the graph according to intersections.
+

-
+It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have in each cluster, we could plot colors according to the standard deviation.
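As a rough illustration of recoloring without recomputing the graph, the sketch below aggregates per-point values over fixed nodes with two different statistics. The node sets, the values, and the helper `node_colors` are hypothetical and are not taken from this package's plotting API.

```python
from statistics import mean, pstdev

# Hypothetical Mapper output: each node is the set of point indices in one cluster.
nodes = [{0, 1, 2}, {2, 3}, {4}]
# Hypothetical per-point values to summarize on each node.
values = [1.0, 1.0, 4.0, 6.0, 3.0]


def node_colors(nodes, values, aggregate):
    """Return one color value per node by aggregating the values of its points."""
    return [aggregate([values[i] for i in node]) for node in nodes]


# The nodes (the geometry) stay the same; only the aggregation changes.
colors_mean = node_colors(nodes, values, mean)
colors_std = node_colors(nodes, values, pstdev)
print(colors_mean, colors_std)
```

A node with a large standard deviation mixes points with very different values, while a node made of a single point has zero dispersion.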
-The choice of the lens has the largest effect on the shape of the Mapper Graph. Some common choices are *statistics*, *projections*, *entropy*, *density*, *eccentricity*, and so forth. Domain knowledge about the data at hand can help in picking a good lens. For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).
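To make one of the lens choices above concrete, here is a minimal sketch of an eccentricity lens. It assumes one common variant of the definition, the mean distance from a point to every point of the dataset; the function name `eccentricity` and the sample points are hypothetical and not part of this package.

```python
import math


def eccentricity(points):
    """Mean Euclidean distance from each point to all points of the dataset."""
    n = len(points)
    return [sum(math.dist(p, q) for q in points) / n for p in points]


# Three nearby points and one far away point on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
lens_values = eccentricity(points)
print(lens_values)
```

Points far from the bulk of the data receive larger lens values, so this lens tends to separate the core of a dataset from its periphery.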
-A worked-out example shows how to use this package.
-We perform some analysis on the well-known dataset of handwritten digits, consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.
-
-
-
-It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have in each cluster, we could plot colors according to the standard deviation.
-
-
+
The mapper graph of the digits dataset shows a few interesting patterns. For example, we can make the following observations:
@@ -55,25 +88,3 @@ The mapper graph of the digits dataset shows a few interesting patterns. For exa
* Some clusters are not well separated and tend to overlap each other. This mixed behavior is present in digits that are easily confused with one another, for example digits 5 and 6.

* Clusters located across the "boundary" of two different digits show a transition either due to a change in distribution or due to distortions in the handwritten text, for example digits 8 and 2.
-
-
-### Development - Supported Features
-
-- [x] Topology
-    - [x] custom lenses
-    - [x] custom metrics
-
-- [x] Cover algorithms:
-    - [x] `CubicalCover`
-    - [x] `BallCover`
-    - [x] `KnnCover`
-
-- [x] Clustering algorithms
-    - [x] `sklearn.cluster`-compatible algorithms
-    - [x] `TrivialClustering` to skip clustering
-    - [x] `CoverClustering` for clustering induced by cover