Commit 7ed4f20

Merge pull request #43 from lucasimi/feature/add-docs
Updated readme and example
2 parents 60a50e5 + a988ee9 commit 7ed4f20

2 files changed

+87
-67
lines changed

README.md

Lines changed: 67 additions & 56 deletions
@@ -1,52 +1,85 @@
11
# tda-mapper
22

3-
![test](https://github.com/lucasimi/tda-mapper-python/actions/workflows/test.yml/badge.svg) [![codecov](https://codecov.io/github/lucasimi/tda-mapper-python/graph/badge.svg?token=FWSD8JUG6R)](https://codecov.io/github/lucasimi/tda-mapper-python)
3+
![test](https://github.com/lucasimi/tda-mapper-python/actions/workflows/test.yml/badge.svg)
4+
[![codecov](https://codecov.io/github/lucasimi/tda-mapper-python/graph/badge.svg?token=FWSD8JUG6R)](https://codecov.io/github/lucasimi/tda-mapper-python)
5+
[![docs](https://readthedocs.org/projects/tda-mapper/badge/?version=latest)](https://tda-mapper.readthedocs.io/en/latest/?badge=latest)
46

57
In recent years, an ever-growing interest in **Topological Data Analysis** (TDA) has emerged in the field of data science. The core idea of TDA is to gain insights from data by using topological methods that are proven to be reliable with respect to noise, and that behave nicely with respect to dimension. This Python package provides an implementation of the **Mapper Algorithm**, a well-known tool from TDA.
68

79
The Mapper Algorithm takes any dataset $X$ and returns a *shape-summary* in the form of a graph $G$, called the **Mapper Graph**. It's possible to prove, under reasonable conditions, that $X$ and $G$ share the same number of connected components.
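
To make the construction concrete, here is a minimal, standard-library-only sketch of the Mapper idea on a toy point cloud. It is not how `tdamapper` is implemented: the helper names (`single_linkage`, `mapper_graph`, `count_components`), the interval cover of a one-dimensional lens, and the distance-threshold clustering are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(points, indices, eps):
    """Hypothetical local clustering: link points whose distance is <= eps."""
    clusters = []
    remaining = set(indices)
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals, overlap_frac, eps):
    """Cover the lens range with overlapping intervals, cluster each preimage,
    and connect two local clusters whenever they share at least one point.
    Assumes the lens values are not all equal."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    margin = width * overlap_frac / 2
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - margin
        b = lo + (k + 1) * width + margin
        preimage = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(points, preimage, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


def count_components(n_nodes, edges):
    """Count connected components of the graph with a small union-find."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n_nodes)})


# Toy dataset: two well-separated groups of points on a line.
points = [(x * 0.1, 0.0) for x in range(10)] + [(5 + x * 0.1, 0.0) for x in range(10)]
lens = [p[0] for p in points]  # the lens is the projection on the first axis

nodes, edges = mapper_graph(points, lens, n_intervals=12, overlap_frac=0.5, eps=0.15)
n_components = count_components(len(nodes), edges)
print(len(nodes), len(edges), n_components)
```

With these parameters the toy graph has one connected component per group of points, which matches the statement about connected components in this simple case.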
810

9-
## Basics
11+
For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).
12+
13+
* Installation from package: TBD
14+
* Installation from sources: clone this repo and run ```python -m pip install .```
15+
* Documentation: https://tda-mapper.readthedocs.io/en/latest/
16+
17+
18+
## Usage
19+
20+
[In this file](https://github.com/lucasimi/tda-mapper-python/raw/main/tests/example.py) you can find a worked-out example that shows how to use this package. We perform some analysis on the well-known dataset of [handwritten digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.
21+
22+
```python
23+
import numpy as np
24+
25+
from sklearn.datasets import load_digits
26+
from sklearn.cluster import AgglomerativeClustering
27+
from sklearn.decomposition import PCA
28+
29+
from tdamapper.core import MapperAlgorithm
30+
from tdamapper.cover import CubicalCover
31+
from tdamapper.clustering import PermissiveClustering
32+
from tdamapper.plot import MapperPlot
33+
34+
# We load a labelled dataset
35+
X, y = load_digits(return_X_y=True)
36+
# We compute the lens values
37+
lens = PCA(2).fit_transform(X)
38+
39+
mapper_algo = MapperAlgorithm(
40+
cover=CubicalCover(
41+
n_intervals=10,
42+
overlap_frac=0.65),
43+
# We prevent clustering failures
44+
clustering=PermissiveClustering(
45+
clustering=AgglomerativeClustering(10),
46+
verbose=False),
47+
n_jobs=1)
48+
mapper_graph = mapper_algo.fit_transform(X, lens)
49+
50+
mapper_plot = MapperPlot(X, mapper_graph,
51+
# We color according to digit values
52+
colors=y,
53+
# Jet colormap, used for classes
54+
cmap='jet',
55+
# We aggregate on graph nodes according to mean
56+
agg=np.nanmean,
57+
dim=2,
58+
iterations=400)
59+
fig_mean = mapper_plot.plot(title='digit (mean)', width=600, height=600)
60+
fig_mean.show(config={'scrollZoom': True})
1061

11-
Let $f$ be any chosen *lens*, i.e. a continuous map $f \colon X \to Y$, being $Y$ any parameter space (*typically* low dimensional). In order to build the Mapper Graph follow these steps:
12-
13-
1. Build an *open cover* for $f(X)$, i.e. a collection of *open sets* whose union makes the whole image $f(X)$.
14-
15-
2. Run clustering on the preimage of each open set. All these local clusters together make a *refined open cover* for $X$.
16-
17-
3. Build the mapper graph $G$ by taking a node for each local cluster, and by drawing an edge between two nodes whenever their corresponding local clusters intersect.
62+
```
1863

19-
To get an idea, in the following picture we have $X$ as an X-shaped point cloud in $\mathbb{R}^2$, with $f$ being the *height function*, i.e. the projection on the $y$-axis. In the leftmost part we cover the projection of $X$ with three open sets. Every open set is represented with a different color. Then we take the preimage of these sets, cluster then, and finally build the graph according to intersections.
64+
![Mapper Graph of the digits dataset, colored according to mean value](https://github.com/lucasimi/tda-mapper-python/raw/main/resources/digits_mean.png)
2065

21-
![Steps](resources/mapper.png)
66+
It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, to visualize how much dispersion there is within each cluster, we can color the nodes according to the standard deviation.
2267

23-
The choice of the lens is the most relevant on the shape of the Mapper Graph. Some common choices are *statistics*, *projections*, *entropy*, *density*, *eccentricity*, and so forth. However, in order to pick a good lens, specific domain knowledge for the data at hand can give a hint. For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).
2468

25-
## Installation
69+
```python
70+
# We reuse the graph plot with the same positions
71+
fig_std = mapper_plot.with_colors(
72+
colors=y,
73+
# Viridis colormap, used for ranges
74+
cmap='viridis',
75+
# We aggregate on graph nodes according to std
76+
agg=np.nanstd,
77+
).plot(title='digit (std)', width=600, height=600)
78+
fig_std.show(config={'scrollZoom': True})
2679

27-
Clone this repo, and install via `pip` from your local directory
28-
```
29-
python -m pip install .
3080
```
31-
Alternatively, you can use `pip` to install directly from GitHub
32-
```
33-
pip install git+https://github.com/lucasimi/tda-mapper-python.git
34-
```
35-
If you want to install the version from a specific branch, for example `develop`, you can run
36-
```
37-
pip install git+https://github.com/lucasimi/tda-mapper-python.git@develop
38-
```
39-
40-
## A worked out example
4181

42-
![In this file](tests/example.py) you can find a worked out example that shows how to use this package.
43-
We perform some analysis on the the well known dataset of ![hand written digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), consisting of less than 2000 8x8 pictures represented as arrays of 64 elements.
44-
45-
![The mapper graph of the digits dataset, colored according to mean value](resources/digits_mean.png)
46-
47-
It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have on each cluster, we could plot colors according to the standard deviation
48-
49-
![The mapper graph of the digits dataset, colored according to std](resources/digits_std.png)
82+
![Mapper Graph of the digits dataset, colored according to std](https://github.com/lucasimi/tda-mapper-python/raw/main/resources/digits_std.png)
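
As a rough illustration of what "aggregating on graph nodes" means in the two plots, the sketch below computes one value per node from the labels of the points that node contains. The `aggregate` helper and the tiny `nodes` mapping are hypothetical and are not part of `tdamapper`; the standard-library `fmean` and `pstdev` stand in for `np.nanmean` and `np.nanstd` (both standard deviations are population ones, but the NumPy functions additionally ignore NaN values).

```python
from statistics import fmean, pstdev


def aggregate(nodes, values, agg):
    """Apply `agg` to the values of the points belonging to each node."""
    return {node_id: agg([values[i] for i in members])
            for node_id, members in nodes.items()}


# Hypothetical Mapper nodes: node id -> indices of the points it contains.
nodes = {0: [0, 1, 2], 1: [2, 3]}
# Hypothetical digit labels for four points.
labels = [5, 5, 6, 6]

node_mean = aggregate(nodes, labels, fmean)   # drives the "mean" coloring
node_std = aggregate(nodes, labels, pstdev)   # drives the "std" coloring
print(node_mean)
print(node_std)
```

A node whose points all carry the same label gets a standard deviation of zero, while a node that mixes labels gets a positive value, which is why the std coloring highlights regions where digits overlap.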
5083

5184
The mapper graph of the digits dataset shows a few interesting patterns. For example, we can make the following observations:
5285

@@ -55,25 +88,3 @@ The mapper graph of the digits dataset shows a few interesting patterns. For exa
5588
* Some clusters are not well separated and tend to overlap one another. This mixed behavior is present in digits that can be easily confused with each other, for example digits 5 and 6.
5689

5790
* Clusters located across the "boundary" of two different digits show a transition either due to a change in distribution or due to distortions in the handwritten text, for example digits 8 and 2.
58-
59-
60-
### Development - Supported Features
61-
62-
- [x] Topology
63-
- [x] custom lenses
64-
- [x] custom metrics
65-
66-
- [x] Cover algorithms:
67-
- [x] `CubicalCover`
68-
- [x] `BallCover`
69-
- [x] `KnnCover`
70-
71-
- [x] Clustering algoritms
72-
- [x] `sklearn.cluster`-compatible algorithms
73-
- [x] `TrivialClustering` to skip clustering
74-
- [x] `CoverClustering` for clustering induced by cover
75-
76-
- [x] Plot
77-
- [x] 2d interactive plot
78-
- [x] 3d interactive plot
79-
- [ ] HTML embeddable plot

tests/example.py

Lines changed: 20 additions & 11 deletions
@@ -9,31 +9,40 @@
99
from tdamapper.clustering import PermissiveClustering
1010
from tdamapper.plot import MapperPlot
1111

12-
X, y = load_digits(return_X_y=True) # We load a labelled dataset
13-
lens = PCA(2).fit_transform(X) # We compute the lens values
12+
# We load a labelled dataset
13+
X, y = load_digits(return_X_y=True)
14+
# We compute the lens values
15+
lens = PCA(2).fit_transform(X)
1416

1517
mapper_algo = MapperAlgorithm(
1618
cover=CubicalCover(
1719
n_intervals=10,
1820
overlap_frac=0.65),
19-
clustering=PermissiveClustering( # We prevent clustering failures
21+
# We prevent clustering failures
22+
clustering=PermissiveClustering(
2023
clustering=AgglomerativeClustering(10),
2124
verbose=False),
2225
n_jobs=1)
2326
mapper_graph = mapper_algo.fit_transform(X, lens)
2427

2528
mapper_plot = MapperPlot(X, mapper_graph,
26-
colors=y, # We color according to digit values
27-
cmap='jet', # Jet colormap, used for classes
28-
agg=np.nanmean, # We aggregate on graph nodes according to mean
29+
# We color according to digit values
30+
colors=y,
31+
# Jet colormap, used for classes
32+
cmap='jet',
33+
# We aggregate on graph nodes according to mean
34+
agg=np.nanmean,
2935
dim=2,
3036
iterations=400)
3137
fig_mean = mapper_plot.plot(title='digit (mean)', width=600, height=600)
32-
#fig_mean.show(config={'scrollZoom': True}) # Uncomment to show the plot
38+
fig_mean.show(config={'scrollZoom': True})
3339

34-
fig_std = mapper_plot.with_colors( # We reuse the graph plot with the same positions
40+
# We reuse the graph plot with the same positions
41+
fig_std = mapper_plot.with_colors(
3542
colors=y,
36-
cmap='viridis', # Virtidis colormap, used for ranges
37-
agg=np.nanstd, # We aggregate on graph nodes according to std
43+
# Viridis colormap, used for ranges
44+
cmap='viridis',
45+
# We aggregate on graph nodes according to std
46+
agg=np.nanstd,
3847
).plot(title='digit (std)', width=600, height=600)
39-
#fig_std.show(config={'scrollZoom': True}) # Uncomment to show the plot
48+
fig_std.show(config={'scrollZoom': True})
