Commit 07fd4e2

Merge pull request #44 from lucasimi/develop
Develop
2 parents 58ad179 + 7ed4f20 commit 07fd4e2

26 files changed: +598 -287 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 **/*.egg-info
 **/.ipynb_checkpoints
 **/*.log
+**/docs/build
 
 .coverage
 .vscode

.readthedocs.yml

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# .readthedocs.yaml
+# Read the Docs configuration file
+# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details
+
+# Required
+version: 2
+
+# Set the OS, Python version and other tools you might need
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.12"
+    # You can also specify other tool versions:
+    # nodejs: "19"
+    # rust: "1.64"
+    # golang: "1.19"
+
+# Build documentation in the "docs/" directory with Sphinx
+sphinx:
+  configuration: docs/source/conf.py
+
+# Optionally build your docs in additional formats such as PDF and ePub
+# formats:
+#   - pdf
+#   - epub
+
+# Optional but recommended, declare the Python requirements required
+# to build your documentation
+# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
+python:
+  install:
+    - requirements: docs/requirements.txt
+    - method: pip
+      path: .

README.md

Lines changed: 68 additions & 57 deletions
@@ -1,52 +1,85 @@
-# tda-mapper-python
+# tda-mapper
 
-![test](https://github.com/lucasimi/tda-mapper-python/actions/workflows/test.yml/badge.svg) [![codecov](https://codecov.io/github/lucasimi/tda-mapper-python/graph/badge.svg?token=FWSD8JUG6R)](https://codecov.io/github/lucasimi/tda-mapper-python)
+![test](https://github.com/lucasimi/tda-mapper-python/actions/workflows/test.yml/badge.svg)
+[![codecov](https://codecov.io/github/lucasimi/tda-mapper-python/graph/badge.svg?token=FWSD8JUG6R)](https://codecov.io/github/lucasimi/tda-mapper-python)
+[![docs](https://readthedocs.org/projects/tda-mapper/badge/?version=latest)](https://tda-mapper.readthedocs.io/en/latest/?badge=latest)
 
 In recent years, an ever-growing interest in **Topological Data Analysis** (TDA) has emerged in the field of data science. The core idea of TDA is to gain insights from data by using topological methods that are proven to be reliable with respect to noise, and that behave nicely with respect to dimension. This Python package provides an implementation of the **Mapper Algorithm**, a well-known tool from TDA.
 
 The Mapper Algorithm takes any dataset $X$ and returns a *shape-summary* in the form of a graph $G$, called **Mapper Graph**. It's possible to prove, under reasonable conditions, that $X$ and $G$ share the same number of connected components.
 
-## Basics
+For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).
+
+* Installation from package: TBD
+* Installation from sources: clone this repo and run ```python -m pip install .```
+* Documentation: https://tda-mapper.readthedocs.io/en/latest/
+
+
+## Usage
+
+[In this file](https://github.com/lucasimi/tda-mapper-python/raw/main/tests/example.py) you can find a worked out example that shows how to use this package. We perform some analysis on the well known dataset of [handwritten digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.
+
+```python
+import numpy as np
+
+from sklearn.datasets import load_digits
+from sklearn.cluster import AgglomerativeClustering
+from sklearn.decomposition import PCA
+
+from tdamapper.core import MapperAlgorithm
+from tdamapper.cover import CubicalCover
+from tdamapper.clustering import PermissiveClustering
+from tdamapper.plot import MapperPlot
+
+# We load a labelled dataset
+X, y = load_digits(return_X_y=True)
+# We compute the lens values
+lens = PCA(2).fit_transform(X)
+
+mapper_algo = MapperAlgorithm(
+    cover=CubicalCover(
+        n_intervals=10,
+        overlap_frac=0.65),
+    # We prevent clustering failures
+    clustering=PermissiveClustering(
+        clustering=AgglomerativeClustering(10),
+        verbose=False),
+    n_jobs=1)
+mapper_graph = mapper_algo.fit_transform(X, lens)
+
+mapper_plot = MapperPlot(X, mapper_graph,
+    # We color according to digit values
+    colors=y,
+    # Jet colormap, used for classes
+    cmap='jet',
+    # We aggregate on graph nodes according to mean
+    agg=np.nanmean,
+    dim=2,
+    iterations=400)
+fig_mean = mapper_plot.plot(title='digit (mean)', width=600, height=600)
+fig_mean.show(config={'scrollZoom': True})
 
-Let $f$ be any chosen *lens*, i.e. a continuous map $f \colon X \to Y$, where $Y$ is any parameter space (*typically* low dimensional). In order to build the Mapper Graph follow these steps:
-
-1. Build an *open cover* for $f(X)$, i.e. a collection of *open sets* whose union makes the whole image $f(X)$.
-
-2. Run clustering on the preimage of each open set. All these local clusters together make a *refined open cover* for $X$.
-
-3. Build the mapper graph $G$ by taking a node for each local cluster, and by drawing an edge between two nodes whenever their corresponding local clusters intersect.
+```
 
-To get an idea, in the following picture we have $X$ as an X-shaped point cloud in $\mathbb{R}^2$, with $f$ being the *height function*, i.e. the projection on the $y$-axis. In the leftmost part we cover the projection of $X$ with three open sets. Every open set is represented with a different color. Then we take the preimage of these sets, cluster them, and finally build the graph according to intersections.
+![Mapper Graph of the digits dataset, colored according to mean value](https://github.com/lucasimi/tda-mapper-python/raw/main/resources/digits_mean.png)
 
-![Steps](resources/mapper.png)
+It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have on each cluster, we could plot colors according to the standard deviation.
 
-The choice of the lens has the greatest influence on the shape of the Mapper Graph. Some common choices are *statistics*, *projections*, *entropy*, *density*, *eccentricity*, and so forth. However, in order to pick a good lens, specific domain knowledge for the data at hand can give a hint. For an in-depth description of Mapper please read [the original paper](https://research.math.osu.edu/tgda/mapperPBG.pdf).
 
-## Installation
+```python
+# We reuse the graph plot with the same positions
+fig_std = mapper_plot.with_colors(
+    colors=y,
+    # Viridis colormap, used for ranges
+    cmap='viridis',
+    # We aggregate on graph nodes according to std
+    agg=np.nanstd,
+).plot(title='digit (std)', width=600, height=600)
+fig_std.show(config={'scrollZoom': True})
 
-Clone this repo, and install via `pip` from your local directory
-```
-python -m pip install .
 ```
-Alternatively, you can use `pip` to install directly from GitHub
-```
-pip install git+https://github.com/lucasimi/tda-mapper-python.git
-```
-If you want to install the version from a specific branch, for example `develop`, you can run
-```
-pip install git+https://github.com/lucasimi/tda-mapper-python.git@develop
-```
-
-## A worked out example
 
-[In this file](tests/example.py) you can find a worked out example that shows how to use this package.
-We perform some analysis on the well known dataset of [handwritten digits](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html), consisting of fewer than 2000 8x8 pictures represented as arrays of 64 elements.
-
-![The mapper graph of the digits dataset, colored according to mean value](resources/digits_mean.png)
-
-It's also possible to obtain a new plot colored according to different values, while keeping the same computed geometry. For example, if we want to visualize how much dispersion we have on each cluster, we could plot colors according to the standard deviation.
-
-![The mapper graph of the digits dataset, colored according to std](resources/digits_std.png)
+![Mapper Graph of the digits dataset, colored according to std](https://github.com/lucasimi/tda-mapper-python/raw/main/resources/digits_std.png)
 
 The mapper graph of the digits dataset shows a few interesting patterns. For example, we can make the following observations:
 
@@ -55,25 +88,3 @@ The mapper graph of the digits dataset shows a few interesting patterns. For exa
 * Some clusters are not well separated and tend to overlap one another. This mixed behavior is present in those digits which can be easily confused with one another, for example digits 5 and 6.
 
 * Clusters located across the "boundary" of two different digits show a transition either due to a change in distribution or due to distortions in the handwritten text, for example digits 8 and 2.
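
The two README plots share one graph and differ only in how the labels `y` are aggregated on each node (`np.nanmean` versus `np.nanstd`). As a rough, library-independent illustration of that idea, the sketch below aggregates labels per node with the standard library only. The `node_members` mapping and the helper name `aggregate_on_nodes` are hypothetical and not part of `tdamapper`, whose internal node representation is not shown in this diff.

```python
import math
from statistics import fmean, pstdev


def aggregate_on_nodes(node_members, values, agg):
    """Apply `agg` to the non-NaN values of the points in each node."""
    result = {}
    for node, members in node_members.items():
        kept = [values[i] for i in members if not math.isnan(values[i])]
        # A node whose values are all NaN has nothing to aggregate
        result[node] = agg(kept) if kept else math.nan
    return result


# Hypothetical toy graph: node id -> indices of the points it contains
node_members = {0: [0, 1, 2], 1: [2, 3], 2: [4]}
# One label per point, playing the role of `y` in the README example
labels = [5.0, 5.0, 6.0, 6.0, 8.0]

# Population standard deviation, matching the default of np.nanstd
node_mean = aggregate_on_nodes(node_members, labels, fmean)
node_std = aggregate_on_nodes(node_members, labels, pstdev)
```

In this toy setting a node with zero standard deviation holds a single digit class, while a larger value flags a node that mixes classes, which is the kind of overlap the observations above describe.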
-
-
-### Development - Supported Features
-
-- [x] Topology
-    - [x] custom lenses
-    - [x] custom metrics
-
-- [x] Cover algorithms:
-    - [x] `CubicalCover`
-    - [x] `BallCover`
-    - [x] `KnnCover`
-
-- [x] Clustering algorithms
-    - [x] `sklearn.cluster`-compatible algorithms
-    - [x] `TrivialClustering` to skip clustering
-    - [x] `CoverClustering` for clustering induced by cover
-
-- [x] Plot
-    - [x] 2d interactive plot
-    - [x] 3d interactive plot
-    - [ ] HTML embeddable plot
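
The three steps listed in the "Basics" section that this commit drops from the README (cover the lens image, cluster each preimage, connect intersecting clusters) can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the `tdamapper` implementation: all function names are hypothetical, the cover uses closed intervals on a one-dimensional lens with a positive overlap, and clustering is a simple distance-threshold grouping.

```python
import math
from itertools import combinations


def interval_cover(values, n_intervals, overlap_frac):
    """Step 1: cover the lens range with overlapping closed intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    pad = width * overlap_frac / 2
    return [(lo + k * width - pad, lo + (k + 1) * width + pad)
            for k in range(n_intervals)]


def cluster(points, ids, eps):
    """Step 2: group ids into connected components of the eps-neighbor graph."""
    clusters, unseen = [], set(ids)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals, overlap_frac, eps):
    """Return (nodes, edges); nodes are local clusters, edges are overlaps."""
    nodes = []
    for a, b in interval_cover(lens, n_intervals, overlap_frac):
        preimage = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(cluster(points, preimage, eps))
    # Step 3: connect two nodes whenever their clusters share a point
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# X-shaped point cloud in the plane, with the height function as lens
ts = [k / 2 for k in range(-4, 5)]
points = [(t, t) for t in ts] + [(t, -t) for t in ts if t != 0]
lens = [y for _, y in points]
nodes, edges = mapper_graph(points, lens, n_intervals=3, overlap_frac=0.4, eps=0.8)
```

With these parameters the lower and upper intervals each split into two arms and the middle interval yields one cluster around the crossing, so the result has five nodes and four edges: a graph that is itself X-shaped and connected, like the point cloud.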

docs/Makefile

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
# Minimal makefile for Sphinx documentation
2+
#
3+
4+
# You can set these variables from the command line, and also
5+
# from the environment for the first two.
6+
SPHINXOPTS ?=
7+
SPHINXBUILD ?= sphinx-build
8+
SOURCEDIR = source
9+
BUILDDIR = build
10+
11+
# Put it first so that "make" without argument is like "make help".
12+
help:
13+
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
14+
15+
.PHONY: help Makefile
16+
17+
# Catch-all target: route all unknown targets to Sphinx using the new
18+
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
19+
%: Makefile
20+
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

docs/make.bat

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
@ECHO OFF
2+
3+
pushd %~dp0
4+
5+
REM Command file for Sphinx documentation
6+
7+
if "%SPHINXBUILD%" == "" (
8+
set SPHINXBUILD=sphinx-build
9+
)
10+
set SOURCEDIR=source
11+
set BUILDDIR=build
12+
13+
%SPHINXBUILD% >NUL 2>NUL
14+
if errorlevel 9009 (
15+
echo.
16+
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
17+
echo.installed, then set the SPHINXBUILD environment variable to point
18+
echo.to the full path of the 'sphinx-build' executable. Alternatively you
19+
echo.may add the Sphinx directory to PATH.
20+
echo.
21+
echo.If you don't have Sphinx installed, grab it from
22+
echo.https://www.sphinx-doc.org/
23+
exit /b 1
24+
)
25+
26+
if "%1" == "" goto help
27+
28+
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
29+
goto end
30+
31+
:help
32+
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
33+
34+
:end
35+
popd
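
Both `docs/Makefile` and `docs/make.bat` forward the requested target to `sphinx-build -M <target> source build`. A platform-independent equivalent could look like the sketch below, assuming Sphinx is installed in the active environment so that `python -m sphinx` is available. The helper names `sphinx_command` and `run_sphinx` are hypothetical and not part of this commit.

```python
import subprocess
import sys


def sphinx_command(target, source_dir="source", build_dir="build", opts=()):
    """Build the argument list that the Makefile and make.bat pass to Sphinx."""
    # "-M" (make mode) has to be the first Sphinx argument
    return [sys.executable, "-m", "sphinx", "-M", target,
            source_dir, build_dir, *opts]


def run_sphinx(target="help", **kwargs):
    """Run Sphinx in the current directory (expected to be docs/)."""
    # check=True raises CalledProcessError if the build fails
    subprocess.run(sphinx_command(target, **kwargs), check=True)
```

Calling `run_sphinx("html")` from the `docs/` directory would then play the role of `make html`.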

docs/requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+sphinx_rtd_theme

docs/source/conf.py

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = 'tda-mapper'
+copyright = '2024, Luca Simi'
+author = 'Luca Simi'
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = ['sphinx.ext.autodoc', 'sphinx_rtd_theme']
+
+templates_path = ['_templates']
+exclude_patterns = []
+
+
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = 'sphinx_rtd_theme'
+html_static_path = ['_static']
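
`sphinx.ext.autodoc` imports the modules it documents, so `tdamapper` has to be importable when Sphinx runs. On Read the Docs the `python.install` entry in `.readthedocs.yml` installs the package with pip. For a local build without installing it, one common pattern is to put the repository root on `sys.path` from `conf.py`. The sketch below assumes `conf.py` stays in `docs/source`; the helper name `repo_root` is hypothetical and not part of this commit.

```python
import os
import sys


def repo_root(conf_dir):
    """Return the directory two levels above `conf_dir` (docs/source -> repo)."""
    return os.path.normpath(os.path.join(conf_dir, os.pardir, os.pardir))


# Sphinx executes conf.py with the configuration directory as working directory
root = repo_root(os.getcwd())
if root not in sys.path:
    sys.path.insert(0, root)
```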

docs/source/index.rst

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+.. tda-mapper documentation master file, created by
+   sphinx-quickstart on Fri Jan 26 21:56:08 2024.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Welcome to tda-mapper's documentation!
+======================================
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   modules
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`

docs/source/modules.rst

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+API Reference
+=============
+
+.. toctree::
+   :maxdepth: 4
+
+   tdamapper

docs/source/tdamapper.rst

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+tdamapper.core Mapper Algorithm
+-------------------------------
+
+.. automodule:: tdamapper.core
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+tdamapper.cover Cover Algorithms
+--------------------------------
+
+.. automodule:: tdamapper.cover
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+tdamapper.clustering Clustering Algorithms
+------------------------------------------
+
+.. automodule:: tdamapper.clustering
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+tdamapper.plot Mapper Plot
+--------------------------
+
+.. automodule:: tdamapper.plot
+   :members:
+   :undoc-members:
+   :show-inheritance:
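
Each `automodule` directive above combines `:members:` with `:undoc-members:`. With `:members:` alone, autodoc lists public members that carry a docstring; `:undoc-members:` adds the ones without. The sketch below approximates that split for a toy class by inspecting docstrings. `ToyCover` and `split_by_docstring` are hypothetical names used only for illustration, and the real autodoc logic has more rules (inherited docstrings, `__all__`, and so on).

```python
import inspect


class ToyCover:
    """Hypothetical class, used only to illustrate docstring discovery."""

    def fit(self, X):
        """Documented method: listed with ``:members:`` alone."""
        return self

    def apply(self, x):
        return [x]


def split_by_docstring(cls):
    """Split public functions of `cls` into documented and undocumented names."""
    documented, undocumented = [], []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith('_'):
            continue  # private names need ``:private-members:`` in autodoc
        (documented if inspect.getdoc(member) else undocumented).append(name)
    return documented, undocumented


documented, undocumented = split_by_docstring(ToyCover)
```

Here `fit` lands in the documented group and `apply` in the undocumented one, so `apply` would only show up in the API reference because `:undoc-members:` is set.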
