Commit 7e64acb

Improved docs. Versions fixed
1 parent 74e9016 commit 7e64acb

7 files changed: +345 / -189 lines

app/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 streamlit>=1.40.0,<2.0.0
 numpy>=1.25.2,<2.0.0
 scikit-learn>=1.5.0,<1.6.0
-umap-learn>=0.5.7,>0.6.0
+umap-learn>=0.5.7,<0.6.0
 pandas>=2.1.0,<3.0.0
 tda-mapper>=0.9.0,<0.10.0
 plotly>=6.0.0,<7.0.0
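The one-character fix on the `umap-learn` line changes which releases pip accepts. Under PEP 440, comma-separated clauses must all hold, so `>=0.5.7,>0.6.0` rejects every 0.5.x release and accepts anything newer than 0.6.0, with no upper bound. The sketch below checks this with a hypothetical, stdlib-only helper `satisfies` that handles only plain numeric versions and the operators `<`, `<=`, `>`, `>=`, `==`; it is not a substitute for a full PEP 440 implementation.

```python
# Minimal sketch: evaluate a comma-separated version specifier the way pip
# combines clauses (every clause must hold). Only numeric release versions and
# the operators <, <=, >, >=, == are handled; this is NOT a full PEP 440 parser.
import operator

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '0.5.7' into a zero-padded tuple so that '0.6' == '0.6.0'."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def satisfies(version, spec):
    """Return True if `version` meets every clause of `spec`, e.g. '>=1,<2'."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in (">=", "<=", "==", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not OPS[symbol](v, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


old_spec = ">=0.5.7,>0.6.0"  # before this commit
new_spec = ">=0.5.7,<0.6.0"  # after this commit

for candidate in ["0.5.7", "0.5.9", "0.6.0", "0.6.1", "1.0.0"]:
    print(candidate, satisfies(candidate, old_spec), satisfies(candidate, new_spec))
```

Only the corrected range keeps `umap-learn` within the 0.5.x series from 0.5.7 on, matching the lower/upper-bound pattern of the other pins in the file.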

docs/source/examples.rst

Lines changed: 4 additions & 7 deletions
@@ -2,11 +2,8 @@ Examples
 ========
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
 
-   notebooks/circles_online
-   notebooks/digits_online
-
-.. |Dataset| image:: https://github.com/lucasimi/tda-mapper-python/raw/main/resources/circles_dataset.png
-.. |Mapper graph (average)| image:: https://github.com/lucasimi/tda-mapper-python/raw/main/resources/circles_mean.png
-.. |Mapper graph (standard deviation)| image:: https://github.com/lucasimi/tda-mapper-python/raw/main/resources/circles_std.png
+   notebooks/circles
+   notebooks/digits
+

docs/source/notebooks/circles.py

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+# ---
+# jupyter:
+#   jupytext:
+#     text_representation:
+#       extension: .py
+#       format_name: percent
+#       format_version: '1.3'
+#       jupytext_version: 1.17.0
+#   kernelspec:
+#     display_name: default
+#     language: python
+#     name: python3
+# ---
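The header declares jupytext's "percent" format: `# %%` starts a code cell, `# %% [markdown]` starts a Markdown cell whose lines are stored as `#` comments, and the YAML block between the `# ---` lines holds notebook metadata. As a rough illustration of that convention, here is a simplified reader; `split_percent_cells` is a hypothetical helper, not part of jupytext, whose real parser also handles cell titles, cell metadata, and other comment styles.

```python
# Hypothetical, simplified reader for jupytext's "percent" format.
# It skips the YAML header and does not parse cell titles or metadata.


def split_percent_cells(text):
    """Split a percent-format script into (cell_type, source) pairs."""
    cells = []
    cell_type, body = None, []
    # A trailing sentinel marker makes the loop flush the final cell.
    for line in text.splitlines() + ["# %%"]:
        if line.startswith("# %%"):
            if cell_type is not None:
                # Drop blank lines that only separate cells.
                while body and not body[-1].strip():
                    body.pop()
                cells.append((cell_type, "\n".join(body)))
            cell_type = "markdown" if "[markdown]" in line else "code"
            body = []
        elif cell_type == "markdown":
            # Markdown is stored as comments: remove the "# " prefix.
            body.append(line[2:] if line.startswith("# ") else line.lstrip("#"))
        elif cell_type == "code":
            body.append(line)
        # Lines before the first marker (the YAML header) are skipped.
    return cells


script = """# ---
# jupyter:
#   jupytext:
#     text_representation:
#       format_name: percent
# ---

# %% [markdown]
# # Example 1: Exploring Shape

# %%
import numpy as np
"""

for kind, source in split_percent_cells(script):
    print(kind, repr(source))
```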
+
+# %% [markdown]
+# # Example 1: Exploring Shape
+# In this notebook, we use the **Mapper algorithm** to analyze a toy dataset
+# composed of two concentric circles. This is a classic test case in topology
+# and machine learning, and it is well suited to building an intuitive
+# understanding of how Mapper captures shape. Because the dataset is synthetic
+# and its structure is known in advance, it lets us check how Mapper detects
+# underlying **topological structures**: in this case, two distinct loops. The
+# resulting Mapper graph should ideally have two connected components, one for
+# each circle.
+
+
+# %% [markdown]
+# ### Mapper pipeline
+
+# %% [markdown]
+# We generate a synthetic dataset using `make_circles`, which creates two
+# concentric circles in 2D space. To prepare the data for Mapper, we apply
+# **Principal Component Analysis (PCA)** to extract the top two components,
+# which serve as our **lens function**: the map along which Mapper builds its
+# cover. Since the data is already 2D, PCA only centers the points and rotates
+# (or reflects) them, so the lens preserves their shape; the same code carries
+# over unchanged to higher-dimensional data, where PCA reduces the dimension.
+
+
+# %%
+import numpy as np
+from matplotlib import pyplot as plt
+from sklearn.cluster import DBSCAN
+from sklearn.datasets import make_circles
+from sklearn.decomposition import PCA
+
+from tdamapper.cover import CubicalCover
+from tdamapper.learn import MapperAlgorithm
+from tdamapper.plot import MapperPlot
+
+X, labels = make_circles(n_samples=5000, noise=0.05, factor=0.3, random_state=42)
+
+fig = plt.figure(figsize=(5, 5), dpi=100)
+plt.scatter(X[:, 0], X[:, 1], c=labels, s=0.25, cmap="jet")
+plt.axis("off")
+plt.show()
+# fig.savefig("circles_dataset.png", dpi=100)
+
+y = PCA(2, random_state=42).fit_transform(X)
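On a 2D dataset, projecting onto both principal components is a change of orthonormal basis applied to centered data, so a two-component PCA lens keeps every pairwise distance. The NumPy-only sketch below checks this on hypothetical noisy-circle data, computing PCA by hand through an SVD of the centered data instead of calling scikit-learn's `PCA`.

```python
import numpy as np

# Hypothetical stand-in for make_circles: two noisy concentric circles.
rng = np.random.default_rng(42)
n = 200
angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
radii = np.where(rng.random(n) < 0.5, 1.0, 0.3)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X = X + rng.normal(scale=0.05, size=X.shape)

# PCA by hand: center the data, then project it onto the principal axes
# (the rows of Vt, ordered by decreasing singular value).
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
y = X_centered @ Vt.T  # keep both components, no whitening


def pairwise_distances(Z):
    """Euclidean distance matrix via broadcasting."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Vt is orthogonal, so y is X_centered rotated (possibly with a reflection);
# centering does not change distances either, so all of them are preserved.
print(np.allclose(pairwise_distances(X), pairwise_distances(y)))
```

With `whiten=True`, scikit-learn's `PCA` would additionally rescale each component to unit variance, and distances would no longer be preserved.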
+
+# %% [markdown]
+# We now build the Mapper graph using the PCA output as the lens. Mapper
+# requires two key components:
+#
+# - A **cover** algorithm that groups the data into overlapping sets along
+#   the lens.
+# - A **clustering algorithm** that splits the points in each cover set into
+#   clusters, which become the nodes of the graph.
+#
+# In this example, we use a **cubical cover** with 10 intervals per lens
+# dimension and 30% overlap, and we apply **DBSCAN**, which can detect
+# clusters of arbitrary shape. Choosing these parameters often involves some
+# trial and error, depending on the dataset and the desired graph resolution.
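To make "10 intervals and 30% overlap" concrete, here is one common convention for an interval cover of a single lens coordinate: n equal-length intervals spanning [min, max], with consecutive intervals sharing a fraction p of their length. This convention is an assumption for illustration; tda-mapper's `CubicalCover` may size or place its intervals differently. Under the same assumption, a cubical cover of the 2D PCA lens would take products of such intervals along each axis, giving a 10 × 10 grid of overlapping squares.

```python
def interval_cover(lo, hi, n_intervals, overlap_frac):
    """Split [lo, hi] into n equal intervals; neighbours overlap by a fraction.

    Assumed convention: with length w and step s = (1 - p) * w, the intervals
    span (n - 1) * s + w = hi - lo, hence w = (hi - lo) / (n - (n - 1) * p).
    """
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = (1.0 - overlap_frac) * length
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def covering_sets(value, intervals):
    """Indices of the intervals that contain `value`."""
    return [i for i, (a, b) in enumerate(intervals) if a <= value <= b]


intervals = interval_cover(-1.0, 1.0, n_intervals=10, overlap_frac=0.3)
for a, b in intervals[:3]:
    print(f"[{a:.3f}, {b:.3f}]")
# With an overlap below 50%, each point lies in one or two intervals.
print(covering_sets(-0.8, intervals))
```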
+
+# %%
+mapper = MapperAlgorithm(
+    cover=CubicalCover(n_intervals=10, overlap_frac=0.3), clustering=DBSCAN()
+)
+graph = mapper.fit_transform(X, y)
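The cover-then-cluster procedure can be followed end to end without tda-mapper. The sketch below is a deliberately simplified stand-in, not the library's implementation: it uses noiseless hypothetical circles, a 1D lens (the x coordinate) instead of the 2D PCA lens, the interval-cover convention assumed earlier, and fixed-radius single-linkage clustering in place of DBSCAN. Nodes are clusters, and two nodes are linked when they share a data point.

```python
import math
from itertools import combinations


def circle(radius, n):
    """n evenly spaced, noiseless points on a circle (hypothetical toy data)."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


points = circle(1.0, 400) + circle(0.3, 200)
lens = [x for x, _ in points]  # assumed 1D lens: the x coordinate


def interval_cover(lo, hi, n_intervals, overlap_frac):
    """Equal intervals on [lo, hi]; neighbours overlap by overlap_frac of a length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap_frac)
    step = (1.0 - overlap_frac) * length
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def single_linkage(indices, eps):
    """Connected components of the graph linking points at distance <= eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(cluster)
    return clusters


# Pull each interval back to the data and cluster its preimage; every cluster
# becomes a node. The range is padded so rounding cannot drop extreme points.
nodes = []
for a, b in interval_cover(min(lens) - 1e-9, max(lens) + 1e-9, 10, 0.3):
    preimage = [i for i, v in enumerate(lens) if a <= v <= b]
    nodes.extend(single_linkage(preimage, eps=0.05))

# Link two nodes whenever their clusters share at least one data point.
edges = [(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]]

# Union-find to count connected components.
parent = list(range(len(nodes)))


def find(u):
    while parent[u] != u:
        parent[u] = parent[parent[u]]  # path halving
        u = parent[u]
    return u


for u, v in edges:
    parent[find(u)] = find(v)

n_components = len({find(u) for u in range(len(nodes))})
# For any graph, E - V + C is the number of independent cycles (loops).
n_loops = len(edges) - len(nodes) + n_components
print(f"{len(nodes)} nodes, {len(edges)} edges, "
      f"{n_components} components, {n_loops} loops")
```

Two components with one independent cycle each is the "two distinct loops" outcome the notebook aims for. On the notebook's real graph, assuming it is a `networkx.Graph`, `networkx.number_connected_components(graph)` would report the component count; the exact result there also depends on DBSCAN's default `eps` and the noise level.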
+
+# %% [markdown]
+# ### Visualization
+
+# %% [markdown]
+# We visualize the Mapper graph by coloring each node with the **mean** class
+# label (0 or 1) of the points it contains. Since the dataset has one class per
+# circle, this coloring shows whether the graph structure matches the true
+# geometry of the data. Ideally, nodes from the inner and outer circles are
+# clearly separated by color and form two distinct connected components in
+# the graph.
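Coloring by the mean label amounts to averaging the labels of each node's member points with the aggregation function passed as `agg`. The NumPy sketch below uses a hypothetical node-to-members mapping to show the values that come out; how tda-mapper stores node memberships internally is not assumed here.

```python
import numpy as np

# Hypothetical labels for 8 points and a hypothetical node -> member-indices map.
labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])
members = {
    "pure_0": [0, 1, 2],     # only class 0
    "pure_1": [3, 4, 5, 6],  # only class 1
    "mixed": [2, 3, 7],      # two points of class 0, one of class 1
}

# Mean label per node: 0.0 or 1.0 for pure nodes, in between for mixed ones.
# np.nanmean ignores NaN entries instead of returning NaN for the whole node.
node_color = {node: float(np.nanmean(labels[idx])) for node, idx in members.items()}
print(node_color)
```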
+
+# %%
+plot = MapperPlot(graph, dim=2, iterations=60, seed=42)
+
+fig = plot.plot_plotly(colors=labels, cmap="jet", agg=np.nanmean, width=600, height=600)
+
+fig.show(config={"scrollZoom": True})
+# fig.write_image("circles_mean.png", width=500, height=500)
+
+# %% [markdown]
+# To find regions where the two classes might overlap or be hard to
+# distinguish, we color each node by the **standard deviation** of its class
+# labels. A value of 0 means that all samples in the node belong to the same
+# class, while higher values (at most 0.5 for 0/1 labels) indicate mixed
+# labels within the node. This highlights transitional regions where class
+# boundaries are less sharp, which is useful for real-world data, where such
+# ambiguity is common.
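For 0/1 labels, a node's standard deviation depends only on the fraction p of its labels equal to 1: the mean is p, the mean of the squared labels is also p, so the variance is p - p² = p(1 - p) and the standard deviation is √(p(1 - p)). That is 0 for a pure node and at most 0.5, reached at an even split. A quick NumPy check, using hypothetical 10-point nodes; note that `np.nanstd` defaults to `ddof=0` (population standard deviation), and a sample standard deviation (`ddof=1`) would give slightly larger values.

```python
import numpy as np

# Labels of a hypothetical node with n_ones points of class 1 out of 10.
for n_ones in (0, 1, 3, 5):
    node_labels = np.array([1.0] * n_ones + [0.0] * (10 - n_ones))
    p = n_ones / 10
    # np.nanstd uses ddof=0 by default, which matches sqrt(p * (1 - p)).
    print(f"p={p:.1f}  nanstd={np.nanstd(node_labels):.3f}  "
          f"sqrt(p(1-p))={np.sqrt(p * (1 - p)):.3f}")

# The "nan" variant skips missing values instead of propagating them.
print(np.nanstd([0.0, 1.0, np.nan]))
```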
+
+# %%
+plot.plot_plotly_update(
+    fig,
+    colors=labels,
+    cmap="viridis",
+    agg=np.nanstd,
+)
+
+fig.show(config={"scrollZoom": True})
+# fig.write_image("circles_std.png", width=500, height=500)
+
+# %% [markdown]
+# ### Conclusions
+# This example shows how the Mapper algorithm uncovers topological structure,
+# even in a basic synthetic dataset. By combining a lens (PCA), an overlapping
+# cover, and clustering, Mapper captures the two-loop shape of the concentric
+# circles, and coloring by mean and standard deviation shows where labels are
+# consistent and where they are mixed. This forms a solid foundation for
+# applying Mapper to more complex, real-world datasets.

docs/source/notebooks/circles_online.py

Lines changed: 0 additions & 79 deletions
This file was deleted.
