
Commit e04a85e

Author: Giulia Baldini (committed)
Change example dataset
1 parent 20b39cd commit e04a85e

1 file changed: +4 -4 lines changed


README.md

Lines changed: 4 additions & 4 deletions
@@ -64,7 +64,7 @@ print(bico.cluster_centers_)
 
 ## Example with Large Datasets
 
-For very large datasets, the data may not actually fit in memory. In this case, you can use `partial_fit` to stream the data in chunks. In this example, we use the [BigCross dataset](https://cs.uni-paderborn.de/cuk/forschung/abgeschlossene-projekte/dfg-schwerpunktprogramm-1307/streamkm).
+For very large datasets, the data may not actually fit in memory. In this case, you can use `partial_fit` to stream the data in chunks. In this example, we use the [US Census Data (1990) dataset](https://archive.ics.uci.edu/dataset/116/us+census+data+1990). You can find more examples in the [tests](./tests/test.py) folder.
 
 ```python
 from bico import BICO
@@ -77,9 +77,9 @@ data = np.random.rand(10000, 10)
 
 start = time.time()
 bico = BICO(n_clusters=3, random_state=0)
-for i, chunk in enumerate(pd.read_csv(
-    "bigcross.txt", delimiter=",", header=None, chunksize=10000
-)):
+for chunk in pd.read_csv(
+    "census.txt", delimiter=",", header=None, chunksize=10000
+):
     bico.partial_fit(chunk.to_numpy(copy=False))
 # If a final `partial_fit` is called with no data, the coreset is computed
 bico.partial_fit()
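
For reference, below is a self-contained sketch of how the streamed-fit example reads after this change. The `time` and `pandas` imports, the elapsed-time print, and the final `print(bico.cluster_centers_)` are assumptions pieced together from the surrounding hunk headers rather than lines touched by this commit.

```python
import time

import pandas as pd
from bico import BICO

start = time.time()
bico = BICO(n_clusters=3, random_state=0)

# Stream the CSV in chunks so the full dataset never has to fit in memory.
for chunk in pd.read_csv(
    "census.txt", delimiter=",", header=None, chunksize=10000
):
    bico.partial_fit(chunk.to_numpy(copy=False))

# A final `partial_fit` call with no new data computes the coreset.
bico.partial_fit()

# Assumed wrap-up: report timing and the resulting cluster centers.
print("Fit took", time.time() - start, "seconds")
print(bico.cluster_centers_)
```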

0 commit comments
