Commit eb60a83

SIPU worms datasets
1 parent 66fe179 commit eb60a83

10 files changed: 116 additions, 40 deletions

README.md

Lines changed: 43 additions & 40 deletions
@@ -190,10 +190,10 @@ We have tried to resolve any conflicts in the *best* possible manner.
 We excluded the `DIM`-sets as they turn out to be too easy
 for most algorithms.
 
-5. [`uci`](catalog/uci.md) -
+5. [`uci`](catalogue/uci.md) -
 a selection of datasets available at the University of California, Irvine,
 [Machine Learning Repository](http://archive.ics.uci.edu/ml/)
-(Dua and Graff, 2018)
+(Dua and Graff, 2019)
 
 Some of these datasets in this selection were considered
 for benchmark purposes
@@ -203,14 +203,14 @@ We have tried to resolve any conflicts in the *best* possible manner.
 
 6. [`wut`](catalogue/wut.md) -
 authored by the fantastic students
-of Marek's [Python for Data Analysis course](http://www.gagolewski.com/teaching/padpy/) @
-[Warsaw University of Technology](https://ww4.mini.pw.edu.pl/):
+of Marek Gagolewski's Python for Data Analysis course at
+Warsaw University of Technology:
 Przemysław Kosewski, Jędrzej Krauze, Eliza Kaczorek, Anna Gierlak,
 Adam Wawrzyniak, Aleksander Truszczyński, Mateusz Kobyłka and Michał Maciąg.
 
 
 7. [`g2mg`](catalogue/g2mg.md) -
-a modified version of the SIPU `G2`-sets with variances
+a modified version of `G2`-sets from SIPU with variances
 dependent on datasets' dimensionalities, i.e., s*np.sqrt(d/2),
 which makes these problems more difficult.
 
@@ -278,40 +278,43 @@ We have tried to resolve any conflicts in the *best* possible manner.
 |43 |sipu/s4 | 5000| 2|
 |44 |sipu/spiral | 312| 2|
 |45 |sipu/unbalance | 6500| 2|
-|46 |uci/ecoli | 336| 7|
-|47 |uci/glass | 214| 9|
-|48 |uci/ionosphere | 351| 34|
-|49 |uci/sonar | 208| 60|
-|50 |uci/statlog | 2310| 19|
-|51 |uci/wdbc | 569| 30|
-|52 |uci/wine | 178| 13|
-|53 |uci/yeast | 1484| 8|
-|54 |wut/circles | 4000| 2|
-|55 |wut/cross | 2000| 2|
-|56 |wut/graph | 2500| 2|
-|57 |wut/isolation | 9000| 2|
-|58 |wut/labirynth | 3546| 2|
-|59 |wut/mk1 | 300| 2|
-|60 |wut/mk2 | 1000| 2|
-|61 |wut/mk3 | 600| 3|
-|62 |wut/mk4 | 1500| 3|
-|63 |wut/olympic | 5000| 2|
-|64 |wut/smile | 1000| 2|
-|65 |wut/stripes | 5000| 2|
-|66 |wut/trajectories | 10000| 2|
-|67 |wut/trapped_lovers | 5000| 3|
-|68 |wut/twosplashes | 400| 2|
-|69 |wut/windows | 2977| 2|
-|70 |wut/x1 | 120| 2|
-|71 |wut/x2 | 120| 2|
-|72 |wut/x3 | 185| 2|
-|73 |wut/z1 | 192| 2|
-|74 |wut/z2 | 900| 2|
-|75 |wut/z3 | 1000| 2|
-
-
-
-We recommend that `h2mg` sets should be studied separately
+|46 |sipu/worms_2 | 105600| 2|
+|47 |sipu/worms_64 | 105000| 64|
+|48 |uci/ecoli | 336| 7|
+|49 |uci/glass | 214| 9|
+|50 |uci/ionosphere | 351| 34|
+|51 |uci/sonar | 208| 60|
+|52 |uci/statlog | 2310| 19|
+|53 |uci/wdbc | 569| 30|
+|54 |uci/wine | 178| 13|
+|55 |uci/yeast | 1484| 8|
+|56 |wut/circles | 4000| 2|
+|57 |wut/cross | 2000| 2|
+|58 |wut/graph | 2500| 2|
+|59 |wut/isolation | 9000| 2|
+|60 |wut/labirynth | 3546| 2|
+|61 |wut/mk1 | 300| 2|
+|62 |wut/mk2 | 1000| 2|
+|63 |wut/mk3 | 600| 3|
+|64 |wut/mk4 | 1500| 3|
+|65 |wut/olympic | 5000| 2|
+|66 |wut/smile | 1000| 2|
+|67 |wut/stripes | 5000| 2|
+|68 |wut/trajectories | 10000| 2|
+|69 |wut/trapped_lovers | 5000| 3|
+|70 |wut/twosplashes | 400| 2|
+|71 |wut/windows | 2977| 2|
+|72 |wut/x1 | 120| 2|
+|73 |wut/x2 | 120| 2|
+|74 |wut/x3 | 185| 2|
+|75 |wut/z1 | 192| 2|
+|76 |wut/z2 | 900| 2|
+|77 |wut/z3 | 1000| 2|
+
+
+
+
+We recommend that the `h2mg` sets should be studied separately
 (there are too many of them -- they can easily overshadow the
 above ones).
 
@@ -336,7 +339,7 @@ above ones).
 |72 |h2mg/h2mg_128_90 | 2048| 128|
 
 
-We recommend that `g2mg` sets should be studied separately as well.
+The `g2mg` sets should be studied separately too.
 
 
 | |dataset | n| d|

catalogue/sipu.csv

Lines changed: 2 additions & 0 deletions
@@ -25,3 +25,5 @@ sipu/s3,5000,2,labels0,15,0,0.0256
 sipu/s4,5000,2,labels0,15,0,0.026314285714285717
 sipu/spiral,312,2,labels0,3,0,0.016025641025641024
 sipu/unbalance,6500,2,labels0,8,0,0.6263736263736264
+sipu/worms_2,105600,2,labels0,35,0,0.2825133689839572
+sipu/worms_64,105000,64,labels0,25,0,0.0
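The last column of each catalogue row matches the `true_g` value reported in `catalogue/sipu.md`, and it appears to be a normalised Gini index of the cluster sizes. The sketch below is an assumption reverse-engineered from the listed numbers, not code taken from the repository: the formula sum over pairs i<j of |c_i - c_j|, divided by (k - 1) * n, reproduces 0.6264 for sipu/unbalance, 0.2825 for sipu/worms_2 and 0.0 for sipu/worms_64. The name `normalized_gini` is hypothetical.

```python
from itertools import combinations


def normalized_gini(label_counts):
    """Normalised Gini index of cluster sizes (hypothetical helper).

    0 means perfectly balanced clusters; values near 1 mean very unequal
    sizes. Assumed formula, which reproduces the catalogued values:
        sum_{i<j} |c_i - c_j| / ((k - 1) * sum_i c_i)
    """
    counts = list(label_counts)
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two clusters")
    total = sum(counts)
    # Sum of absolute size differences over all unordered pairs of clusters.
    pairwise = sum(abs(a - b) for a, b in combinations(counts, 2))
    return pairwise / ((k - 1) * total)


# sipu/unbalance: three clusters of 2000 points and five of 100 points
print(round(normalized_gini([2000, 2000, 2000, 100, 100, 100, 100, 100]), 4))  # → 0.6264
# sipu/worms_64: 25 equally sized clusters
print(normalized_gini([4200] * 25))  # → 0.0
```

The pairwise form is quadratic in k, which is harmless for the few dozen clusters these datasets have.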

catalogue/sipu.md

Lines changed: 53 additions & 0 deletions
@@ -24,6 +24,8 @@ is maintained by [Marek Gagolewski](http://www.gagolewski.com)**
 * [sipu/s4](#sipu_s4)
 * [sipu/spiral](#sipu_spiral)
 * [sipu/unbalance](#sipu_unbalance)
+* [sipu/worms_2](#sipu_worms_2)
+* [sipu/worms_64](#sipu_worms_64)
 
 --------------------------------------------------------------------------------
 
@@ -523,3 +525,54 @@ label_counts=[2000, 2000, 2000, 100, 100, 100, 100, 100]
 
 
 
+## sipu/worms_2 (n=105600, d=2) <a name="sipu_worms_2"></a>
+
+Synthetic 2D data with worm-like shapes
+
+Source: S. Sieranoja and P. Fränti,
+Fast and general density peaks clustering,
+Pattern Recognition Letters, 128, 551-558, 2019.
+
+Web: https://cs.joensuu.fi/sipu/datasets/
+
+`labels0` come from the Authors.
+
+
+
+#### `labels0`
+
+true_k=35, noise= 0, true_g=0.283
+
+label_counts=[3120, 4560, 4368, 4008, 3648, 3144, 1992, 1008, 4464, 936, 2904, 1296, 2496, 2328, 4968, 5880, 3696, 4896, 2160, 2160, 3048, 5640, 1752, 1176, 4968, 4920, 768, 2472, 1392, 1752, 3840, 2664, 840, 3336, 3000]
+
+![](sipu/worms_2.labels0.png)
+
+
+
+
+## sipu/worms_64 (n=105000, d=64) <a name="sipu_worms_64"></a>
+
+Synthetic 64D data with worm-like shapes
+
+Source: S. Sieranoja and P. Fränti,
+Fast and general density peaks clustering,
+Pattern Recognition Letters, 128, 551-558, 2019.
+
+Web: https://cs.joensuu.fi/sipu/datasets/
+
+`labels0` come from the Authors.
+
+
+
+#### `labels0`
+
+true_k=25, noise= 0, true_g=0.000
+
+label_counts=[4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200, 4200]
+
+> **(preview generation suppressed)**
+
+
+
+
+
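Each `labels0` block summarises a label vector as `true_k`, `noise` and `label_counts`. A minimal sketch of how such a summary could be derived, assuming (this is not stated in the commit) that label 0 marks noise points and that genuine clusters are numbered 1, 2, ..., k. The helper `summarise_labels` is hypothetical.

```python
from collections import Counter


def summarise_labels(labels):
    """Return (true_k, noise, label_counts) for integer labels (hypothetical).

    Assumption: 0 marks noise points and clusters are numbered 1..k,
    so label_counts[i] is the size of cluster i + 1.
    """
    counts = Counter(int(x) for x in labels)
    noise = counts.pop(0, 0)  # points labelled 0 are counted as noise
    true_k = max(counts) if counts else 0
    label_counts = [counts.get(c, 0) for c in range(1, true_k + 1)]
    return true_k, noise, label_counts


# Toy label vector (hypothetical): two clusters plus one noise point.
print(summarise_labels([1, 1, 2, 2, 2, 0]))  # → (2, 1, [2, 3])
```

Under this convention, `sum(label_counts) + noise` should equal the dataset's n, e.g. 25 * 4200 = 105000 for sipu/worms_64.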

catalogue/sipu/worms_2.labels0.png

298 KB

sipu/worms_2.data.gz

518 KB
Binary file not shown.

sipu/worms_2.labels0.gz

406 Bytes
Binary file not shown.

sipu/worms_2.txt

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Synthetic 2D data with worm-like shapes
+
+Source: S. Sieranoja and P. Fränti,
+Fast and general density peaks clustering,
+Pattern Recognition Letters, 128, 551-558, 2019.
+
+Web: https://cs.joensuu.fi/sipu/datasets/
+
+`labels0` come from the Authors.

sipu/worms_64.data.gz

18.3 MB
Binary file not shown.

sipu/worms_64.labels0.gz

369 Bytes
Binary file not shown.

sipu/worms_64.txt

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+Synthetic 64D data with worm-like shapes
+
+Source: S. Sieranoja and P. Fränti,
+Fast and general density peaks clustering,
+Pattern Recognition Letters, 128, 551-558, 2019.
+
+Web: https://cs.joensuu.fi/sipu/datasets/
+
+`labels0` come from the Authors.
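The datasets are added as gzip-compressed files, a `*.data.gz` file plus a `*.labels0.gz` file per dataset. A sketch of loading such a pair with NumPy, assuming the files contain whitespace-delimited text (one point per row in the data file, one integer label per row in the labels file); the file layout is an assumption, and `load_benchmark` is a hypothetical helper rather than part of the repository. `numpy.loadtxt` decompresses files whose names end in `.gz`.

```python
import numpy as np


def load_benchmark(prefix, labels="labels0"):
    """Load `<prefix>.data.gz` and `<prefix>.<labels>.gz` (hypothetical helper).

    Assumes gzip-compressed, whitespace-delimited text files: one point per
    row in the data file and one integer label per row in the labels file.
    """
    # ndmin keeps a 2D matrix / 1D vector even for single-row files.
    X = np.loadtxt(f"{prefix}.data.gz", ndmin=2)
    y = np.loadtxt(f"{prefix}.{labels}.gz", dtype=int, ndmin=1)
    if X.shape[0] != y.shape[0]:
        raise ValueError("data and labels have different numbers of rows")
    return X, y
```

With a local checkout, `load_benchmark("sipu/worms_2")` would then be expected to return `X` of shape (105600, 2), in line with the catalogue row.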
