Commit cd8793c

Create README.md
1 parent 9549af8 commit cd8793c

1 file changed

+82
-0
lines changed

README.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# HDBSCAN-CPP
A fast and efficient implementation of HDBSCAN in C++ using the STL.

Authored by Sumedh Basarkod and Rohan Mohapatra.

--------------------------------------------------------------------------------------------------------------

The Standard Template Library (STL) is a set of C++ template classes that provide common data structures and functions, such as lists, stacks, and arrays. It is a library of container classes, algorithms, and iterators.

# About HDBSCAN
HDBSCAN stands for Hierarchical Density-Based Spatial Clustering of Applications with Noise. It performs DBSCAN over varying epsilon values and integrates the results to find the clustering that is most stable over epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN) and makes it more robust to parameter selection.

In practice this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning, and the primary parameter, minimum cluster size, is intuitive and easy to select.

HDBSCAN is ideal for exploratory data analysis; it's a fast and robust algorithm that you can trust to return meaningful clusters (if there are any).

Based on the paper:
> R. Campello, D. Moulavi, and J. Sander, Density-Based Clustering Based on Hierarchical Density Estimates. In: Advances in Knowledge Discovery and Data Mining, Springer, pp. 160-172, 2013.
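
The density estimate behind the description above can be illustrated with the mutual reachability distance used in the cited paper: the distance between two points is raised to at least the core distance of each. The block below is a minimal sketch of that idea, not the code in this repository. The names `Point`, `euclidean`, `coreDistance`, and `mutualReachability` are hypothetical, and the choice to exclude a point from its own neighbour list is an assumption (conventions differ).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type for this sketch: one coordinate per dimension.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double diff = a[i] - b[i];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Core distance of points[idx]: distance to its k-th nearest neighbour.
// Assumption: the point itself is not counted as its own neighbour.
double coreDistance(const std::vector<Point>& points, std::size_t idx, std::size_t k) {
    assert(idx < points.size() && k >= 1 && k < points.size());
    std::vector<double> dists;
    for (std::size_t j = 0; j < points.size(); ++j) {
        if (j != idx) dists.push_back(euclidean(points[idx], points[j]));
    }
    // Partially sort so that the k-th smallest distance sits at position k - 1.
    const auto kth = dists.begin() + static_cast<std::ptrdiff_t>(k - 1);
    std::nth_element(dists.begin(), kth, dists.end());
    return *kth;
}

// Mutual reachability distance: max(core(a), core(b), d(a, b)).
double mutualReachability(const std::vector<Point>& points,
                          std::size_t a, std::size_t b, std::size_t k) {
    return std::max({coreDistance(points, a, k),
                     coreDistance(points, b, k),
                     euclidean(points[a], points[b])});
}
```

With four one-dimensional points at 0, 1, 2, and 10 and `k = 2`, the points at 0 and 1 are 1 apart but their mutual reachability distance is 2, because the point at 0 has a core distance of 2. Points in sparse regions are pushed further apart in this way, which is what lets a hierarchy over this distance separate clusters of different densities.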

### How to Run this code?

Clone this project, which contains the library.
```
git clone https://github.com/rohanmohapatra/hdbscan-cpp.git
```

Run the Makefile:
```
make all clean
```

Wait for it to complete; this will run the example already present in the Four Prominent Cluster Example folder. Plot the points to see the clustering.
To run:
```
./main
```

To use the library in your own code, have a look at the example and adapt it.

### Outlier Detection
The HDBSCAN clusterer objects also support the GLOSH outlier detection algorithm. After fitting the clusterer to data, the outlier scores can be accessed via the `outlierScores_` member of the `Hdbscan` object. The result is a vector of score values, one for each data point that was fit. Higher scores represent more outlier-like objects. Selecting outliers via upper quantiles is often a good approach.

Based on the paper:
> R.J.G.B. Campello, D. Moulavi, A. Zimek and J. Sander, Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection, ACM Trans. on Knowledge Discovery from Data, Vol. 10, 1 (July 2015), 1-51.
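
As a rough illustration of selecting outliers by an upper quantile, the sketch below returns the indices of points whose score lies strictly above an empirical quantile. It assumes the scores have already been copied into a plain `std::vector<double>`; the element type of `outlierScores_` is not specified here. The helper name `selectOutliers` and the simple rank-based quantile rule are hypothetical choices for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the indices of scores strictly above the q-quantile of all scores
// (for example, q = 0.9 keeps roughly the top 10% when scores are distinct).
// Assumption: scores are plain doubles, one per fitted data point.
std::vector<std::size_t> selectOutliers(const std::vector<double>& scores, double q) {
    assert(!scores.empty() && q >= 0.0 && q <= 1.0);

    // Sort a copy so the original order (and thus the point indices) is kept.
    std::vector<double> sorted(scores);
    std::sort(sorted.begin(), sorted.end());

    // Simple rank-based quantile: element floor(q * (n - 1)) of the sorted scores.
    const double position = q * static_cast<double>(sorted.size() - 1);
    const std::size_t rank = static_cast<std::size_t>(std::floor(position));
    const double threshold = sorted[rank];

    std::vector<std::size_t> outliers;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] > threshold) outliers.push_back(i);
    }
    return outliers;
}
```

The quantile itself is a tuning choice: a higher `q` flags fewer points, and with `q = 1.0` nothing is flagged because no score exceeds the maximum.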

## Examples
```
#include <iostream>
#include "../HDBSCAN-CPP/Hdbscan/hdbscan.hpp"
using namespace std;

int main() {
    // Point the clusterer at the bundled example dataset and load it.
    Hdbscan hdbscan("HDBSCANDataset/FourProminentClusterDataset.csv");
    hdbscan.loadCsv(2);

    // Run the clustering with the Euclidean distance metric and print the result.
    hdbscan.execute(5, 5, "Euclidean");
    hdbscan.displayResult();
    cout << "You can access other fields like cluster labels, membership probabilities and outlier scores." << endl;

    /* Use it like this:
    hdbscan.labels_;
    hdbscan.membershipProbabilities_;
    hdbscan.outlierScores_;
    */
    return 0;
}
```
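
Once labels are available, a common first check is how many points landed in each cluster. The helper below is a small sketch of that step. It assumes the labels have been copied into a `std::vector<int>` (the element type of `labels_` and the value used for noise points are not stated here), and `clusterSizes` is a hypothetical name.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Counts how many points carry each label.
// Assumption: labels are integers, one per data point.
std::map<int, std::size_t> clusterSizes(const std::vector<int>& labels) {
    std::map<int, std::size_t> sizes;
    for (int label : labels) {
        ++sizes[label];  // operator[] value-initialises a missing count to 0
    }
    return sizes;
}
```

Iterating over the returned map visits the labels in ascending order, which makes it easy to print a short per-cluster summary after `displayResult()`.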

### Help and Support
If you have issues, please check the existing issues on GitHub first. If no solution is available there, feel free to open an issue; the authors will attempt to respond.

### Contributing
We welcome contributions in any form! Assistance with documentation is always welcome.
To contribute, please fork the project, make your changes, and submit a pull request.
We will do our best to work through any issues with you and get your code merged into the master branch.
