# HDBSCAN-CPP
Fast and efficient implementation of HDBSCAN in C++ using the STL.

Authored by: Sumedh Basarkod and Rohan Mohapatra

--------------------------------------------------------------------------------------------------------------
The Standard Template Library (STL) is a set of C++ template classes that provide common programming
data structures and functions such as lists, stacks, and arrays. It is a library of container classes, algorithms, and iterators.
## About HDBSCAN
HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) performs DBSCAN over varying epsilon values and integrates the results to find the clustering that is most stable across epsilon. This allows HDBSCAN to find clusters of varying densities (unlike DBSCAN) and makes it more robust to parameter selection.

In practice, this means that HDBSCAN returns a good clustering straight away with little or no parameter tuning, and its primary parameter, the minimum cluster size, is intuitive and easy to select.

HDBSCAN is ideal for exploratory data analysis: it is a fast and robust algorithm that you can trust to return meaningful clusters (if there are any).

Based on the paper:
> R. Campello, D. Moulavi, and J. Sander. Density-Based Clustering Based on Hierarchical Density Estimates. In: Advances in Knowledge Discovery and Data Mining, Springer, pp. 160-172, 2013.
## How to Run This Code

Clone this repository; it contains the library.
```
git clone https://github.com/rohanmohapatra/hdbscan-cpp.git
```

Build it with the Makefile:
```
make all clean
```
Wait for the build to complete. This compiles the example that already ships in the Four Prominent Cluster Example folder. To run it:
```
./main
```
Then plot the points to see the clustering.

To use the library in your own code, start from the example and adapt it.
## Outlier Detection
The `Hdbscan` object also supports the GLOSH (Global-Local Outlier Score from Hierarchies) outlier detection algorithm. After fitting the clusterer to the data,
the outlier scores can be accessed via the `outlierScores_` member of the `Hdbscan` object. The result is a vector of scores,
one for each data point that was fit. Higher scores indicate more outlier-like points. Selecting outliers via upper
quantiles of the scores is often a good approach.

Based on the paper:
> R. J. G. B. Campello, D. Moulavi, A. Zimek, and J. Sander. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Transactions on Knowledge Discovery from Data, Vol. 10, No. 1 (July 2015), pp. 1-51.
## Examples
```cpp
#include <iostream>
#include "../HDBSCAN-CPP/Hdbscan/hdbscan.hpp"
using namespace std;

int main() {
    // Create the clusterer for a CSV dataset.
    Hdbscan hdbscan("HDBSCANDataset/FourProminentClusterDataset.csv");
    // Load the CSV, reading 2 values (columns) per row.
    hdbscan.loadCsv(2);
    // Run HDBSCAN with minPoints = 5, minClusterSize = 5 and the Euclidean distance.
    hdbscan.execute(5, 5, "Euclidean");
    hdbscan.displayResult();
    cout << "You can access other fields like cluster labels, membership probabilities and outlier scores." << endl;

    /* Use them like this:
    hdbscan.labels_;
    hdbscan.membershipProbabilities_;
    hdbscan.outlierScores_;
    */
    return 0;
}
```

## Help and Support
If you run into problems, please check the existing issues on GitHub first. If no solution is available there, feel free to open a new issue;
the authors will attempt to respond.

## Contributing
We welcome contributions in any form! Assistance with documentation is always welcome.
To contribute, please fork the project, make your changes, and submit a pull request.
We will do our best to work through any issues with you and get your code merged into the master branch.