Inquiry About Adding Compression Tree and EOM in C++ HDBSCAN #15

@YUANMENG-1

  • I am working on a project that runs HDBSCAN clustering a very large number of times: millions of runs, each on 600-700 data points. I started with Python's HDBSCAN implementation, but it was too slow for this workload. I then tried GPU HDBSCAN and Fast HDBSCAN, but neither was fast enough either, so I switched to the C++ version of HDBSCAN.

  • While using the C++ version, I noticed that it produces significantly fewer noise points than Python's HDBSCAN. This led me to wonder why the condensed tree (the "compression tree" of the title) and the EOM (Excess of Mass) cluster selection method, both available in Python's HDBSCAN implementation, are not included in the C++ version.

  • Would it be possible to add a condensed tree and EOM cluster selection to the C++ implementation, along the lines of Python's HDBSCAN? My main concern is whether these features would significantly slow down the C++ implementation.

  • Your insights and advice on this matter would be greatly appreciated.
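
To make the request concrete, the Excess of Mass selection step over a condensed tree can be sketched as below. This is only an illustration under stated assumptions, not code from the Python or C++ libraries: `CondensedNode` and `select_eom` are hypothetical names, each node's stability is assumed to be already computed, and nodes are assumed to be stored so that every child index is larger than its parent's, with node 0 as the root.

```cpp
#include <cassert>
#include <vector>

// Hypothetical condensed-tree node (not taken from any existing library).
struct CondensedNode {
    double stability;           // precomputed cluster stability, assumed >= 0
    std::vector<int> children;  // indices of child clusters; empty for a leaf
};

// Excess of Mass selection: walk the tree bottom-up and keep a cluster when
// its own stability is at least the combined stability of its children;
// otherwise pass the children's total up to the parent.
// Assumes node 0 is the root and is never selected.
std::vector<bool> select_eom(std::vector<CondensedNode> nodes) {
    std::vector<bool> selected(nodes.size(), false);
    for (int i = static_cast<int>(nodes.size()) - 1; i >= 1; --i) {
        double child_sum = 0.0;
        for (int c : nodes[i].children) {
            assert(c > i && "children must be stored after their parent");
            child_sum += nodes[c].stability;
        }
        if (!nodes[i].children.empty() && child_sum > nodes[i].stability) {
            // The children together are more stable: keep them, skip this node.
            nodes[i].stability = child_sum;
        } else {
            // This node wins: select it and deselect every descendant.
            selected[i] = true;
            std::vector<int> stack(nodes[i].children.begin(),
                                   nodes[i].children.end());
            while (!stack.empty()) {
                int d = stack.back();
                stack.pop_back();
                selected[d] = false;
                stack.insert(stack.end(), nodes[d].children.begin(),
                             nodes[d].children.end());
            }
        }
    }
    return selected;
}
```

In this sketch the selection is a single bottom-up pass over the cluster nodes plus a walk over the descendants of each selected cluster. The cost of building the condensed tree and computing the stabilities is not covered here.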

Thank you for your time and consideration.

Best regards,
