Releases: cognitivefactory/interactive-clustering-comparative-study

1.0.0

13 Nov 17:18

Interactive Clustering: Comparative Studies

Several comparative studies of cognitivefactory-interactive-clustering functionalities on NLP datasets.

Quick description of Interactive Clustering

Interactive clustering is a method intended to assist in the design of a training data set.

This iterative process begins with an unlabeled dataset and repeats a sequence of two substeps:

  1. the user defines constraints on data sampled by the machine;
  2. the machine performs data partitioning using a constrained clustering algorithm.

Thus, at each step of the process:

  • the user corrects the clustering of the previous steps using constraints, and
  • the machine offers a corrected and more relevant data partitioning for the next step.
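
The loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the cognitivefactory-interactive-clustering API: the function names, the random pair sampling, the simulated user (who answers from hidden labels) and the clustering step (which only groups items linked by MUST_LINK constraints and ignores CANNOT_LINK) are all assumptions made for the example.

```python
import itertools
import random


def annotate(pair, truth):
    """Simulated user: MUST_LINK if both items share the same hidden label."""
    a, b = pair
    return "MUST_LINK" if truth[a] == truth[b] else "CANNOT_LINK"


def constrained_clustering(items, constraints):
    """Toy stand-in for constrained clustering: merge items joined by MUST_LINK.

    CANNOT_LINK constraints are recorded but not used by this simplification.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), kind in constraints.items():
        if kind == "MUST_LINK":
            parent[find(a)] = find(b)

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), set()).add(item)
    return sorted(sorted(cluster) for cluster in clusters.values())


def interactive_clustering(truth, batch_size=3, seed=0):
    """Alternate annotation and clustering until every sampled pair is annotated."""
    rng = random.Random(seed)
    items = sorted(truth)
    candidates = list(itertools.combinations(items, 2))
    rng.shuffle(candidates)  # hypothetical sampling strategy: random pairs

    constraints = {}
    history = []
    while candidates:
        # Substep 1: the user annotates a batch of pairs sampled by the machine.
        batch, candidates = candidates[:batch_size], candidates[batch_size:]
        for pair in batch:
            constraints[pair] = annotate(pair, truth)
        # Substep 2: the machine partitions the data under the constraints.
        history.append(constrained_clustering(items, constraints))
    return history


hidden_labels = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 0}
for step, partition in enumerate(interactive_clustering(hidden_labels), start=1):
    print(step, partition)
```

Each printed line is the partition offered by the machine after one more batch of constraints; in this toy setting the last partition matches the hidden labels because every pair has been annotated by then.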

Description of studies

Several studies are provided here:

  1. efficiency: Aims to confirm the technical efficiency of the method by verifying its convergence to a ground truth and by finding the implementation that converges fastest.
  2. computation time: Aims to estimate the time needed for algorithms to reach their objectives.
  3. annotation time: Aims to estimate the time needed to annotate constraints.
  4. constraints number: Aims to estimate the number of constraints needed to obtain a relevant annotated dataset.
  5. relevance: Aims to confirm the relevance of clustering results.
  6. profitability: Aims to predict whether one more iteration is worth its cost.
  7. inter annotator: Aims to estimate the inter-annotator agreement score during constraint annotation.
  8. annotation errors and conflicts fix: Aims to evaluate the impact of annotation errors and to verify the importance of fixing conflicts during labeling.
  9. annotation subjectivity: Aims to estimate the impact of labeling differences on clustering results.
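
As an illustration of what "convergence to a ground truth" can mean in the first study, the sketch below compares a clustering with reference labels using the Rand index (the fraction of item pairs on which both labelings agree). The choice of the Rand index and the example labelings are assumptions made for this sketch; the score actually used in the studies is not stated here.

```python
from itertools import combinations


def rand_index(predicted, truth):
    """Fraction of item pairs treated the same way by both labelings.

    A pair counts as an agreement when both labelings put the two items
    together, or both put them apart. Cluster identifiers need not match.
    """
    pairs = list(combinations(sorted(truth), 2))
    if not pairs:
        raise ValueError("at least two items are needed to compare labelings")
    agreements = sum(
        (predicted[a] == predicted[b]) == (truth[a] == truth[b])
        for a, b in pairs
    )
    return agreements / len(pairs)


# Hypothetical labelings: an early iteration and a later, converged one.
truth = {"a": 0, "b": 0, "c": 1, "d": 1}
early_iteration = {"a": 0, "b": 1, "c": 1, "d": 1}
late_iteration = {"a": 5, "b": 5, "c": 7, "d": 7}

print(rand_index(early_iteration, truth))  # 0.5
print(rand_index(late_iteration, truth))   # 1.0
```

Tracking such a score after each iteration gives one possible convergence curve: the fewer iterations needed to reach 1.0, the faster the configuration converges.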

Results

All results are compressed into .tar.gz archives and versioned on Zenodo: Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255.

Warning! These experiments can take up a large amount of disk space and contain hundreds or even thousands of files (one per execution attempt). See the table below before extracting the files.

STUDY NAME                         FOLDER SIZE    .tar.gz FILE SIZE
1_efficience_study                 1.4 GB         0.7 GB
2_computation_time_study           1.1 GB         0.1 GB
3_annotation_time_study            0.1 GB         0.1 GB
4_constraints_number_study         12.0 GB        2.7 GB
5_relevance_study                  0.1 GB         0.1 GB
6_rentability_study                1.3 GB         0.1 GB
7_inter_annotators_score_study     0.1 GB         0.1 GB
8_annotation_error_fix_study       28.0 GB        3.5 GB
9_annotation_subjectivity_study    82.0 GB        11.3 GB
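
Before extracting a large archive, its content can be inspected without writing anything to disk. The sketch below uses only the Python standard library (tarfile and shutil); the helper names and the archive file name in the usage comment are hypothetical, since the exact file names on Zenodo are not listed here.

```python
import shutil
import tarfile


def summarize_archive(path):
    """Return (number of regular files, total uncompressed bytes) of a .tar.gz.

    The archive is read sequentially but nothing is extracted.
    """
    file_count = 0
    total_bytes = 0
    with tarfile.open(path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                file_count += 1
                total_bytes += member.size
    return file_count, total_bytes


def fits_on_disk(path, target_dir="."):
    """Check whether the uncompressed content fits in the free space of target_dir."""
    _, total_bytes = summarize_archive(path)
    return total_bytes <= shutil.disk_usage(target_dir).free


# Example usage (hypothetical file name):
#   count, size = summarize_archive("9_annotation_subjectivity_study.tar.gz")
#   print(f"{count} files, {size / 1e9:.1f} GB uncompressed")
#   print("fits on disk:", fits_on_disk("9_annotation_subjectivity_study.tar.gz"))
```

Reading the member list of a large .tar.gz still requires decompressing the whole stream, so this check can take a while on the biggest archives, but it uses no extra disk space.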

Associated PhD report

Schild, E. (2024, in press). De l'Importance de Valoriser l'Expertise Humaine dans l’Annotation : Application à la Modélisation de Textes en Intentions à l'aide d’un Clustering Interactif. Université de Lorraine.

How to cite

Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255

0.1.0

05 Nov 15:44

Pre-release

Interactive Clustering: Comparative Studies

Several comparative studies of cognitivefactory-interactive-clustering functionalities on NLP datasets.

Quick description of Interactive Clustering

Interactive clustering is a method intended to assist in the design of a training data set.

This iterative process begins with an unlabeled dataset and repeats a sequence of two substeps:

  1. the user defines constraints on data sampled by the machine;
  2. the machine performs data partitioning using a constrained clustering algorithm.

Thus, at each step of the process:

  • the user corrects the clustering of the previous steps using constraints, and
  • the machine offers a corrected and more relevant data partitioning for the next step.

Description of studies

Several studies are provided here:

  1. efficiency: Aims to confirm the technical efficiency of the method by verifying its convergence to a ground truth and by finding the implementation that converges fastest.

Associated research article

Schild, E., Durantin, G., Lamirel, J., & Miconi, F. (2022). Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering. International Journal of Data Warehousing and Mining (IJDWM), 18(2), 1-19. http://doi.org/10.4018/IJDWM.298007. <hal-03648041>.

How to cite

Schild, E. (2021). cognitivefactory/interactive-clustering-comparative-study. Zenodo. https://doi.org/10.5281/zenodo.5648255