Large-Scale Multimodal Face Datasets

You need to download the annotation data from Hugging Face first, and then apply for access to the original LAION-Face images by completing the agreement. Using the information provided on Hugging Face, you can then obtain the corresponding image-text pairs (a minimal loading sketch follows the release list below).

[25/06/09] 🤗The original images are released [Agreement]

[24/07/05] 🤗FaceCaption-15M OpenFace-CQUPT/FaceCaption-15M

[25/01/11] 🤗FaceCaptionHQ-4M OpenFace-CQUPT/FaceCaptionHQ-4M

[24/09/12] 🤗HumanCaption-10M OpenFace-CQUPT/HumanCaption-10M

[24/10/23] 🤗HumanCaption-HQ OpenFace-CQUPT/HumanCaption-HQ-311K
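
A minimal sketch of pulling one of the repositories listed above with the 🤗 `datasets` library. It assumes the repository is in a format `load_dataset` can read directly and exposes a `train` split; check the dataset card for the actual splits and column names.

```python
# Minimal sketch: loading the FaceCaption-15M annotations from Hugging Face.
# Assumptions (verify on the dataset card): the repository can be read
# directly by the `datasets` library and a "train" split exists.
from datasets import load_dataset

ds = load_dataset("OpenFace-CQUPT/FaceCaption-15M", split="train")

print(ds)             # schema and number of rows
record = ds[0]        # one image-text annotation record
print(record.keys())  # inspect the actual column names
```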

FaceCaption-15M


FaceCaption-15M is a large-scale, diverse, and high-quality dataset of facial images paired with natural language descriptions (facial image-to-text). The dataset is designed to facilitate research on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date.
[24/09/01] The embeddings of the images in FaceCaption-15M have been released! OpenFace-CQUPT/Facecaption-15M-Embeddings
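
To fetch the pre-computed embeddings locally, a minimal sketch with `huggingface_hub` is shown below. The repository id is taken from the link above; treating it as a dataset-type repo and its internal file layout are assumptions to verify on the dataset card.

```python
# Minimal sketch: downloading the pre-computed FaceCaption-15M image embeddings.
# The repository id comes from the release note above; the repo type and the
# file layout inside the repository are assumptions -- check the dataset card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenFace-CQUPT/Facecaption-15M-Embeddings",
    repo_type="dataset",
)
print("Embeddings downloaded to:", local_dir)
```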

FaceCaptionHQ-4M


FaceCaptionHQ-4M contains about 4M facial image-text pairs cleaned from FaceCaption-15M.

HumanCaption-10M


HumanCaption-10M is a large, diverse, high-quality dataset of human-related images with natural language descriptions (image-to-text). The dataset is designed to facilitate research on human-centered tasks. HumanCaption-10M contains approximately 10 million human-related images together with natural language descriptions of the corresponding facial features, and is the second-generation version of FaceCaption-15M.

HumanCaption-HQ


HumanCaption-HQ contains approximately 311,000 human-related images and their corresponding natural language descriptions. Compared to HumanCaption-10M, this dataset not only includes the associated facial language descriptions but also retains only higher-resolution images and employs the visual understanding capabilities of GPT-4V to generate more detailed and accurate text descriptions. It is used for the second training phase of HumanVLM, enhancing the model's capabilities in caption generation and visual understanding.

Citation

@misc{dai202415mmultimodalfacialimagetext,
      title={15M Multimodal Facial Image-Text Dataset}, 
      author={Dawei Dai and YuTang Li and YingGe Liu and Mingming Jia and Zhang YuanHui and Guoyin Wang},
      year={2024},
      eprint={2407.08515},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.08515}, 
}
