Large-Scale Multimodal Face Datasets

You need to download the annotation data from Hugging Face first, and then apply for access to the original LAION-Face images by completing the agreement. Using the information provided on Hugging Face, you can then obtain the corresponding image-text pairs (a minimal loading sketch follows the release list below).

[25/06/09] 🤗The original images are released [Agreement]

[24/07/05] 🤗FaceCaption-15M OpenFace-CQUPT/FaceCaption-15M

[25/01/11] 🤗FaceCaptionHQ-4M OpenFace-CQUPT/FaceCaptionHQ-4M

[24/09/12] 🤗HumanCaption-10M OpenFace-CQUPT/HumanCaption-10M

[24/10/23] 🤗HumanCaption-HQ OpenFace-CQUPT/HumanCaption-HQ-311K
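
A minimal sketch of pulling one of the repositories listed above with the 🤗 `datasets` library. It assumes the repository is in a format `load_dataset` can read directly and exposes a `train` split; check the dataset card for the actual splits and column names.

```python
# Minimal sketch: loading the FaceCaption-15M annotations from Hugging Face.
# Assumptions (verify on the dataset card): the repository can be read
# directly by the `datasets` library and a "train" split exists.
from datasets import load_dataset

ds = load_dataset("OpenFace-CQUPT/FaceCaption-15M", split="train")

print(ds)             # schema and number of rows
record = ds[0]        # one image-text annotation record
print(record.keys())  # inspect the actual column names
```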

FaceCaption-15M


FaceCaption-15M is a large-scale, diverse, and high-quality dataset of facial images paired with natural language descriptions (facial image-to-text). The dataset is designed to facilitate research on face-centered tasks. FaceCaption-15M comprises over 15 million pairs of facial images and corresponding natural language descriptions of facial features, making it the largest facial image-caption dataset to date.
[24/09/01] The embeddings of the images in FaceCaption-15M have been released! OpenFace-CQUPT/Facecaption-15M-Embeddings
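
To fetch the pre-computed embeddings locally, a minimal sketch with `huggingface_hub` is shown below. The repository id is taken from the link above; treating it as a dataset-type repo and its internal file layout are assumptions to verify on the dataset card.

```python
# Minimal sketch: downloading the pre-computed FaceCaption-15M image embeddings.
# The repository id comes from the release note above; the repo type and the
# file layout inside the repository are assumptions -- check the dataset card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="OpenFace-CQUPT/Facecaption-15M-Embeddings",
    repo_type="dataset",
)
print("Embeddings downloaded to:", local_dir)
```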

FaceCaptionHQ-4M


FaceCaptionHQ-4M contains about 4M facial image-text pairs cleaned from FaceCaption-15M.

HumanCaption-10M


HumanCaption-10M is a large, diverse, high-quality dataset of human-related images with natural language descriptions (image-to-text). The dataset is designed to facilitate research on human-centered tasks. HumanCaption-10M contains approximately 10 million human-related images together with natural language descriptions of the corresponding facial features, and is the second-generation version of FaceCaption-15M.

HumanCaption-HQ


HumanCaption-HQ contains approximately 311,000 human-related images and their corresponding natural language descriptions. Compared to HumanCaption-10M, this dataset not only includes the associated facial language descriptions but also retains only higher-resolution images and employs the visual understanding capabilities of GPT-4V to generate more detailed and accurate text descriptions. It is used for the second training phase of HumanVLM, enhancing the model's capabilities in caption generation and visual understanding.

Citation

@misc{dai202415mmultimodalfacialimagetext,
      title={15M Multimodal Facial Image-Text Dataset}, 
      author={Dawei Dai and YuTang Li and YingGe Liu and Mingming Jia and Zhang YuanHui and Guoyin Wang},
      year={2024},
      eprint={2407.08515},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.08515}, 
}
