Facestar Dataset

Description

Existing audio-visual datasets for human speech fall into two groups. Those captured in a clean, controlled environment contain only a small amount of speech data and no natural conversations. Those collected in the wild suffer from unreliable audio quality, interfering sounds, low face resolution, and unreliable or occluded lip motion.

The Facestar dataset aims to enable research on audio-visual modeling in a large-scale and high-quality setting. Core dataset features:

10 hours of high-quality audio-visual speech data
audio recordings in a quiet environment at 16 kHz
video of resolution 1300 x 1600 at 60 frames per second
one female and one male speaker
natural speech: all data is conversational speech in a video-conferencing setup
full face visibility: speakers are facing the camera while talking
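
The audio and video rates listed above do not divide evenly: 16000 / 60 is about 266.67 audio samples per video frame. The following is a minimal sketch of how one might map a video frame index to its span of audio samples. It assumes that audio and video in a cut start at the same instant and that both rates are exact, neither of which is stated in this README. The function name `frame_to_sample_range` is illustrative and not part of the dataset tooling.

```python
import math
from fractions import Fraction
from typing import Tuple

AUDIO_RATE_HZ = 16_000  # audio sampling rate from the dataset description
VIDEO_FPS = 60          # video frame rate from the dataset description

# Exact ratio 800/3; a Fraction avoids floating-point drift over long cuts.
SAMPLES_PER_FRAME = Fraction(AUDIO_RATE_HZ, VIDEO_FPS)


def frame_to_sample_range(frame_index: int) -> Tuple[int, int]:
    """Return the half-open [start, end) audio sample range for one video frame.

    Assumes (hypothetically) that sample 0 and frame 0 are time-aligned.
    """
    start = math.floor(frame_index * SAMPLES_PER_FRAME)
    end = math.floor((frame_index + 1) * SAMPLES_PER_FRAME)
    return start, end
```

Under these assumptions, consecutive frames receive 266 or 267 samples each, and every three frames cover exactly 800 samples, so the ranges tile the audio without gaps or overlap.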

See the paper for more details. If you use the dataset, please cite:

@inproceedings{yang2022audiovisual,
  title={Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis},
  author={Yang, Karren and Markovic, Dejan and Krenn, Steven and Agrawal, Vasu and Richard, Alexander},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Download

The dataset is partitioned into a trainset and a testset for each speaker. Each partition contains several sessions, and each session is subdivided into cuts of about 30 seconds. For each cut, the video is provided without sound as sessionXX_cutYY.mp4 and the corresponding audio is provided in the wave file sessionXX_cutYY.wav.
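
Because video and audio are stored in separate files, a loader has to pair them by name. The sketch below shows one possible way to do this. It assumes that XX and YY are decimal digits and that each .wav file sits in the same directory as its .mp4 file; the directory layout is not specified here, so adjust as needed. The function name `pair_cuts` is hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Matches file stems such as "session03_cut12" (digit counts are an assumption).
CUT_PATTERN = re.compile(r"^session(\d+)_cut(\d+)$")


def pair_cuts(root) -> Dict[Tuple[int, int], Tuple[Path, Path]]:
    """Map (session, cut) to (video_path, audio_path) for every complete pair under root."""
    pairs = {}
    for video in sorted(Path(root).rglob("*.mp4")):
        match = CUT_PATTERN.match(video.stem)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        audio = video.with_suffix(".wav")  # assumed to sit next to the video
        if audio.exists():
            key = (int(match.group(1)), int(match.group(2)))
            pairs[key] = (video, audio)
    return pairs
```

Point `root` at a single partition of a single speaker, since session numbers may repeat across partitions and later entries would overwrite earlier ones.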

Automatic Download

The download.py script automatically downloads the complete dataset and unzips the sessions. It must be run with Python 3. The complete dataset is about 30 GB in size.

Manual Download

If you do not need the full dataset but only selected sessions, you can download them manually:

female speaker trainset (55 sessions)
female speaker testset (7 sessions)
male speaker trainset (45 sessions)
male speaker testset (5 sessions)
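
After a manual download, the archive still has to be extracted by hand. The following is a minimal sketch that assumes the manually downloaded files are ordinary zip archives, which the description of download.py suggests but does not confirm. The function name `extract_archive` and any archive file name passed to it are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract one downloaded archive into target_dir and return the member names.

    Assumes the archive is a standard zip file.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    with zipfile.ZipFile(Path(archive_path)) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())
```

The returned list of names can be used to check that the expected sessionXX_cutYY.mp4 and sessionXX_cutYY.wav files are present.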

License

The dataset is released under the CC-NC 4.0 International license.
