FMFCC-A

This repository describes the FMFCC-A dataset (the audio track of FMFCC) and the challenge results.

The FMFCC-A dataset is shared through BaiduCloud (link: https://pan.baidu.com/s/1CGPkC8VfjXVBZjluEHsW6g , password: IIES). FMFCC-A is by far the largest publicly available Mandarin dataset for synthetic speech detection. It contains 40,000 synthesized Mandarin utterances generated by 11 Mandarin TTS systems and two Mandarin VC systems, plus 10,000 genuine Mandarin utterances collected from 58 speakers. The official website of FMFCC-A, the audio track of the first Fake Media Forensic Challenge of the China Society of Image and Graphics, is http://fmfcc.net/ . We hope that the FMFCC-A dataset can help fill the gap left by the lack of Mandarin datasets for synthetic speech detection under various audio post-processing operations.
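The utterance counts above imply a 4:1 imbalance between synthesized and genuine speech. As a rough illustration only (not part of the official challenge code), the sketch below computes the class shares and inverse-frequency class weights from those two totals. The function names and the choice of weighting formula are assumptions made for this example, and the numbers cover the whole dataset, not any particular train/dev/eval split.

```python
# Utterance totals as stated in the dataset description.
COUNTS = {"synthetic": 40_000, "genuine": 10_000}


def class_shares(counts):
    """Return the fraction of all utterances that belongs to each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count).

    This is one common heuristic for re-weighting a loss on imbalanced
    data; it is an illustrative choice, not the authors' method.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


shares = class_shares(COUNTS)
weights = balanced_class_weights(COUNTS)

print("class shares:", shares)
print("class weights:", weights)
```

With these totals, genuine utterances make up one fifth of the data, so a detector trained without any re-weighting or re-sampling may lean toward predicting the synthesized class.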

If you find the code or dataset useful, please cite the following paper: FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

