Source code and notebooks to reproduce experiments and benchmarks on Balanced Faces in the Wild (BFW).

Overview

Face Recognition: Too Bias, or Not Too Bias?

Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition: too bias, or not too bias?" In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0-1. 2020.
@inproceedings{robinson2020face,
               title={Face recognition: too bias, or not too bias?},
               author={Robinson, Joseph P and Livitz, Gennady and Henon, Yann and Qin, Can and Fu, Yun and Timoner, Samson},
               booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
               pages={0--1},
               year={2020}
             }
    

Robinson, Joseph P., Can Qin, Yann Henon, Samson Timoner, and Yun Fu. "Balancing Biases and Preserving Privacy on Balanced Faces in the Wild." In CoRR arXiv:2103.09118, (2021).
@article{robinson2021balancing,
        title={Balancing Biases and Preserving Privacy on Balanced Faces in the Wild},
        author={Robinson, Joseph P and Qin, Can and Henon, Yann and Timoner, Samson and Fu, Yun},
        journal={arXiv preprint arXiv:2103.09118},
        year={2021}
       }
    

Teaser

Balanced Faces in the Wild (BFW): Data, Code, Evaluations

version: 0.4.5 (following the Semantic Versioning scheme; learn more at https://semver.org)

Intended to address problems of bias in facial recognition, we built BFW as a labeled data resource for evaluating recognition systems on a corpus of facial imagery made up of an EQUAL face count for all subjects and EQUAL counts across demographics. The face data is thus balanced in faces per subject, individuals per ethnicity, and ethnicities per gender (or vice versa).

Data can be accessed via Google form or Microsoft form. Do not hesitate to report an issue for any and all inquiries.

Project Overview

This project investigates bias in automatic facial recognition (FR). Specifically, subjects are grouped into predefined subgroups based on gender and ethnicity, with age to be added soon. For this, we propose a novel image collection called Balanced Faces in the Wild (BFW), which is balanced across eight subgroups (i.e., each subgroup comprises 100 subjects with 25 face samples each, for 800 subjects in total). Along with the name (i.e., identification) labels and task protocols (e.g., list of pairs for face verification, pre-packaged data-table with additional metadata and labels, etc.), BFW is clearly grouped by ethnicity (i.e., Asian (A), Black (B), Indian (I), and White (W)) and gender (i.e., Female (F) and Male (M)). The motivation and intent are that BFW will provide a proxy to characterize FR systems, with demographic-specific analysis now possible. For instance, various confusion metrics, along with a predefined criterion (i.e., score threshold), are fundamental when characterizing the performance of FR systems. The following visualization summarizes the confusion metrics in a way that relates the different measurements.

metrics
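As a minimal sketch of the confusion metrics mentioned above, the snippet below counts true/false positives and negatives for a list of similarity scores at a fixed threshold. The function names (`confusion_counts`, `rates`) and the toy scores are illustrative assumptions, not part of the BFW code-base.

```python
from collections import Counter


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN for similarity scores at a decision threshold.

    labels: 1 for a genuine pair, 0 for an imposter pair.
    A pair is predicted genuine when score >= threshold.
    """
    counts = Counter(tp=0, fp=0, tn=0, fn=0)
    for score, label in zip(scores, labels):
        predicted_genuine = score >= threshold
        if predicted_genuine and label:
            counts["tp"] += 1
        elif predicted_genuine and not label:
            counts["fp"] += 1
        elif not predicted_genuine and label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def rates(counts):
    """Derive TPR, FNR, and FPR from the four confusion counts."""
    genuine = counts["tp"] + counts["fn"]
    imposter = counts["fp"] + counts["tn"]
    return {
        "tpr": counts["tp"] / genuine if genuine else float("nan"),
        "fnr": counts["fn"] / genuine if genuine else float("nan"),
        "fpr": counts["fp"] / imposter if imposter else float("nan"),
    }


# Toy data (made up for illustration): three genuine and three imposter pairs.
scores = [0.91, 0.42, 0.77, 0.18, 0.66, 0.30]
labels = [1, 1, 1, 0, 0, 0]

counts = confusion_counts(scores, labels, threshold=0.5)
summary = rates(counts)
print(dict(counts), summary)
```

Running the same function on the pairs of a single subgroup (e.g., only Asian Female pairs) gives the demographic-specific view described above.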

The motivation for designing, building, and releasing BFW for research purposes has been discussed above. We expect the data to continue to evolve. Nonetheless, as is, it offers many ways to advance the technology and our understanding of it. Let us now focus on the contents of the repo (i.e., the code-base), which was created to support the BFW data (i.e., data proxy), making all experiments in the paper easily reproducible and, thus, the work easier to get started with.

Experimental-based contributions and findings

Several observations were made that widened our understanding of bias in FR. Each was demonstrated experimentally, and all code used in the experiments is included in this repo.

Score sensitivity

For instance, it is shown that the scoring sensitivity varies across subgroups. That is, the expected similarity score for faces of the same identity shifts with the subgroup (e.g., a genuine pair of Black faces has, on average, a smaller similarity score than a genuine pair of White faces, and the mid-range of scores differs between Males and Females). This is demonstrated using fundamental signal detection models (SDM), along with detection error trade-off (DET) curves.
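To make the DET idea concrete, here is a minimal sketch that sweeps a threshold and records the false-positive and false-negative rates per subgroup. The subgroup names, scores, and the helper `det_points` are made up for illustration; they are not BFW results or the repo's API.

```python
from statistics import mean


def det_points(genuine, imposter, thresholds):
    """Return (threshold, FPR, FNR) tuples, one per threshold.

    A pair is accepted as genuine when its score >= threshold, so
    FNR is the fraction of genuine scores below the threshold and
    FPR is the fraction of imposter scores at or above it.
    """
    points = []
    for t in thresholds:
        fnr = sum(s < t for s in genuine) / len(genuine)
        fpr = sum(s >= t for s in imposter) / len(imposter)
        points.append((t, fpr, fnr))
    return points


# Hypothetical similarity scores for two subgroups (not real BFW data).
toy_scores = {
    "group_1": {"genuine": [0.55, 0.60, 0.70, 0.80],
                "imposter": [0.10, 0.20, 0.30, 0.45]},
    "group_2": {"genuine": [0.65, 0.75, 0.85, 0.90],
                "imposter": [0.15, 0.25, 0.40, 0.60]},
}
thresholds = [0.3, 0.5, 0.7]

# One DET curve (as a list of points) per subgroup.
curves = {
    name: det_points(data["genuine"], data["imposter"], thresholds)
    for name, data in toy_scores.items()
}
# Average genuine score per subgroup, i.e., the shift in expected value.
mean_genuine = {name: mean(data["genuine"]) for name, data in toy_scores.items()}

for name, points in curves.items():
    print(name, round(mean_genuine[name], 4), points)
```

Plotting FPR against FNR for each subgroup on shared axes gives one DET curve per subgroup; curves that do not overlap indicate that the same threshold lands at different operating points for different subgroups.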

Global threshold

Once an FR system is deployed, a criterion (i.e., threshold) is set (or tuned) such that similarity scores that do not pass are assumed to be false matches and are filtered out of the candidate pool of potential true pairs. In other words, thresholds act as decision boundaries that map scores (or distances) to nominal values such as genuine or imposter. Given the variable sensitivity described above, intuition tells us that a variable threshold is optimal. Thus, returning to the fundamental concepts of signal detection theory, we show that using a single, global threshold yields skewed performance ratings across different subgroups. Specifically, we demonstrate that subgroup-specific thresholds are optimal in terms of overall performance and balance across subgroups.
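The contrast between a global and a subgroup-specific threshold can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the threshold is picked as the smallest observed imposter score whose false-positive rate does not exceed a target, and the helper names and toy scores are hypothetical.

```python
def fpr_at(imposter, threshold):
    """Fraction of imposter scores accepted (score >= threshold)."""
    return sum(s >= threshold for s in imposter) / len(imposter)


def threshold_at_fpr(imposter, target_fpr):
    """Smallest candidate threshold whose FPR is at most target_fpr.

    Candidates are the observed imposter scores plus +inf, which
    rejects everything and therefore always satisfies the target.
    """
    candidates = sorted(set(imposter)) + [float("inf")]
    for t in candidates:
        if fpr_at(imposter, t) <= target_fpr:
            return t


# Hypothetical imposter scores per subgroup (not real BFW data).
imposter_scores = {
    "group_1": [0.10, 0.20, 0.30, 0.45],
    "group_2": [0.15, 0.25, 0.50, 0.60],
}
target_fpr = 0.25

# Global threshold: tuned once on the pooled scores of all subgroups.
pooled = [s for scores in imposter_scores.values() for s in scores]
global_threshold = threshold_at_fpr(pooled, target_fpr)
fpr_global = {
    name: fpr_at(scores, global_threshold)
    for name, scores in imposter_scores.items()
}

# Subgroup-specific thresholds: tuned separately for each subgroup.
subgroup_thresholds = {
    name: threshold_at_fpr(scores, target_fpr)
    for name, scores in imposter_scores.items()
}
fpr_subgroup = {
    name: fpr_at(scores, subgroup_thresholds[name])
    for name, scores in imposter_scores.items()
}

print("global:", global_threshold, fpr_global)
print("per subgroup:", subgroup_thresholds, fpr_subgroup)
```

On this toy data the global threshold meets the target on the pooled scores but gives unequal false-positive rates per subgroup, while the per-subgroup thresholds bring both subgroups to the same rate.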

All-in-all

All of this and more (i.e., evaluation and analysis of FR systems on BFW data, along with data structures and implementation schemes optimized for the problems at hand) is included in the modules making up the project and demonstrated in notebook tutorials. We will continue to add tools for a fair analysis of FR systems. Thus, we expect not only the experiments but also the data to grow. All contributions are not only welcome but entirely encouraged.

Here are quick links to key aspects of this resource.

Register and download via this form.

Final note. The repo is a work-in-progress. It is ready to be cloned and used; however, expect regular improvements in both the implementation and the documentation (i.e., getting-started instructions will be enhanced). For now, it is recommended to begin with the README files listed just above, along with the tutorial notebooks found in code -> notebooks, which have brief descriptions in the README and more detail inline in each notebook. Again, PRs are more than welcome :)

Paper abstract

We reveal critical insights into bias problems in state-of-the-art facial recognition (FR) systems using a novel Balanced Faces In the Wild (BFW) dataset: data balanced for gender and ethnic groups. We show variations in the optimal scoring threshold for face pairs across different subgroups. Thus, the conventional approach of learning a global threshold for all pairs results in performance gaps between subgroups. By learning subgroup-specific thresholds, we reduce performance gaps and show a notable boost in overall performance. Furthermore, we do a human evaluation to measure human bias, which supports the hypothesis that an analogous bias exists in human perception. For the BFW database, source code, and more, visit https://github.com/visionjo/facerec-bias-bfw.

To Do

  • Begin Template
  • Create demo notebooks
  • Add manuscript
  • Documentation (sphinx)
  • Update README (this)
  • Pre-commit, formatter (Black) and .gitignore
  • Complete test harness
  • Modulate (refactor) code
  • Complete datatable (i.e., extend pandas.DataFrame)
  • Add scripts and CLI

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code without warranty, so long as you provide attribution to the authors. See LICENSE.md (LICENSE) for the full license text.

The manuscript text is not open source. The authors reserve the rights to the article content, which is currently submitted for publication in the 2020 IEEE Conference on AMFG.

Acknowledgement

We would like to thank the PINGA organization on GitHub for the project template used to structure this project.

Comments
  • About the MTCNN face detections and preprocessing


    Hi,

    It would be great if you could clarify a few questions regarding this dataset please.

    1. Is it possible for you to provide the MTCNN output face detections (bounding boxes and facial landmarks) for the face samples in BFW?

    2. Am I right in assuming MTCNN takes as input the images in "face-samples" folder of the dataset? If yes, what settings do we use with MTCNN in order for us to detect a face correctly on all the facial images provided in face-samples? If not, can you help us reproduce your face detection results by providing us with the original images on which the MTCNN was run to obtain the results in face-samples?

    3. Are the images in facial-samples actually crops which are aligned?

    Thanks in advance for your help.

    opened by manisoftwartist 2
  • Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data


    3 Figures based on the paper "Face Recognition: Too Bias, or Not Too Bias" are

    1. DET curves: FPR versus FNR by moving threshold
    2. Score distributions for genuine and imposter using violin plots
    3. Confusion matrix for Rank 1 and any Rank.
    opened by suchanv 2
  • Devel


    Develop branch-- prepare for next version release.

    Aim for the following for version 0.1.1:

    • [ ] Notebook is updated to use interface recently modulated (#21)
    • [ ] Update Documentation to explain steps to run (#21)
      • [ ] add to README in root
      • [x] move results from README in root to the README in results/
      • [x] move data section from README in root to the README in data/
        • [x] save curve data along with PDF (i.e., in results/)
      • [ ] Add simple (brief) docstring where missing
      • [ ] A sample (toy) set is run end-to-end (demonstrate in README)
        • [ ] if small enough, add to repo (i.e., < 40 MB or so)
    • [ ] Finish script to generate Tar @ Far table
    • [ ] Improve annotation in notebooks; more description, i.e., tutorial-like.
    • [ ] create pdf versions of notebooks and add to project in notebooks/pdfs (or create nbviewer and point to it)
    • [ ] add assertions (and tests) where appropriate, at least for critical cases, such as when a specific type is expected.
    • [ ] Consider moving some of the analysis functions to visualizations.
      • [ ] modulate the handling of plt.axes objects
      • [ ] add optional input arguments for the title and other figure cosmetics or settings
    • [x] Add benchmarks for sphereface features. Make these the results showcased throughout.
    documentation enhancement Benchmark Project-level 
    opened by visionjo 1
  • Questions on verification_RFW and training procedure


    Hi, thanks for your great work and for sharing the code for these two papers! It took me days to read the papers and go through the repository, and I have a few questions:

    (2) Do you have the code for training the features (asian_females, asian_males, black_females, black_males, indian_females, indian_males,...)? I have a hard time finding something like train.py (e.g., the loss function and training process). (I suppose the released code is mainly for image pre-processing and result analysis.) (Since the BFW dataset is not as large as other face datasets, it may be possible for me to train on it from scratch on one GPU.)

    (3) I am a little confused about how BFW is used in the two papers. As I understand it:

    In the paper Face Recognition: Too Bias, or Not Too Bias?, the training and test setup is as follows: train: CASIA_webface trained using Sphereface loss; test: LFW. Is the BFW dataset not used in training in this set of experiments?

    In the paper Balancing Biases and Preserving Privacy on Balanced Faces in the Wild, the training and test setup is as follows: train: (1) MS1M trained using Arcface loss --> to get a 512-dim embedding (f_in in Fig. 6); (2) the BFW dataset is used to train the encoder and two classifiers in Fig. 6; test: 4 folds used for training and 1 fold used for testing (using the best threshold chosen).

    Is that right?

    (4) There are some differences between "bfw-v0.1.5-datatable.csv" and TABLE-2 in paper 2. For example, there are 921379 records in TABLE-2, while there are 923898 records in the csv file. Also, there is no "{dir_meta}thresholds.pkl" file.

    Thanks for your time and any help would be appreciated !

    opened by lizhenstat 7
  • Regarding face identification


    Hey,

    Thanks for the awesome work!

    I wanted to know how I can modify the repo to use for face identification task instead of verification.

    Any help would be highly appreciated.

    opened by shivmgg 1
  • Sphinx documentation


    Setup the project for sphinx.

    Include clear instructions on how to maintain it (i.e., once in place, we'll include it as part of the build process; see docs/).

    Setup for tutorials on the different concepts and experiments done as part of this line of work (i.e., facial bias and BFW database)

    documentation enhancement 
    opened by visionjo 0
  • Create plan for Dash interface


    Project plan (lead: Dylan; support: Rohan):

    • [ ] what features to include
    • [ ] Specifications
    • [ ] Interface layout (use lucidchart or equivalent)
    • [ ] Division of tasks and proposed timeline
    Plan and design Project-level 
    opened by visionjo 0
Releases(v0.0.3)
Owner
Joseph P. Robinson
Ph.D., Northeastern, 2020. Focus: applied machine learning, mostly vision. At Vicarious Surgical's ASDAI group, an AI Engineer working on our surgical robot.