Source code and notebooks to reproduce experiments and benchmarks on Balanced Faces in the Wild (BFW).

Overview

Face Recognition: Too Bias, or Not Too Bias?

Robinson, Joseph P., Gennady Livitz, Yann Henon, Can Qin, Yun Fu, and Samson Timoner. "Face recognition: too bias, or not too bias?" In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0-1. 2020.
@inproceedings{robinson2020face,
               title={Face recognition: too bias, or not too bias?},
               author={Robinson, Joseph P and Livitz, Gennady and Henon, Yann and Qin, Can and Fu, Yun and Timoner, Samson},
               booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
               pages={0--1},
               year={2020}
             }
    

Robinson, Joseph P., Can Qin, Yann Henon, Samson Timoner, and Yun Fu. "Balancing Biases and Preserving Privacy on Balanced Faces in the Wild." arXiv preprint arXiv:2103.09118 (2021).
@article{robinson2021balancing,
        title={Balancing Biases and Preserving Privacy on Balanced Faces in the Wild},
        author={Robinson, Joseph P and Qin, Can and Henon, Yann and Timoner, Samson and Fu, Yun},
        journal={arXiv preprint arXiv:2103.09118},
        year={2021}
       }
    

(Teaser figure)

Balanced Faces in the Wild (BFW): Data, Code, Evaluations

version: 0.4.5 (following the Semantic Versioning scheme; learn more at https://semver.org)

Built to address problems of bias in facial recognition, BFW is a labeled data resource made available for evaluating recognition systems on a corpus of facial imagery with an EQUAL face count for all subjects: equal across demographics, so the face data are balanced in faces per subject, subjects per ethnicity, and ethnicities per gender (and vice versa).
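
As a quick sanity check, the balance can be verified directly from the datatable with pandas. The sketch below is illustrative only: the file name and the column names (subject_id, ethnicity, gender) are assumptions, not the exact schema of the released datatable.

import pandas as pd

# Load the BFW metadata table (file name and column names are assumed
# here for illustration; consult the released datatable for the schema).
df = pd.read_csv("bfw-datatable.csv")

# Faces per subject: should be a constant count for every subject.
print(df.groupby("subject_id").size().describe())

# Subjects per subgroup: should be a constant count for each
# ethnicity-gender pairing.
print(df.groupby(["ethnicity", "gender"])["subject_id"].nunique())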

Data can be accessed via the Google form or Microsoft form. Do not hesitate to report an issue for any and all inquiries.

Project Overview

This project investigates bias in automatic facial recognition (FR). Specifically, subjects are grouped into predefined subgroups based on gender, ethnicity, and, soon, age. For this, we propose a novel image collection called Balanced Faces in the Wild (BFW), which is balanced across eight subgroups (i.e., 100 subjects per subgroup, each with 25 face samples). Along with the name (i.e., identification) labels and task protocols (e.g., lists of pairs for face verification, a pre-packaged data table with additional metadata and labels, etc.), BFW groups cleanly into ethnicities (i.e., Asian (A), Black (B), Indian (I), and White (W)) and genders (i.e., Females (F) and Males (M)). The motivation and intent are for BFW to provide a proxy for characterizing FR systems, making demographic-specific analysis possible. For instance, various confusion metrics, along with a predefined criterion (i.e., score threshold), are fundamental when characterizing the performance of FR systems. The following visualization summarizes the confusion metrics and how the different measurements relate.

(Figure: confusion metrics)
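
To make the connection concrete, here is a minimal sketch of computing the standard confusion metrics per subgroup at a fixed score threshold. The pairs file, its column names (score, label, subgroup), and the threshold value are hypothetical placeholders, not the repo's actual interface.

import numpy as np
import pandas as pd

def confusion_metrics(scores, labels, threshold):
    """Confusion metrics for verification pairs at a fixed score threshold."""
    scores = np.asarray(scores)
    labels = np.asarray(labels).astype(bool)   # True = genuine pair
    pred = scores >= threshold                 # predicted match
    tp = np.sum(pred & labels)
    fp = np.sum(pred & ~labels)
    fn = np.sum(~pred & labels)
    tn = np.sum(~pred & ~labels)
    return {
        "tar": tp / (tp + fn),  # true accept rate (1 - FNR)
        "far": fp / (fp + tn),  # false accept rate (FPR)
        "frr": fn / (tp + fn),  # false reject rate (FNR)
        "trr": tn / (fp + tn),  # true reject rate (1 - FPR)
    }

# Hypothetical pairs table with columns: score, label (1 = genuine), subgroup.
pairs = pd.read_csv("bfw-pairs.csv")
for subgroup, grp in pairs.groupby("subgroup"):
    print(subgroup, confusion_metrics(grp["score"], grp["label"], threshold=0.6))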

The motivation for designing, building, and releasing BFW for research purposes has now been discussed. We expect the data, all in all, to continue to evolve. Nonetheless, as is, there are many ways to advance the technology and our understanding of it. Let us now focus on the contents of the repo (i.e., the code base) created to support the BFW data, making all experiments in the paper easily reproducible and, thus, the work easier to get started with.

Experiment-based contributions and findings

Several observations were made that widened our understanding of bias in FR. Each finding was demonstrated experimentally, with all code used in the experiments included as part of this repo.

Score sensitivity

For instance, it is shown that score sensitivity varies across subgroups. That is, genuine pairs of the same identity shift in expected value from one subgroup to another (e.g., genuine pairs of Black faces have, on average, lower similarity scores than genuine pairs of White faces, and scores for Males occupy a different range than those for Females). This is demonstrated using fundamental signal detection models (SDM), along with detection error trade-off (DET) curves.
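
A hedged sketch of the signal-detection view: for each subgroup, compare the mean genuine score and the sensitivity index d' (the separation between genuine and imposter score distributions in pooled-standard-deviation units). The file name and column names are assumptions, as above.

import numpy as np
import pandas as pd

def d_prime(genuine, imposter):
    """Sensitivity index d' from signal detection theory."""
    genuine, imposter = np.asarray(genuine), np.asarray(imposter)
    pooled_std = np.sqrt(0.5 * (genuine.var() + imposter.var()))
    return (genuine.mean() - imposter.mean()) / pooled_std

pairs = pd.read_csv("bfw-pairs.csv")  # assumed columns: score, label, subgroup
for subgroup, grp in pairs.groupby("subgroup"):
    gen = grp.loc[grp["label"] == 1, "score"]
    imp = grp.loc[grp["label"] == 0, "score"]
    print(f"{subgroup}: mean genuine score = {gen.mean():.3f}, d' = {d_prime(gen, imp):.2f}")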

Global threshold

Once an FR system is deployed, a criterion (i.e., threshold) is set (or tunable) such that pairs whose similarity scores do not pass it are assumed to be false matches and are filtered out of the candidate pool of potential true pairs. In other words, thresholds act as decision boundaries that map scores (or distances) to nominal values such as genuine or imposter. Considering the variable sensitivity found above, intuition tells us that a variable threshold is optimal. Returning to the fundamental concepts of signal detection theory, we show that using a single, global threshold yields skewed performance ratings across subgroups. We then demonstrate that subgroup-specific thresholds are optimal in terms of both overall performance and balance across subgroups.
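
A minimal sketch of the idea, assuming the same hypothetical pairs table as above: pick one global threshold at a target false accept rate (FAR), then pick one threshold per subgroup at the same FAR, and compare. The quantile-based threshold selection is a common, simple choice and is not necessarily the exact procedure used in the paper.

import numpy as np
import pandas as pd

def threshold_at_far(imposter_scores, far=1e-3):
    """Score threshold whose false accept rate on imposter pairs is ~far
    (i.e., the (1 - far) quantile of the imposter score distribution)."""
    return np.quantile(np.asarray(imposter_scores), 1.0 - far)

pairs = pd.read_csv("bfw-pairs.csv")  # assumed columns: score, label, subgroup
imposters = pairs[pairs["label"] == 0]

# One global threshold versus one threshold per subgroup, both at FAR = 0.1%.
global_thr = threshold_at_far(imposters["score"])
subgroup_thr = {sg: threshold_at_far(grp["score"])
                for sg, grp in imposters.groupby("subgroup")}
print("global threshold:", round(float(global_thr), 3))
print({sg: round(float(t), 3) for sg, t in subgroup_thr.items()})

With subgroup-specific thresholds, each subgroup is held to the same FAR, which is the balance that a single global threshold cannot deliver when score distributions differ across subgroups.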

All-in-all

All of this and more (i.e., evaluation and analysis of FR systems on BFW data, along with data structures and implementation schemes optimized for the problems at hand) is included in the modules making up the project and demonstrated in the notebook tutorials. We will continue to add tools for fair analysis of FR systems; thus, we expect not only the experiments but also the data to grow. All contributions are not only welcome but entirely encouraged.

Here are quick links to key aspects of this resource.

Register and download via this form.

Final note. The repo is a work in progress. It is certainly ready to be cloned and used; however, expect regular improvements, both in the implementation and in the documentation (i.e., the getting-started instructions will be enhanced). For now, it is recommended to begin with the README files listed just above, along with the tutorial notebooks found in code/notebooks, with brief descriptions in the README and more detail inline in each notebook. Again, PRs are more than welcome :)

Paper abstract

We reveal critical insights into bias problems in state-of-the-art facial recognition (FR) systems using a novel Balanced Faces In the Wild (BFW) dataset: data balanced for gender and ethnic groups. We show variations in the optimal scoring threshold for face pairs across different subgroups. Thus, the conventional approach of learning a global threshold for all pairs results in performance gaps between subgroups. By learning subgroup-specific thresholds, we reduce performance gaps and show a notable boost in overall performance. Furthermore, we do a human evaluation to measure human bias, which supports the hypothesis that an analogous bias exists in human perception. For the BFW database, source code, and more, visit https://github.com/visionjo/facerec-bias-bfw.

To Do

  • Begin Template
  • Create demo notebooks
  • Add manuscript
  • Documentation (sphinx)
  • Update README (this)
  • Pre-commit, formatter (Black) and .gitignore
  • Complete test harness
  • Modularize (refactor) code
  • Complete datatable (i.e., extend pandas.DataFrame)
  • Add scripts and CLI

License

All source code is made available under a BSD 3-clause license. You can freely use and modify the code without warranty, so long as you provide attribution to the authors. See LICENSE.md (LICENSE) for the full license text.

The manuscript text is not open source. The authors reserve the rights to the article content, which is currently submitted for publication in the 2020 IEEE Conference on AMFG.

Acknowledgement

We would like to thank the PINGA organization on GitHub for the project template used to structure this project.

Comments
  • About the MTCNN face detections and preprocessing

    Hi,

    It would be great if you could clarify a few questions regarding this dataset please.

    1. Is it possible for you to provide the MTCNN output face detections (bounding boxes and facial landmarks) for the face samples in BFW?

    2. Am I right in assuming MTCNN takes as input the images in the "face-samples" folder of the dataset? If yes, what settings do we use with MTCNN in order to detect a face correctly on all the facial images provided in face-samples? If not, can you help us reproduce your face detection results by providing the original images on which MTCNN was run to obtain the results in face-samples?

    3. Are the images in face-samples actually aligned crops?

    Thanks in advance for your help.

    opened by manisoftwartist 2
  • Create a wrapper function to unify pipeline that produces the 3 figures (detailed below) from embedding data

    The 3 figures, based on the paper "Face Recognition: Too Bias, or Not Too Bias?", are:

    1. DET curves: FPR versus FNR by moving threshold
    2. Score distributions for genuine and imposter using violin plots
    3. Confusion matrix for Rank 1 and any Rank.
    opened by suchanv 2
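
    For reference, a rough sketch of what such a wrapper might look like for the first two figures (DET curves and violin plots), using scikit-learn and seaborn. The pairs-table columns are assumptions, and the rank-based confusion matrix is omitted since it requires an identification protocol rather than verification pairs.

    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.metrics import det_curve

    def bias_report(pairs, out_prefix="bfw"):
        """DET curves and score-distribution violins from a verification-pairs
        table with (assumed) columns: score, label (1 = genuine), subgroup."""
        # 1) DET curves: FPR versus FNR per subgroup as the threshold sweeps.
        fig, ax = plt.subplots()
        for sg, grp in pairs.groupby("subgroup"):
            fpr, fnr, _ = det_curve(grp["label"], grp["score"])
            ax.plot(fpr, fnr, label=sg)
        ax.set(xscale="log", yscale="log", xlabel="FPR", ylabel="FNR")
        ax.legend(fontsize="small")
        fig.savefig(f"{out_prefix}-det.pdf")

        # 2) Genuine vs. imposter score distributions per subgroup (violin plots).
        fig, ax = plt.subplots()
        sns.violinplot(data=pairs, x="subgroup", y="score", hue="label", split=True, ax=ax)
        ax.tick_params(axis="x", rotation=45)
        fig.savefig(f"{out_prefix}-violin.pdf")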
  • Devel

    Develop branch: prepare for the next version release.

    Aim for the following for version 0.1.1:

    • [ ] Notebook is updated to use interface recently modulated (#21)
    • [ ] Update Documentation to explain steps to run (#21)
      • [ ] add to README in root
      • [x] move results from README in root to the README in results/
      • [x] move data section from README in root to the README in data/
        • [x] save curve data along with PDF (i.e., in results/)
      • [ ] Add simple (brief) docstrings where missing
      • [ ] A sample (toy) set is run end-to-end (demonstrate in README)
        • [ ] if small enough, add to repo (i.e., < 40 MB or so)
    • [ ] Finish script to generate the TAR @ FAR table
    • [ ] Improve annotation in notebooks; more description, i.e., tutorial-like.
    • [ ] create pdf versions of notebooks and add to project in notebooks/pdfs (or create nbviewer and point to it)
    • [ ] add assertions (and tests) where appropriate, at least for critical cases, such as when a specific type is expected.
    • [ ] Consider moving some of the analysis functions to visualizations.
      • [ ] modularize the handling of plt.axes objects
      • [ ] add optional input arguments for the title and other figure cosmetics or settings
    • [x] Add benchmarks for sphereface features. Make these the results showcased throughout.
    Labels: documentation, enhancement, Benchmark, Project-level
    opened by visionjo 1
  • Questions on verification_RFW and training procedure

    Hi, thanks for your great work and for sharing the code for these two papers! It took me days to read the papers and go through the repository, and I have a few questions:

    (2) Do you have the code for training the features (asian_females, asian_males, black_females, black_males, indian_females, indian_males, ...)? I have a hard time finding something like train.py (e.g., the loss function and training process). (I suppose the released code is mainly for image pre-processing and result analysis. Since the BFW dataset is not as large as other face datasets, it may be possible for me to train it from scratch on one GPU.)

    (3) I am a little confused about how BFW is used in the two papers. As I understand it:

    In the paper "Face Recognition: Too Bias, or Not Too Bias?", the train and test setup is as follows: train: CASIA-WebFace trained using Sphereface loss; test: LFW. So the BFW dataset is not used for training in this set of experiments?

    In the paper "Balancing Biases and Preserving Privacy on Balanced Faces in the Wild", the train and test setup is as follows: train: (1) MS1M trained using ArcFace loss to get the 512-dim embedding (f_in in Fig. 6); (2) the BFW dataset is used to train the encoder and the two classifiers in Fig. 6; test: 4 folds are used for training and 1 fold for testing (using the best threshold chosen).

    Is that right?

    (4) There are some differences between "bfw-v0.1.5-datatable.csv" and TABLE-2 in paper 2: for example, there are 921379 records in TABLE-2 while there are 923898 records in the csv file. Also, there is no "{dir_meta}thresholds.pkl" file.

    Thanks for your time; any help would be appreciated!

    opened by lizhenstat 7
  • Regarding face identification

    Hey,

    Thanks for the awesome work!

    I wanted to know how I can modify the repo to use it for the face identification task instead of verification.

    Any help would be highly appreciated.

    opened by shivmgg 1
  • Sphinx documentation

    Set up the project for Sphinx.

    Include clear instructions on how to maintain it (i.e., once in place, we'll include it as part of the build process; see docs/).

    Set up tutorials for the different concepts and experiments done as part of this line of work (i.e., facial bias and the BFW database).

    Labels: documentation, enhancement
    opened by visionjo 0
  • Create plan for Dash interface

    Project plan (lead: Dylan; support: Rohan):

    • [ ] what features to include
    • [ ] Specifications
    • [ ] Interface layout (use lucidchart or equivalent)
    • [ ] Division of tasks and proposed timeline
    Labels: Plan and design, Project-level
    opened by visionjo 0
Releases (v0.0.3)
Owner
Joseph P. Robinson
Ph.D., Northeastern, 2020. Focus: applied machine learning, mostly vision. AI Engineer in Vicarious Surgical's ASDAI group, working on our surgical robot.