Source Code for ICSE 2022 Paper - ``Can We Achieve Fairness Using Semi-Supervised Learning?''

Last update: Dec 18, 2021

Related tags

Overview

Fair-SSL

Source Code for ICSE 2022 Paper - Can We Achieve Fairness Using Semi-Supervised Learning?

Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most of the prior software engineering works concentrated on finding ethical bias in models rather than fixing it. After finding bias, the next step is mitigation. Prior researchers mainly tried to use supervised approaches to achieve fairness. However, in the real world, getting data with trustworthy ground truth is challenging and also ground truth can contain human bias. Semi-supervised learning is a domain of machine learning where labeled and unlabeled data both are used to overcome the data labeling challenges. We, in this work, applied four popular semi-supervised techniques as pseudo-labelers to create fair classification models. Our framework, Fair-SSL, takes a very small amount (10%) of labeled data as input and generates pseudo-labels for the unlabeled data. We then synthetically generate new data points to balance the training data based on class and protected attribute as proposed by Chakraborty et al. in FSE 2021. Finally, classification model is trained on the balanced pseudo-labeled data and validated on test data. After experimenting on ten datasets and three learners, we found out that Fair-SSL achieves similar performance like three other state-of-the-art bias mitigation algorithms. Where prior algorithms require much training data, Fair-SSL requires only 10% of the labeled training data. As per our knowledge, this is the first SE work where semi-supervised techniques are used to fight against ethical bias in ML models.

Dataset Description -

1> Adult Income dataset - http://archive.ics.uci.edu/ml/datasets/Adult

2> COMPAS - https://github.com/propublica/compas-analysis

3> German Credit - https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

4> Bank Marketing - https://archive.ics.uci.edu/ml/datasets/bank+marketing

5> Default Credit - https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

6> Heart - https://archive.ics.uci.edu/ml/datasets/Heart+Disease

7> MEPS - https://meps.ahrq.gov/mepsweb/

8> Student - https://archive.ics.uci.edu/ml/datasets/Student+Performance

9> Home Credit - https://www.kaggle.com/c/home-credit-default-risk

Data Preprocessing -

We have used data preprocessing as suggested by IBM AIF360
The rows containing missing values are ignored, continuous features are converted to categorical (e.g., age<25: young,age>=25: old), non-numerical features are converted to numerical(e.g., male: 1, female: 0). Fiinally, all the feature values are normalized(converted between 0 to 1).
For optimized Pre-processing, plaese visit Optimized Preprocessing

Source Code for ICSE 2022 Paper - ``Can We Achieve Fairness Using Semi-Supervised Learning?''

Related tags

Overview

Fair-SSL

Dataset Description -

Data Preprocessing -

Owner

This repository contains an implementation of the Permutohedral Attention Module in Pytorch

ROMP: Monocular, One-stage, Regression of Multiple 3D People, ICCV21

Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

Official implement of Paper：A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sening images

PyTorch Implementation of Exploring Explicit Domain Supervision for Latent Space Disentanglement in Unpaired Image-to-Image Translation.

2D Time independent Schrodinger equation solver for arbitrary shape of well

QuakeLabeler is a Python package to create and manage your seismic training data, processes, and visualization in a single place — so you can focus on building the next big thing.

Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

CS_Final_Metal_surface_detection - This is a final project for CoderSchool Machine Learning bootcamp on 29/12/2021.

A Decentralized Omnidirectional Visual-Inertial-UWB State Estimation System for Aerial Swar.

Aggragrating Nested Transformer Official Jax Implementation

P-Tuning v2: Prompt Tuning Can Be Comparable to Finetuning Universally Across Scales and Tasks

Recognize Handwritten Digits using Deep Learning on the browser itself.

Distributed Evolutionary Algorithms in Python

Nest Protect integration for Home Assistant. This will allow you to integrate your smoke, heat, co and occupancy status real-time in HA.

Plugin adapted from Ultralytics to bring YOLOv5 into Napari

This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

A novel framework to automatically learn high-quality scanning of non-planar, complex anisotropic appearance.

Codebase for "ProtoAttend: Attention-Based Prototypical Learning."

Reinforcement learning for self-driving in a 3D simulation