Awesome-Weak-Supervision

A curated list of programmatic/rule-based weak supervision papers and resources.

Contents

An overview of weak supervision

Blogs

An Overview of Weak Supervision

Building NLP Classifiers Cheaply With Transfer Learning and Weak Supervision

Videos

Theory & Systems for Weak Supervision | Chinese Version

Lecture Notes

Lecture Notes on Weak Supervision

Algorithm

Data Programming: Creating Large Training Sets, Quickly. Alex Ratner NeurIPS 2016

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data. Paroma Varma FILM-NeurIPS 2016

Training Complex Models with Multi-Task Weak Supervision. Alex Ratner AAAI 2019

Data Programming using Continuous and Quality-Guided Labeling Functions. Oishik Chatterjee AAAI 2020

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods. Dan Fu ICML 2020

Learning from Rules Generalizing Labeled Exemplars. Abhijeet Awasthi ICLR 2020

Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings. Mayee F. Chen 2020

Learning the Structure of Generative Models without Labeled Data. Stephen H. Bach ICML 2017

Inferring Generative Model Structure with Static Analysis. Paroma Varma NeurIPS 2017

Learning Dependency Structures for Weak Supervision Models. Paroma Varma ICML 2019

Self-Training with Weak Supervision. Giannis Karamanolakis NAACL 2021

Interactive Programmatic Labeling for Weak Supervision. Benjamin Cohen-Wang KDD Workshop 2019

Pairwise Feedback for Data Programming. Benedikt Boecking NeurIPS 2019 workshop on Learning with Rich Experience: Integration of Learning Paradigms

Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling. Benedikt Boecking ICLR 2021

Active WeaSuL: Improving Weak Supervision with Active Learning. Samantha Biegel ICLR WeaSuL 2021

System

Snorkel: Rapid Training Data Creation with Weak Supervision. Alex Ratner VLDB 2018

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale. Stephen H. Bach SIGMOD (Industrial) 2019

Snuba: Automating Weak Supervision to Label Training Data. Paroma Varma VLDB 2019

Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design. Ying Sheng CIDR 2020

Overton: A Data System for Monitoring and Improving Machine-Learned Products. Christopher Ré CIDR 2020

Ruler: Data Programming by Demonstration for Document Labeling. Sara Evensen EMNLP 2020 Findings

skweak: Weak Supervision Made Easy for NLP. Pierre Lison 2021

Application

CV

Scene Graph Prediction with Limited Labels. Vincent Chen ICCV 2019

Multi-Resolution Weak Supervision for Sequential Data. Paroma Varma NeurIPS 2019

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels. Daniel Y. Fu SOSP 2019

GOGGLES: Automatic Image Labeling with Affinity Coding. Nilaksh Das SIGMOD 2020

Cut out the annotator, keep the cutout: better segmentation with weak supervision. Sarah Hooper ICLR 2021

Task Programming: Learning Data Efficient Behavior Representations. Jennifer J. Sun CVPR 2021

NLP

Heterogeneous Supervision for Relation Extraction: A Representation Learning Approach. Liyuan Liu EMNLP 2017

Training Classifiers with Natural Language Explanations. Braden Hancock ACL 2018

Deep Text Mining of Instagram Data without Strong Supervision. Kim Hammar ICWI 2018

Bootstrapping Conversational Agents With Weak Supervision. Neil Mallinar AAAI 2019

Weakly Supervised Sequence Tagging from Noisy Rules. Esteban Safranchik AAAI 2020

NERO: A Neural Rule Grounding Framework for Label-Efficient Relation Extraction. Wenxuan Zhou WWW 2020

Named Entity Recognition without Labelled Data: A Weak Supervision Approach. Pierre Lison ACL 2020

Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach. Yue Yu NAACL 2021

BERTifying Hidden Markov Models for Multi-Source Weakly Supervised Named Entity Recognition. Yinghao Li ACL 2021

RL

Generating Multi-Agent Trajectories using Programmatic Weak Supervision. Eric Zhan ICLR 2019

Others

Generating Training Labels for Cardiac Phase-Contrast MRI Images. Vincent Chen MED-NeurIPS 2017

Osprey: Weak Supervision of Imbalanced Extraction Problems without Code. Eran Bringer SIGMOD DEEM Workshop 2019

Weakly Supervised Classification of Rare Aortic Valve Malformations Using Unlabeled Cardiac MRI Sequences. Jason Fries Nature Communications 2019

Doubly Weak Supervision of Deep Learning Models for Head CT. Khaled Saab MICCAI 2019

A clinical text classification paradigm using weak supervision and deep representation. Yanshan Wang BMC MIDM 2019

A machine-compiled database of genome-wide association studies. Volodymyr Kuleshov Nature Communications 2019

Weak Supervision as an Efficient Approach for Automated Seizure Detection in Electroencephalography. Khaled Saab NPJ Digital Medicine 2020

Extracting Chemical Reactions From Text Using Snorkel. Emily Mallory BMC Bioinformatics 2020

Cross-Modal Data Programming Enables Rapid Medical Machine Learning. Jared A. Dunnmon Patterns 2020

SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data. Jason Fries

Ontology-driven weak supervision for clinical entity classification in electronic health records. Jason Fries Nature Communications 2021

Utilizing Weak Supervision to Infer Complex Objects and Situations in Autonomous Driving Data. Zhenzhen Weng IV 2019

Multi-frame Weak Supervision to Label Wearable Sensor Data. Saelig Khattar ICML Time Series Workshop 2019

Thesis

Accelerating Machine Learning with Training Data Management. Alex Ratner

Weak Supervision From High-Level Abstractions. Braden Jay Hancock

Other Weak Supervision Paradigms

Label-name Only Supervision

Weakly-Supervised Neural Text Classification. Yu Meng CIKM 2018

Weakly-Supervised Hierarchical Text Classification. Yu Meng AAAI 2019

Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding. Jiaxin Huang EMNLP 2020

Text Classification Using Label Names Only: A Language Model Self-Training Approach. Yu Meng EMNLP 2020

Hierarchical Metadata-Aware Document Categorization under Weak Supervision. Yu Zhang WSDM 2021

Owner
Jieyu Zhang
CS PhD