Asterisk is a framework to generate high-quality training datasets at scale

Last update: Apr 25, 2022

Overview

Asterisk*

Generating Training Data made Easy

Asterisk is a framework for generating high-quality training datasets at scale. Instead of relying on end users to write heuristics by hand, it uses a small set of labeled data to automatically produce a set of heuristics that assign initial labels. To improve the quality of the generated labels, the framework refines the accuracies of the heuristics through a novel data-driven active learning (AL) process. During this process, the system examines the generated weak labels together with the modeled accuracies of the heuristics to help the learner decide which points the user should label.
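The workflow described above can be illustrated with a small, self-contained sketch. This is not Asterisk's API or its actual algorithm: every function name here (make_heuristic, generate_heuristics, estimate_accuracy, weak_label_probability, select_queries) is hypothetical, the heuristics are simple per-feature threshold rules, the accuracies are plain smoothed estimates on the labeled set, and the query strategy is ordinary uncertainty sampling. It only shows the general shape of the idea: derive heuristics from a few labeled points, combine their votes by estimated accuracy, and ask the user about the points where the weak label is least certain.

```python
import math

ABSTAIN = 0  # labels are +1 / -1; a heuristic returns 0 when it abstains


def make_heuristic(feature, threshold, polarity, margin):
    """Threshold rule on one feature; abstains inside the margin band."""
    def heuristic(x):
        if x[feature] > threshold + margin:
            return polarity
        if x[feature] < threshold - margin:
            return -polarity
        return ABSTAIN
    return heuristic


def generate_heuristics(labeled, margin=0.5):
    """Derive one heuristic per feature from a small labeled set.

    Assumes the labeled set contains at least one point of each class.
    """
    heuristics = []
    for f in range(len(labeled[0][0])):
        pos = [x[f] for x, y in labeled if y == 1]
        neg = [x[f] for x, y in labeled if y == -1]
        mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
        threshold = (mean_pos + mean_neg) / 2
        polarity = 1 if mean_pos >= mean_neg else -1
        heuristics.append(make_heuristic(f, threshold, polarity, margin))
    return heuristics


def estimate_accuracy(heuristic, labeled):
    """Smoothed accuracy over the labeled points where the heuristic votes."""
    votes = [(heuristic(x), y) for x, y in labeled if heuristic(x) != ABSTAIN]
    correct = sum(1 for v, y in votes if v == y)
    return (correct + 1) / (len(votes) + 2)


def weak_label_probability(x, heuristics, accuracies):
    """P(label = +1) from an accuracy-weighted vote (log-odds weights)."""
    score = sum(math.log(a / (1 - a)) * h(x)
                for h, a in zip(heuristics, accuracies))
    return 1 / (1 + math.exp(-score))


def select_queries(probabilities, k):
    """Indices of the k points whose weak label is least certain."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:k]


# Toy data (made up for illustration): four labeled and four unlabeled points.
labeled = [((2.0, 1.0), 1), ((3.0, 2.0), 1),
           ((-2.0, -1.0), -1), ((-3.0, -2.0), -1)]
unlabeled = [(4.0, 3.0), (-4.0, -3.0), (0.1, -0.2), (2.0, -2.0)]

heuristics = generate_heuristics(labeled)
accuracies = [estimate_accuracy(h, labeled) for h in heuristics]
probabilities = [weak_label_probability(x, heuristics, accuracies)
                 for x in unlabeled]
print(select_queries(probabilities, k=2))
```

In this toy run, the points selected for user labeling are the ones where the heuristics either abstain or disagree, while points on which the heuristics agree keep their weak labels.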

Installation

To install Asterisk, you can use pip:

pip install asterisk

or clone the Git repository and run:

pip install -e .

within it.

Publications

M. Nashaat, A. Ghosh, J. Miller, and S. Quader, "Asterisk: Generating Large Training Datasets with Automatic Active Supervision," ACM Transactions on Data Science (TDS), May 2020.
M. Nashaat, A. Ghosh, J. Miller, and S. Quader, "WeSAL: Applying Active Supervision to Find High-quality Labels at Industrial Scale," Proceedings of the 53rd Hawaii International Conference on System Sciences, HI, USA, 2020, pp. 219-228.
M. Nashaat, A. Ghosh, J. Miller, S. Quader, C. Marston, and J. Puget, "Hybridization of Active Learning and Data Programming for Labeling Large Industrial Datasets," 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 46-55. doi: 10.1109/BigData.2018.8622459.

Owner

Mona Nashaat
