Finding Label and Model Errors in Perception Data With Learned Observation Assertions

Last update: Oct 14, 2022

This is the project page for Finding Label and Model Errors in Perception Data With Learned Observation Assertions.

Please read the paper for full technical details.

Installation

From the repository root, run:

pip install -e .

Examples

We provide an example using the Lyft Level 5 perception dataset. Model predictions are included for convenience, but you will need to download the dataset itself separately.

All of the scripts are in examples/lyft_level5. To run them, do the following in order:

1. Set the data directories in constants.py.
2. Learn the priors with learn_priors.py.
3. Run LOA with prior_lyft.py.

You can visualize the results with viz_track.py.
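The steps above can be sketched as a small driver script. This is only an illustration, not part of the repository: it assumes the three scripts take no command-line arguments and are run from inside examples/lyft_level5 after constants.py has been edited by hand. The names `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical helpers introduced here.

```python
import subprocess
import sys

# Script order taken from the steps above. Running each script with no
# arguments is an assumption; check the scripts before relying on it.
PIPELINE = ["learn_priors.py", "prior_lyft.py", "viz_track.py"]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    return [[python, script] for script in PIPELINE]


def run_pipeline(example_dir="examples/lyft_level5", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # cwd makes the scripts run from the example directory;
            # check=True stops the pipeline as soon as a step fails.
            subprocess.run(cmd, cwd=example_dir, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing executes until you opt in.
    run_pipeline(dry_run=True)
```

With `dry_run=True` the driver only prints the commands it would run; pass `dry_run=False` once the data directories in constants.py point at your local copy of the dataset.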

Citation

If you find this project useful, please cite us:

@article{kang2021finding,
  title={Finding Label and Model Errors in Perception Data With Learned Observation Assertions},
  author={Kang, Daniel and Arechiga, Nikos and Pillai, Sudeep and Bailis, Peter and Zaharia, Matei},
}

and contact us if you deploy LOA!

Owner

Stanford Future Data Systems
