The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Overview

tldr-transformers

The tl;dr on a few notable transformer/language model papers + other papers (alignment, memorization, etc).

Models: GPT- *, * BERT *, Adapter- *, * T5, etc.

BERT and T5 (art from the original papers)

Β  Β  Β 

Each set of notes includes links to the paper, the original code implementation (if available) and the Huggingface πŸ€— implementation.

Here is an example: t5.

The transformers papers are presented somewhat chronologically below. Go to the " πŸ‘‰ Notes πŸ‘ˆ " column below to find the notes for each paper.

This repo also includes a table quantifying the differences across transformer papers all in one table.

Contents

Quick_Note

This is not an intro to deep learning in NLP. If you are looking for that, I recommend one of the following: Fast AI's course, one of the Coursera courses, or maybe this old thing. Come here after that.

Motivation

With the explosion in papers on all things Transformers the past few years, it seems useful to catalog the salient features/results/insights of each paper in a digestible format. Hence this repo.

Models

Model Year Institute Paper πŸ‘‰ Notes πŸ‘ˆ Original Code Huggingface πŸ€— Other Repo
Transformer 2017 Google Attention is All You Need Skipped, too many good write-ups: ?
GPT-3 2018 OpenAI Language Models are Unsupervised Multitask Learners To-Do X X
GPT-J-6B 2021 EleutherAI GPT-J-6B: 6B Jax-Based Transformer (public GPT-3) X here x x
BERT 2018 Google BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding BERT notes here here
DistilBERT 2019 Huggingface DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter DistilBERT notes here
ALBERT 2019 Google/Toyota ALBERT: A Lite BERT for Self-supervised Learning of Language Representations ALBERT notes here here
RoBERTa 2019 Facebook RoBERTa: A Robustly Optimized BERT Pretraining Approach RoBERTa notes here here
BART 2019 Facebook BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension BART notes here here
T5 2019 Google Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer T5 notes here here
Adapter-BERT 2019 Google Parameter-Efficient Transfer Learning for NLP Adapter-BERT notes here - here
Megatron-LM 2019 NVIDIA Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism Megatron notes here - here
Reformer 2020 Google Reformer: The Efficient Transformer Reformer notes here
byT5 2021 Google ByT5: Towards a token-free future with pre-trained byte-to-byte models ByT5 notes here here
CLIP 2021 OpenAI Learning Transferable Visual Models From Natural Language Supervision CLIP notes here here
DALL-E 2021 OpenAI Zero-Shot Text-to-Image Generation DALL-E notes here -
Codex 2021 OpenAI Evaluating Large Language Models Trained on Code Codex notes X -

BigTable

All of the table summaries found ^ collapsed into one really big table here.

Alignment

Paper Year Institute πŸ‘‰ Notes πŸ‘ˆ Codes
Fine-Tuning Language Models from Human Preferences 2019 OpenAI To-Do None

Scaling

Paper Year Institute πŸ‘‰ Notes πŸ‘ˆ Codes
Scaling Laws for Neural Language Models 2020 OpenAI To-Do None

Memorization

Paper Year Institute πŸ‘‰ Notes πŸ‘ˆ Codes
Extracting Training Data from Large Language Models 2021 Google et al. To-Do None
Deduplicating Training Data Makes Language Models Better 2021 Google et al. To-Do None

FewLabels

Paper Year Institute πŸ‘‰ Notes πŸ‘ˆ Codes
An Empirical Survey of Data Augmentation for Limited Data Learning in NLP 2021 GIT/UNC To-Do None
Learning with fewer labeled examples 2021 Kevin Murphy & Colin Raffel (Preprint: "Probabilistic Machine Learning", Chapter 19) Worth a read, won't summarize here. None

Contribute

If you are interested in contributing to this repo, feel free to do the following:

  1. Fork the repo.
  2. Create a Draft PR with the paper of interest (to prevent "in-flight" issues).
  3. Use the suggested template to write your "tl;dr". If it's an architecture paper, you may also want to add to the larger table here.
  4. Submit your PR.

Errata

Undoubtedly there is information that is incorrect here. Please open an Issue and point it out.

Citation

@misc{cliff-notes-transformers,
  author = {Thompson, Will},
  url = {https://github.com/will-thompson-k/cliff-notes-transformers},
  year = {2021}
}

For the notes above, I've linked the original papers.

License

MIT

Owner
Will Thompson
Will Thompson
[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

Ayan Kumar Bhunia 44 Dec 12, 2022
Pytorch implementation of NeurIPS 2021 paper: Geometry Processing with Neural Fields.

Geometry Processing with Neural Fields Pytorch implementation for the NeurIPS 2021 paper: Geometry Processing with Neural Fields Guandao Yang, Serge B

Guandao Yang 162 Dec 16, 2022
EMNLP 2021 paper The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers.

Codebase for training transformers on systematic generalization datasets. The official repository for our EMNLP 2021 paper The Devil is in the Detail:

CsordΓ‘s RΓ³bert 57 Nov 21, 2022
DLL: Direct Lidar Localization

DLL: Direct Lidar Localization Summary This package presents DLL, a direct map-based localization technique using 3D LIDAR for its application to aeri

Service Robotics Lab 127 Dec 16, 2022
Edison AT is software Depression Assistant personal.

Edison AT Edison AT is software / program Depression Assistant personal. Feature: Analyze emotional real-time from face. Audio Edison(Comingsoon relea

Ananda Rauf 2 Apr 24, 2022
Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration

CoGAIL Table of Content Overview Installation Dataset Training Evaluation Trained Checkpoints Acknowledgement Citations License Overview This reposito

Jeremy Wang 29 Dec 24, 2022
Simple and Robust Loss Design for Multi-Label Learning with Missing Labels

Simple and Robust Loss Design for Multi-Label Learning with Missing Labels Official PyTorch Implementation of the paper Simple and Robust Loss Design

Xinyu Huang 28 Oct 27, 2022
Powerful unsupervised domain adaptation method for dense retrieval.

Powerful unsupervised domain adaptation method for dense retrieval

Ubiquitous Knowledge Processing Lab 191 Dec 28, 2022
πŸ₯ˆ78th place in Riiid Answer Correctness Prediction competition

Riiid Answer Correctness Prediction Introduction This repository is the code that placed 78th in Riiid Answer Correctness Prediction competition. Requ

Jungwoo Park 10 Jul 14, 2022
Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation

FLAME Original Pytorch Implementation of FLAME: Facial Landmark Heatmap Activated Multimodal Gaze Estimation, accepted at the 17th IEEE Internation Co

Neelabh Sinha 19 Dec 17, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
PyTorch implementation for Graph Contrastive Learning with Augmentations

Graph Contrastive Learning with Augmentations PyTorch implementation for Graph Contrastive Learning with Augmentations [poster] [appendix] Yuning You*

Shen Lab at Texas A&M University 382 Dec 15, 2022
Code for the AAAI-2022 paper: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification (AAAI 2022) Prerequisite PyTorch = 1.2.0 P

16 Dec 14, 2022
PFLD pytorch Implementation

PFLD-pytorch Implementation of PFLD A Practical Facial Landmark Detector by pytorch. 1. install requirements pip3 install -r requirements.txt 2. Datas

zhaozhichao 669 Jan 02, 2023
A Python package for performing pore network modeling of porous media

Overview of OpenPNM OpenPNM is a comprehensive framework for performing pore network simulations of porous materials. More Information For more detail

PMEAL 336 Dec 30, 2022
In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard test set accuracy

PixMix Introduction In real-world applications of machine learning, reliable and safe systems must consider measures of performance beyond standard te

Andy Zou 79 Dec 30, 2022
Make a Turtlebot3 follow a figure 8 trajectory and create a robot arm and make it follow a trajectory

HW2 - ME 495 Overview Part 1: Makes the robot move in a figure 8 shape. The robot starts moving when launched on a real turtlebot3 and can be paused a

Devesh Bhura 0 Oct 21, 2022
All supplementary material used by me while TA-ing CS3244: Machine Learning

CS3244-Tutorial-Material All supplementary material used by me while TA-ing CS3244: Machine Learning at NUS School of Computing. What is this? I teach

Rishabh Anand 18 Sep 23, 2022
SCI-AIDE : High-fidelity Few-shot Histopathology Image Synthesis for Rare Cancer Diagnosis

SCI-AIDE : High-fidelity Few-shot Histopathology Image Synthesis for Rare Cancer Diagnosis Pretrained Models In this work, we created synthetic tissue

Emirhan Kurtuluş 1 Feb 07, 2022
Space-invaders - Simple Game created using Python & PyGame, as my Beginner Python Project

Space Invaders This is a simple SPACE INVADER game create using PYGAME whihc hav

Gaurav Pandey 2 Jan 08, 2022