Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Last update: Dec 11, 2022

Overview

Introduction

Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states into a shorter one, thereby reducing the computation cost. More importantly, by re-investing the FLOPs saved from length reduction in constructing a deeper or wider model, Funnel-Transformer usually has a higher capacity given the same FLOPs. In addition, with a decoder, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence, which enables standard pretraining.
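The length reduction and recovery described above can be illustrated with a small, dependency-free sketch. It assumes mean pooling with stride 2 between encoder blocks and recovery by simple repetition; the function names (mean_pool, funnel_lengths, repeat_upsample) are hypothetical and are not taken from the repository. The real model has further details, such as how the pooled sequence enters attention, that are not shown here.

```python
from typing import List

Vector = List[float]


def mean_pool(hidden: List[Vector], stride: int = 2) -> List[Vector]:
    """Average each window of `stride` consecutive hidden vectors.

    A trailing window shorter than `stride` is averaged over the
    vectors it actually contains.
    """
    pooled = []
    for start in range(0, len(hidden), stride):
        window = hidden[start:start + stride]
        dim = len(window[0])
        pooled.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return pooled


def funnel_lengths(seq_len: int, num_blocks: int, stride: int = 2) -> List[int]:
    """Sequence length seen by each encoder block.

    The first block runs at full length; every later block runs on the
    pooled output of the previous one (assumed stride-2 reduction).
    """
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append(-(-lengths[-1] // stride))  # ceiling division
    return lengths


def repeat_upsample(hidden: List[Vector], factor: int, target_len: int) -> List[Vector]:
    """Repeat each vector `factor` times, then truncate to `target_len`."""
    expanded = [list(vec) for vec in hidden for _ in range(factor)]
    return expanded[:target_len]


if __name__ == "__main__":
    states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(mean_pool(states))
    print(funnel_lengths(512, num_blocks=3))
    print(repeat_upsample(mean_pool(states), factor=2, target_len=len(states)))
```

Because self-attention cost grows with sequence length, each pooled block is cheaper than a full-length block of the same width, which is where the saved FLOPs come from.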

For technical details and experimental results, please refer to our paper:

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Zihang Dai*, Guokun Lai*, Yiming Yang, Quoc V. Le

(*: equal contribution)

Preprint 2020

Source Code

Data Download

The corresponding source code and instructions are in the data-scripts folder, which explains how to access the raw data we used in this work.

TensorFlow

The corresponding source code is in the tensorflow folder. This is the exact code that was developed and used for TPU pretraining & finetuning as presented in the paper.
The TensorFlow finetuning code mainly supports TPU finetuning on the GLUE benchmark, text classification, SQuAD and RACE.
Please refer to tensorflow/README.md for details.

PyTorch

The source code is in the pytorch folder, which only serves as an example PyTorch implementation of Funnel-Transformer.
Hence, the PyTorch code only supports GPU finetuning for the GLUE benchmark & text classification.
Please refer to pytorch/README.md for details.

Pretrained models

Model Size	PyTorch	TensorFlow	TensorFlow-Full
B10-10-10H1024	Link	Link	Link
B8-8-8H1024	Link	Link	Link
B6-6-6H768	Link	Link	Link
B6-3x2-3x2H768	Link	Link	Link
B4-4-4H768	Link	Link	Link
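
The model size names in the table can be decoded mechanically. The sketch below assumes the naming convention used in the paper: the numbers after "B" give the layers in each encoder block, "NxM" means N layers repeated M times, and the number after "H" is the hidden size. The helper name parse_model_size is hypothetical and is not part of the released code.

```python
import re
from typing import Dict


def parse_model_size(name: str) -> Dict[str, object]:
    """Split a name such as 'B6-3x2-3x2H768' into block specs and hidden size."""
    match = re.fullmatch(r"B(?P<blocks>[0-9x-]+)H(?P<hidden>\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized model size name: {name!r}")

    blocks = []
    for spec in match.group("blocks").split("-"):
        # '3x2' -> 3 layers repeated 2 times; '6' -> 6 layers, no repetition.
        layers, _, repeat = spec.partition("x")
        blocks.append({"layers": int(layers), "repeat": int(repeat) if repeat else 1})

    return {
        "hidden_size": int(match.group("hidden")),
        "blocks": blocks,
        # Effective depth counts repeated layers once per repetition.
        "total_depth": sum(b["layers"] * b["repeat"] for b in blocks),
    }


if __name__ == "__main__":
    for size in ["B10-10-10H1024", "B6-3x2-3x2H768", "B4-4-4H768"]:
        print(size, parse_model_size(size))
```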

Each .tar.gz file contains three items:

A TensorFlow or PyTorch checkpoint (model.ckpt-* or model.ckpt.pt) containing the pretrained weights (note: the TensorFlow checkpoint actually consists of three files).
A WordPiece model (vocab.uncased.txt) used for (de)tokenization.
A config file (net_config.json or net_config.pytorch.json) which specifies the hyperparameters of the model.
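
As a quick sanity check after downloading, the three items listed above can be located inside an archive with the Python standard library. This is a minimal sketch: the helper names (classify_members, inspect_archive) and the archive path in the usage line are hypothetical, and the matching relies only on the file names given in this README.

```python
import posixpath
import tarfile
from typing import Dict, List

# Config file names as listed in this README.
CONFIG_NAMES = {"net_config.json", "net_config.pytorch.json"}


def classify_members(names: List[str]) -> Dict[str, List[str]]:
    """Group archive member paths into checkpoint, vocab, config and other."""
    groups: Dict[str, List[str]] = {
        "checkpoint": [], "vocab": [], "config": [], "other": []
    }
    for name in names:
        base = posixpath.basename(name)
        if base.startswith("model.ckpt"):
            # Covers model.ckpt.pt and every file of a TensorFlow checkpoint.
            groups["checkpoint"].append(name)
        elif base == "vocab.uncased.txt":
            groups["vocab"].append(name)
        elif base in CONFIG_NAMES:
            groups["config"].append(name)
        else:
            groups["other"].append(name)
    return groups


def inspect_archive(archive_path: str) -> Dict[str, List[str]]:
    """Open a .tar.gz file and classify the regular files it contains."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    return classify_members(names)


if __name__ == "__main__":
    import sys

    # Usage (path is an example only): python inspect_ckpt.py some_model.tar.gz
    if len(sys.argv) > 1:
        for kind, members in inspect_archive(sys.argv[1]).items():
            print(kind, members)
```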

You can also use download_all_ckpts.sh to download all of the checkpoints listed above.

For how to use the pretrained models, please refer to tensorflow/README.md or pytorch/README.md respectively.
