Knowledge-Inheritance

Source code for the paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq format) can be downloaded from Tsinghua Cloud. You can use convert_fairseq_to_huggingface.py to convert a Fairseq checkpoint into Huggingface's transformers format.

For downstream performance evaluation, we follow the implementations of Fairseq (GLUE tasks) and Don't Stop Pre-training (ACL-ARC / CHEMPROT).

If you have any questions, feel free to contact us ([email protected]).

1. Available Pretrained Models

WB domain: Wikipedia + BookCorpus; CS domain: computer science papers; BIO domain: biomedical papers.

Models trained by self-learning

RoBERTa_WB_H_4
RoBERTa_WB_H_6
RoBERTa_WB_H_8
RoBERTa_WB_H_10
RoBERTa_WB_D_288
RoBERTa_WB_D_384
RoBERTa_WB_D_480
RoBERTa_WB_D_576
RoBERTa_WB_D_672
RoBERTa_WB_BASE
RoBERTa_WB_MEDIUM
RoBERTa_WB_BASE_PLUS
RoBERTa_WB_LARGE
GPT_WB_MEDIUM
GPT_WB_BASE
GPT_WB_BASE_PLUS
RoBERTa_CS_MEDIUM
RoBERTa_CS_BASE
RoBERTa_BIO_MEDIUM
RoBERTa_BIO_BASE
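
The names above appear to follow an architecture_domain_variant pattern. As a minimal sketch, assuming that pattern holds for every name in the list, the helper below splits a name into its three parts and looks up the domain description given earlier. The names `ModelName`, `DOMAINS`, and `parse_model_name` are hypothetical and are not part of this repository; the variant codes (for example `H_4` or `D_288`) are kept as opaque strings because their meaning is not stated here.

```python
from typing import NamedTuple

# Domain codes and descriptions as listed in this README.
DOMAINS = {
    "WB": "Wikipedia + BookCorpus",
    "CS": "computer science papers",
    "BIO": "biomedical papers",
}


class ModelName(NamedTuple):
    architecture: str  # e.g. "RoBERTa" or "GPT"
    domain: str        # one of the keys in DOMAINS
    variant: str       # e.g. "BASE", "BASE_PLUS", "H_4", "D_288" (left uninterpreted)


def parse_model_name(name: str) -> ModelName:
    """Split a model name on its first two underscores (assumed naming pattern)."""
    architecture, domain, variant = name.split("_", 2)
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain code: {domain!r}")
    return ModelName(architecture, domain, variant)


if __name__ == "__main__":
    parsed = parse_model_name("RoBERTa_WB_BASE_PLUS")
    print(parsed, "->", DOMAINS[parsed.domain])
```

Splitting on only the first two underscores keeps multi-part variants such as `BASE_PLUS` intact.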

Models trained by Knowledge Inheritance

RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS
RoBERTa_WB_BASE -> RoBERTa_WB_LARGE
RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE
RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE
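
Each entry above is written as a chain of model names joined by `->`. The sketch below breaks such a chain into consecutive (source, target) pairs. It assumes that each arrow denotes one inheritance step from the model on its left to the model on its right; this reading is an interpretation of the notation, and the function name `inheritance_steps` is hypothetical rather than something provided by this repository.

```python
from typing import List, Tuple


def inheritance_steps(chain: str) -> List[Tuple[str, str]]:
    """Turn 'A -> B -> C' into [('A', 'B'), ('B', 'C')]."""
    models = [name.strip() for name in chain.split("->")]
    # Pair each model with the one that follows it in the chain.
    return list(zip(models, models[1:]))


if __name__ == "__main__":
    chain = "RoBERTa_WB_BASE -> RoBERTa_WB_BASE_PLUS -> RoBERTa_WB_LARGE"
    for source, target in inheritance_steps(chain):
        print(f"{source} is inherited by {target}")
```

A chain with a single arrow yields one pair, and the three-model chain yields two.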

Owner

THUNLP
