Code for producing Japanese GPT-2 provided by rinna Co., Ltd.

Last update: Jan 07, 2023

japanese-gpt2

This repository provides the code for training Japanese GPT-2 models. The same code was used to produce japanese-gpt2-medium, which rinna released on the Hugging Face model hub.

Please open an issue (in English or 日本語) if you encounter any problem using the code or using our models via Hugging Face.

Train a Japanese GPT-2 from scratch on your own machine

Download the Japanese CC-100 training corpus and extract the ja.txt file.
Either move ja.txt to the location given by self.raw_data_dir in src/corpus/jp_cc100/config.py, or edit self.raw_data_dir in that file so that it points to where ja.txt is stored.
Split ja.txt into smaller files by running:

cd src/
python -m corpus.jp_cc100.split_to_small_files

Train a medium-sized GPT-2 on 4 GPUs by running:

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m task.pretrain.train --n_gpus 4 --save_model True --enable_log True
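The splitting step above can be sketched roughly as follows. This is not the repository's corpus.jp_cc100.split_to_small_files module, whose logic is not shown here; the function signature, the output file naming, and the lines_per_file value are assumptions made for illustration only.

```python
from pathlib import Path


def split_to_small_files(src_path, out_dir, lines_per_file=100_000):
    """Split a large UTF-8 text file into numbered chunks.

    Hypothetical helper: each chunk holds at most `lines_per_file` lines.
    Returns the list of files written, in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    buffer = []

    def flush():
        # Write the buffered lines to the next numbered file, then reset.
        path = out_dir / f"{len(written):05d}.txt"
        path.write_text("".join(buffer), encoding="utf-8")
        written.append(path)
        buffer.clear()

    # Stream line by line so the whole corpus never has to fit in memory.
    with open(src_path, encoding="utf-8") as f:
        for line in f:
            buffer.append(line)
            if len(buffer) >= lines_per_file:
                flush()

    # Write whatever is left over after the last full chunk.
    if buffer:
        flush()
    return written
```

Streaming the input keeps memory use bounded by one chunk, which matters for a corpus the size of ja.txt.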

Interact with the trained model

Assume you have run the training script and saved your medium-sized GPT-2 to data/model/gpt2-medium-xxx.checkpoint. Run the following command to complete text on one GPU, sampling with both a top-p (nucleus) threshold of p=0.95 and a top-k cutoff of k=40:

CUDA_VISIBLE_DEVICES=0 python -m task.pretrain.interact --checkpoint_path ../data/model/gpt2-medium-xxx.checkpoint --gen_type top --top_p 0.95 --top_k 40
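For readers unfamiliar with the two sampling parameters, here is a minimal, dependency-free sketch of combined top-k and top-p filtering. It is not the code behind task.pretrain.interact; the function names are hypothetical, and the cumulative probability is measured on the full distribution, whereas some implementations renormalize after the top-k step first.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=40, top_p=0.95):
    """Return {token_id: probability} after top-k and nucleus filtering."""
    # Softmax over all logits (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only the k most probable token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]

    # Within those, keep the smallest prefix whose cumulative
    # probability reaches top_p (the "nucleus").
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, top_k=40, top_p=0.95, rng=random):
    """Draw one token id from the filtered distribution."""
    dist = top_k_top_p_filter(logits, top_k, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

Lowering either parameter narrows the set of candidate tokens and makes completions more conservative; raising them allows more varied output.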

Prepare files for uploading to Hugging Face

Create a Hugging Face account, create a model repo, and clone it to your local machine.
Create model and config files from a checkpoint by running:

python -m task.pretrain.checkpoint2huggingface --checkpoint_path ../data/model/gpt2-medium-xxx.checkpoint --save_dir {huggingface's model repo directory}

Validate the created files by running:

python -m task.pretrain.check_huggingface --model_dir {huggingface's model repo directory}

Add files, commit, and push to your Hugging Face repo.
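Before pushing, a quick local sanity check of the model repo directory can catch obvious mistakes. The sketch below is not the repository's task.pretrain.check_huggingface script. The file names it looks for follow common Hugging Face conventions and are assumptions here, so adjust them to whatever the conversion step actually wrote.

```python
import json
from pathlib import Path

# Assumed file names (common Hugging Face conventions, not taken from
# this repository); change them to match your converted checkpoint.
ASSUMED_REQUIRED_FILES = ("config.json", "pytorch_model.bin")


def check_model_dir(model_dir, required=ASSUMED_REQUIRED_FILES):
    """Return a list of problems found; an empty list means none were found."""
    model_dir = Path(model_dir)
    problems = [
        f"missing file: {name}"
        for name in required
        if not (model_dir / name).is_file()
    ]

    config_path = model_dir / "config.json"
    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"config.json is not valid JSON: {err}")
        else:
            # GPT-2 configs saved by the transformers library carry
            # model_type "gpt2".
            model_type = config.get("model_type") if isinstance(config, dict) else None
            if model_type != "gpt2":
                problems.append(f"unexpected model_type: {model_type!r}")
    return problems
```

This only inspects files on disk; it does not load weights or generate text, so it complements rather than replaces the validation command above.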

Customize your training script

Check available arguments by running:

python -m task.pretrain.train --help
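The training command shown earlier passes booleans as explicit values (--save_model True). A parser accepting that style could look like the sketch below. Only the three flag names come from this README; the str2bool helper, the types, and the defaults are assumptions, and the real definitions in task.pretrain.train may differ.

```python
import argparse


def str2bool(value):
    """Parse 'True'/'False'-style strings into booleans (hypothetical helper)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    # Flag names are taken from the README; defaults are assumed.
    parser = argparse.ArgumentParser(description="Illustrative training options")
    parser.add_argument("--n_gpus", type=int, default=1)
    parser.add_argument("--save_model", type=str2bool, default=False)
    parser.add_argument("--enable_log", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Run the real script with --help to see the authoritative list of options and their defaults.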

License

The MIT license
