Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Last update: Jan 05, 2023

Regression Transformer

Development setup

conda env create -f conda.yml
conda activate terminator
pip install -e .

Generate some data

Example data for QED can be generated using scripts/generate_example_data.py.

python scripts/generate_example_data.py examples/example.smi examples/qed_property_example.txt

If you need to create a new vocabulary for a dataset, you can use scripts/create_vocabulary.py. It also automatically adds some special tokens at the top of your vocabulary file.

python scripts/create_vocabulary.py examples/qed_property_example.txt examples/vocab.txt
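
As a rough illustration of what "special tokens at the top" means, the sketch below builds a vocabulary list in which the special tokens take the first indices and dataset tokens follow in order of first appearance. It is not the code of scripts/create_vocabulary.py: the function name build_vocabulary and the SPECIAL_TOKENS list are hypothetical placeholders, and the real script may use different tokens and a different ordering.

```python
from typing import Iterable, List

# Placeholder special tokens, assumed for illustration only.
# The tokens actually written by scripts/create_vocabulary.py may differ.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocabulary(
    token_lists: Iterable[Iterable[str]],
    special_tokens: Iterable[str] = tuple(SPECIAL_TOKENS),
) -> List[str]:
    """Return special tokens first, then each new token in order of first use."""
    vocabulary = list(special_tokens)
    seen = set(vocabulary)
    for tokens in token_lists:
        for token in tokens:
            if token not in seen:
                seen.add(token)
                vocabulary.append(token)
    return vocabulary


# One token per line is the usual layout of a BERT-style vocab.txt.
vocab = build_vocabulary([["C", "Br"], ["C", "N"]])
print("\n".join(vocab))
```

Because the index of a token is its line position in the file, placing the special tokens first keeps their ids stable when the dataset tokens change.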

At this point the folder containing the vocabulary file can be used to load a tokenizer compatible with any ExpressionBertTokenizer:

>>> from terminator.tokenization import ExpressionBertTokenizer
>>> tokenizer = ExpressionBertTokenizer.from_pretrained('examples')
>>> text = '<qed>0.3936|CBr'
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['<qed>', '_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_', '|', 'C', 'Br']
>>> token_indexes = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(text))
>>> print(token_indexes)
[16, 17, 18, 28, 45, 34, 35, 19, 15, 63]
>>> tokenizer.build_inputs_with_special_tokens(token_indexes)
[12, 16, 17, 18, 28, 45, 34, 35, 19, 15, 63, 13]
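
The numeric tokens in the output above appear to pair each digit with its decimal position, so '_3_-1_' reads as the digit 3 in the 10^-1 place. The sketch below reproduces that pattern with the standard library only. The function tokenize_number is a hypothetical helper written for this explanation, not part of terminator.tokenization, and the handling of multi-digit integer parts is an assumption extrapolated from the single example.

```python
from typing import List


def tokenize_number(text: str) -> List[str]:
    """Split a decimal string into tokens of the form _<digit>_<exponent>_."""
    integer_part, _, fractional_part = text.partition(".")
    tokens = []
    # Integer digits: exponents count down to 0 (assumed for more than one digit).
    for offset, digit in enumerate(integer_part):
        exponent = len(integer_part) - 1 - offset
        tokens.append(f"_{digit}_{exponent}_")
    if fractional_part:
        tokens.append("_._")
        # Fractional digits: exponents -1, -2, ...
        for offset, digit in enumerate(fractional_part, start=1):
            tokens.append(f"_{digit}_-{offset}_")
    return tokens


print(tokenize_number("0.3936"))
# ['_0_0_', '_._', '_3_-1_', '_9_-2_', '_3_-3_', '_6_-4_']
```

Encoding the position in the token lets the model distinguish a 3 in the tenths place from a 3 in the thousandths place without parsing the whole number.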

Prepare some train/eval data line by line:

head -n 900 examples/qed_property_example.txt > examples/train.txt
tail -n +901 examples/qed_property_example.txt > examples/eval.txt
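
On systems without head and tail, the same split can be sketched in Python. The helper names split_lines and split_file are hypothetical, and the default of 900 training lines simply mirrors the commands above.

```python
from pathlib import Path
from typing import List, Tuple


def split_lines(lines: List[str], n_train: int) -> Tuple[List[str], List[str]]:
    """First n_train lines go to training, the remainder to evaluation."""
    return lines[:n_train], lines[n_train:]


def split_file(source: str, train_path: str, eval_path: str, n_train: int = 900) -> None:
    """Write the head of `source` to train_path and the tail to eval_path."""
    lines = Path(source).read_text().splitlines(keepends=True)
    train, evaluation = split_lines(lines, n_train)
    Path(train_path).write_text("".join(train))
    Path(eval_path).write_text("".join(evaluation))


# Example call (paths taken from the shell commands above):
# split_file("examples/qed_property_example.txt",
#            "examples/train.txt", "examples/eval.txt")
```

Like the shell version, this split is positional and not shuffled, so it assumes the lines of the source file are in no meaningful order.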

Launch the training:

python scripts/run_language_modeling.py --output_dir examples/models/xlnet_selfies \
    --config_name configs/xlnet_selfies.json --tokenizer_name ./examples/vocab.txt \
    --do_train --do_eval --learning_rate 1e-4 --num_train_epochs 5 --save_total_limit 2 \
    --save_steps 500 --per_gpu_train_batch_size 16 --evaluate_during_training --eval_data_file ./examples/eval.txt \
    --train_data_file ./examples/train.txt --line_by_line --block_size 510 --seed 42 --logging_steps 250
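
To get a feeling for how --save_steps and --logging_steps interact with a dataset this small, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not output from the training script: it assumes a single GPU, no gradient accumulation, one training example per line (900 lines), and that the last partial batch is kept. The function training_schedule is a hypothetical helper.

```python
import math
from typing import Dict


def training_schedule(
    n_examples: int,
    batch_size: int,
    epochs: int,
    save_steps: int,
    logging_steps: int,
) -> Dict[str, int]:
    """Estimate optimizer steps and how often saving/logging would trigger."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        "intermediate_checkpoints": total_steps // save_steps,
        "log_events": total_steps // logging_steps,
    }


print(training_schedule(900, 16, 5, save_steps=500, logging_steps=250))
# {'steps_per_epoch': 57, 'total_steps': 285, 'intermediate_checkpoints': 0, 'log_events': 1}
```

Under these assumptions the run finishes before step 500, so no intermediate checkpoint would be triggered by --save_steps. For the toy example it may be worth lowering --save_steps and --logging_steps.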

Example model configurations (number of heads, layers, etc.) can be found in the configs folder.

Owner

International Business Machines
