ReLSO: Improved Fitness Optimization Landscapes for Sequence Design

Description

In recent years, deep learning approaches for determining protein sequence-fitness relationships have gained traction. Advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data and, consequently, attracted the application of various deep learning methods. Although these methods learn an implicit fitness landscape, there is little work on using the latent encoding directly for protein sequence optimization. Here we show that this latent space representation of a fitness landscape can be made highly amenable to latent space optimization through a joint-training process. We also show that this encoding strategy provides improvements in generalization over more traditional training strategies. We apply our approach to several biological contexts and show that latent space optimization in a smooth learned fitness landscape allows for more accurate and efficient optimization of protein sequences.
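
A minimal PyTorch sketch of the joint-training idea: a sequence autoencoder and an auxiliary fitness-regression head share one latent code and are trained on a weighted sum of reconstruction and fitness-prediction losses, which is what organizes the latent space by fitness. The paper's model is transformer-based; a tiny MLP stands in here for brevity, and all class, layer, and variable names below are illustrative assumptions, not the actual modules defined in this repository.

# Illustrative sketch of joint training (not the actual modules in this repo).
import torch
import torch.nn as nn

class JointAutoencoder(nn.Module):
    def __init__(self, vocab_size=21, seq_len=20, embed_dim=64, latent_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.Sequential(
            nn.Flatten(),                        # (B, L, E) -> (B, L*E)
            nn.Linear(seq_len * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, seq_len * vocab_size),
        )
        self.fitness_head = nn.Sequential(       # auxiliary regression network
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )
        self.seq_len, self.vocab_size = seq_len, vocab_size

    def forward(self, tokens):                   # tokens: (B, L) residue indices
        z = self.encoder(self.embed(tokens))
        logits = self.decoder(z).view(-1, self.seq_len, self.vocab_size)
        fitness = self.fitness_head(z).squeeze(-1)
        return z, logits, fitness

def joint_loss(logits, fitness_pred, tokens, fitness_true, alpha=1.0):
    # per-residue reconstruction loss + weighted fitness-regression loss
    recon = nn.functional.cross_entropy(logits.transpose(1, 2), tokens)
    reg = nn.functional.mse_loss(fitness_pred, fitness_true)
    return recon + alpha * reg

Because the fitness head is differentiable with respect to the latent code, the same jointly trained model can later supply gradients for latent space optimization (see the sketch under Usage).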

Citation

This repo accompanies the following publication:

Egbert Castro, Abhinav Godavarthi, Julien Rubinfien, Smita Krishnaswamy. Guided Generative Protein Design using Regularized Transformers. Nature Machine Intelligence, in review (2021).

How to run


First, install dependencies

# clone project
git clone https://github.com/KrishnaswamyLab/ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers.git

# install project
cd ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers
pip install -e .
pip install -r requirements.txt

Usage

Training models

# run training script
python train_relso.py  --data gifford

Note: if an argument is not relevant to the current model selection, it will not be used. See the init method of each model to see which arguments are used.

available dataset args:

    gifford, GB1_WU, GFP, TAPE

available auxnetwork args:

    base_reg
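
With a model trained this way, sequences are optimized directly in the learned latent space: encode a seed sequence, take gradient-ascent steps on the predicted fitness with respect to the latent code, and decode the optimized point back into a sequence. The sketch below is a minimal illustration of that loop using the hypothetical JointAutoencoder from the Description section; it is not the optimization code shipped in this repository.

# Illustrative sketch of latent-space optimization (not this repo's actual API).
import torch

def optimize_in_latent_space(model, seed_tokens, steps=100, lr=0.05):
    model.eval()
    with torch.no_grad():
        z = model.encoder(model.embed(seed_tokens))    # encode seed sequence(s)
    z = z.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        fitness = model.fitness_head(z).sum()
        (-fitness).backward()                          # ascend predicted fitness
        opt.step()
    with torch.no_grad():
        logits = model.decoder(z).view(-1, model.seq_len, model.vocab_size)
    return logits.argmax(dim=-1)                       # most likely residue per position

In practice, the step size, number of steps, and any constraint keeping the optimized point near the seed encoding are the knobs that trade off predicted-fitness gain against staying in well-modeled regions of the latent space.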

Original data sources

Comments
  • Conda env create not working

    When I type in the command as instructed in "How to run", I get this error:

    Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you. Collecting package metadata (repodata.json): done Solving environment: failed

    ResolvePackageNotFound:

    • libcxx==12.0.0=h2f01273_0
    • python==3.10.4=hdfd78df_0
    • openssl==1.1.1q=hca72f7f_0
    • ncurses==6.3=hca72f7f_3
    • readline==8.1.2=hca72f7f_1
    • bzip2==1.0.8=h1de35cc_0
    • ca-certificates==2022.07.19=hecd8cb5_0
    • xz==5.2.5=hca72f7f_1
    • libffi==3.3=hb1e8313_2
    • zlib==1.2.12=h4dc903c_2
    • sqlite==3.38.5=h707629a_0
    • tk==8.6.12=h5d9f67b_0
    opened by Pixelatory 1
  • Could the internal structure of the gifford data lead to biased results from the model?

    I'm very interested in your work and have been analyzing the gifford data. First, I used CD-HIT (a clustering tool) to split the sequences into clusters. Then I chose the sequence with the highest enrichment value from Cluster-1 (a subset of similar sequences produced by CD-HIT) as a baseline and looked at the residue differences between it and the other sequences. Interestingly, sequences that differ from the baseline by 2 or 3 residues usually have high enrichment; among the top-100 highest-enrichment sequences, about 65% fit this pattern. As I understand it, your work is a multitask model that handles both generation and prediction. I wonder whether the JT-VAE tends to produce new sequences that differ from the corresponding highest-enrichment baseline sequence by about 2 or 3 residues, and whether the prediction network may therefore score such sequences as good. In other words, the model might only need to learn that a new sequence differing by 2 or 3 residues from a high-enrichment sequence is good enough. Because I could not find results on this, I hope you can give me some advice.

    opened by chengyunzhang 0
Releases(v1.0)
Owner
Krishnaswamy Lab