Implementation of ProteinBERT in PyTorch

Last update: Dec 25, 2022

Overview

ProteinBERT - Pytorch (wip)

An implementation of ProteinBERT, a deep-learning model of protein sequence and function, in PyTorch.

Install

$ pip install protein-bert-pytorch

Usage

import torch
from protein_bert_pytorch import ProteinBERT

model = ProteinBERT(
    num_tokens = 21,
    num_annotation = 8943,
    dim = 512,
    dim_global = 256,
    depth = 6,
    narrow_conv_kernel = 9,
    wide_conv_kernel = 9,
    wide_conv_dilation = 5,
    attn_heads = 8,
    attn_dim_head = 64
)

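seq = torch.randint(0, 21, (2, 2048))                # token ids in [0, 21)
mask = torch.ones(2, 2048).bool()                    # boolean mask over sequence positions
annotation = torch.randint(0, 2, (2, 8943)).float()  # random binary annotations; randint's upper bound is exclusive

seq_logits, annotation_logits = model(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)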
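The usage above feeds random token ids. For real proteins, the sequences first have to be mapped to integer ids and padded to a common length. The following is a minimal sketch of that step under stated assumptions: the vocabulary (index 0 for padding, 1 to 20 for the standard amino acids), the helper name `encode_batch`, and the convention that `True` in the mask marks a real residue are all hypothetical and are not taken from the package.

```python
# Hypothetical vocabulary: index 0 is reserved for padding, 1..20 are the
# 20 standard amino acids (21 ids in total, matching num_tokens = 21 above).
# This mapping is an assumption for illustration, not the package's own.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_ID = 0
TOKEN_IDS = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_batch(sequences, max_len):
    """Encode protein strings into padded id lists and boolean masks."""
    ids, masks = [], []
    for s in sequences:
        # Truncate to max_len, then map each residue to its integer id.
        tokens = [TOKEN_IDS[aa] for aa in s[:max_len]]
        pad = max_len - len(tokens)
        ids.append(tokens + [PAD_ID] * pad)
        # Assumed convention: True marks a real residue, False marks padding.
        masks.append([True] * len(tokens) + [False] * pad)
    return ids, masks


ids, masks = encode_batch(["MKT", "ACDEFG"], max_len=8)
# With PyTorch available, the lists convert directly:
# seq  = torch.tensor(ids)    # shape (2, 8), integer ids
# mask = torch.tensor(masks)  # shape (2, 8), bool
```

Keeping the encoding in plain Python makes it easy to check by hand before the tensors are passed to the model.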

Citations

@article {Brandes2021.05.24.445464,
    author      = {Brandes, Nadav and Ofer, Dan and Peleg, Yam and Rappoport, Nadav and Linial, Michal},
    title       = {ProteinBERT: A universal deep-learning model of protein sequence and function},
    year        = {2021},
    doi         = {10.1101/2021.05.24.445464},
    publisher   = {Cold Spring Harbor Laboratory},
    URL         = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464},
    eprint      = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464.full.pdf},
    journal     = {bioRxiv}
}

Comments

bugFix: x and y not on the same device when Learner is trained on GPU

When running the following on a GPU:

seq        = torch.randint(0, 21, (2, 2048)).cuda()
annotation = torch.randint(0, 1, (2, 8943)).float().cuda()
mask       = torch.ones(2, 2048).bool().cuda()

learner.cuda()

loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)

the output is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-60892e498570> in <module>
      4 learner.cuda()
      5 
----> 6 loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)

~/data/.conda/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/mnt/5280b/wwang/proteinbert/protein_bert_pytorch.py in forward(self, seq, annotation, mask)
    365 
    366         for token_id in self.exclude_token_ids:
--> 367             random_replace_token_prob_mask = random_replace_token_prob_mask & (random_tokens != token_id)  # make sure you never substitute a token with an excluded token type (pad, start, end)
    368 
    369         # noise sequence

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
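The traceback shows `random_replace_token_prob_mask` and `random_tokens` ending up on different devices. One plausible cause, not confirmed by the report, is a helper tensor created without a `device` argument, which then defaults to the CPU. The sketch below illustrates the usual remedy of allocating every helper tensor on the input's device. The function name `sample_replacement_mask`, its parameters, and the default excluded id are hypothetical and are not the package's code.

```python
import torch


def sample_replacement_mask(seq, num_tokens, prob, exclude_token_ids=(0,)):
    """Pick positions to replace with random tokens, never using excluded ids.

    Every helper tensor is created on seq.device, so the same code runs
    unchanged on CPU and GPU.
    """
    # Random candidate tokens, allocated on the same device as the input.
    random_tokens = torch.randint(0, num_tokens, seq.shape, device=seq.device)
    # Bernoulli mask of positions selected for replacement, same device again.
    replace_mask = torch.rand(seq.shape, device=seq.device) < prob
    for token_id in exclude_token_ids:
        # Drop positions whose random candidate is an excluded token.
        replace_mask = replace_mask & (random_tokens != token_id)
    return random_tokens, replace_mask


device = "cuda" if torch.cuda.is_available() else "cpu"
seq = torch.randint(0, 21, (2, 2048), device=device)
random_tokens, replace_mask = sample_replacement_mask(seq, num_tokens=21, prob=0.05)
# Substitute the random tokens only at the selected positions.
noised_seq = torch.where(replace_mask, random_tokens, seq)
```

Passing `device=seq.device` at creation time avoids a later `.to(...)` copy and keeps the module independent of where the caller placed the data.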

opened by wilmerwang

How can the pretrained model be used with this PyTorch version?

Hi guys, thanks for the great work. I'm trying to use this PyTorch version of ProteinBERT with the pretrained model 'ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl', but I have no idea how to load it. Could you please give some suggestions? Thank you so much!

opened by Y-H-Joe

Releases(0.1.0)

0.1.0(Aug 10, 2021)

0.0.11(Aug 6, 2021)

0.0.10(Jun 11, 2021)

0.0.9(Jun 11, 2021)

0.0.8(Jun 11, 2021)

0.0.7(Jun 10, 2021)

0.0.6(May 29, 2021)

0.0.5(May 28, 2021)

0.0.4(May 28, 2021)

0.0.3a(May 28, 2021)

0.0.2(May 28, 2021)

0.0.1(May 28, 2021)

Owner

Phil Wang
