Implementation of ProteinBERT in Pytorch

Last update: Dec 25, 2022

Overview

ProteinBERT - Pytorch (wip)

Implementation of ProteinBERT in Pytorch.

Install

$ pip install protein-bert-pytorch

Usage

import torch
from protein_bert_pytorch import ProteinBERT

model = ProteinBERT(
    num_tokens = 21,
    num_annotation = 8943,
    dim = 512,
    dim_global = 256,
    depth = 6,
    narrow_conv_kernel = 9,
    wide_conv_kernel = 9,
    wide_conv_dilation = 5,
    attn_heads = 8,
    attn_dim_head = 64
)

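# dummy batch: 2 sequences of 2048 token ids each
seq = torch.randint(0, 21, (2, 2048))
mask = torch.ones(2, 2048).bool()  # all positions valid (no padding)
# multi-hot annotation vector; the upper bound of randint is exclusive,
# so randint(0, 1, ...) would always return zeros
annotation = torch.randint(0, 2, (2, 8943)).float()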

seq_logits, annotation_logits = model(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)
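The usage above passes an all-True mask because every dummy sequence has the same length. For real proteins of different lengths, a padded batch and its mask could be built roughly as follows. This is a minimal sketch, not part of the library: the helper name `pad_and_mask`, the padding id `0`, and the convention that `True` marks real (non-padding) positions are all assumptions made for illustration.

```python
import torch

PAD_ID = 0  # assumption: id 0 is reserved for padding in your vocabulary


def pad_and_mask(token_lists, pad_id=PAD_ID):
    """Hypothetical helper: right-pad lists of token ids and build a boolean mask."""
    max_len = max(len(tokens) for tokens in token_lists)

    # start from a tensor filled with the padding id, then copy each sequence in
    seq = torch.full((len(token_lists), max_len), pad_id, dtype=torch.long)
    for i, tokens in enumerate(token_lists):
        seq[i, :len(tokens)] = torch.tensor(tokens, dtype=torch.long)

    # mask[i, j] is True when position j lies inside sequence i
    lengths = torch.tensor([len(tokens) for tokens in token_lists])
    mask = torch.arange(max_len)[None, :] < lengths[:, None]
    return seq, mask


# two toy sequences of different lengths
seq, mask = pad_and_mask([[5, 3, 9, 1], [7, 2]])
```

The resulting `seq` and `mask` have the same batch and length dimensions, so they can be passed in place of the random tensors in the usage example, provided the padding id matches the vocabulary you actually use.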

Citations

@article {Brandes2021.05.24.445464,
    author      = {Brandes, Nadav and Ofer, Dan and Peleg, Yam and Rappoport, Nadav and Linial, Michal},
    title       = {ProteinBERT: A universal deep-learning model of protein sequence and function},
    year        = {2021},
    doi         = {10.1101/2021.05.24.445464},
    publisher   = {Cold Spring Harbor Laboratory},
    URL         = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464},
    eprint      = {https://www.biorxiv.org/content/early/2021/05/25/2021.05.24.445464.full.pdf},
    journal     = {bioRxiv}
}

Comments

bugFix: x and y not on the same device when Learner is trained on GPU

When the learner and its inputs are moved to the GPU:

seq        = torch.randint(0, 21, (2, 2048)).cuda()
annotation = torch.randint(0, 2, (2, 8943)).float().cuda()
mask       = torch.ones(2, 2048).bool().cuda()

learner.cuda()

loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)

OUTPUT

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-60892e498570> in <module>
      4 learner.cuda()
      5 
----> 6 loss = learner(seq, annotation, mask = mask) # (2, 2048, 21), (2, 8943)

~/data/.conda/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/mnt/5280b/wwang/proteinbert/protein_bert_pytorch.py in forward(self, seq, annotation, mask)
    365 
    366         for token_id in self.exclude_token_ids:
--> 367             random_replace_token_prob_mask = random_replace_token_prob_mask & (random_tokens != token_id)  # make sure you never substitute a token with an excluded token type (pad, start, end)
    368 
    369         # noise sequence

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
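The traceback points at a comparison between `random_replace_token_prob_mask` and `random_tokens`, which suggests that one of these auxiliary tensors was created on the CPU while the input lived on `cuda:0`. A common way to avoid this class of error is to create every auxiliary tensor on the device of the input. The sketch below illustrates that pattern only; `random_replace_mask` is a hypothetical helper, and its arguments and default values are assumptions, not the library's actual code.

```python
import torch


def random_replace_mask(seq, num_tokens, exclude_token_ids=(0,), prob=0.05):
    """Hypothetical helper: sample replacement tokens and a replacement mask.

    Every tensor is created with device=seq.device, so the function behaves
    the same whether seq lives on the CPU or on a GPU.
    """
    random_tokens = torch.randint(0, num_tokens, seq.shape, device=seq.device)
    mask = torch.rand(seq.shape, device=seq.device) < prob

    # never substitute a token with an excluded token type (e.g. padding)
    for token_id in exclude_token_ids:
        mask = mask & (random_tokens != token_id)

    return random_tokens, mask


# falls back to the CPU when no GPU is available
device = "cuda" if torch.cuda.is_available() else "cpu"
seq = torch.randint(0, 21, (2, 2048), device=device)
random_tokens, mask = random_replace_mask(seq, num_tokens=21)
```

Passing `device=seq.device` at creation time also avoids an extra host-to-device copy that a later `.to(seq.device)` call would incur.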

opened by wilmerwang 0

How can I use the pretrained model with this PyTorch version?

Hi guys, thanks for the great work. I'm trying to use this PyTorch version of ProteinBERT with the pre-trained model 'ftp://ftp.cs.huji.ac.il/users/nadavb/protein_bert/epoch_92400_sample_23500000.pkl', but I have no idea how to load it. Could you please give some suggestions? Thank you so much!

opened by Y-H-Joe 1

Releases (0.1.0)

0.1.0 (Aug 10, 2021)
0.0.11 (Aug 6, 2021)
0.0.10 (Jun 11, 2021)
0.0.9 (Jun 11, 2021)
0.0.8 (Jun 11, 2021)
0.0.7 (Jun 10, 2021)
0.0.6 (May 29, 2021)
0.0.5 (May 28, 2021)
0.0.4 (May 28, 2021)
0.0.3a (May 28, 2021)
0.0.2 (May 28, 2021)
0.0.1 (May 28, 2021)

Owner

Phil Wang
