CLASP - Contrastive Language-Aminoacid Sequence Pretraining

Last update: Dec 29, 2022


Overview


Repository for creating models pretrained on language and amino acid sequences, similar to ConVIRT, CLIP, and ALIGN.

Work in progress - more updates soon!

Requirements

Install the requirements with:

$ python setup.py install --user

Then install Microsoft's sparse attention CUDA kernel in two steps. First, run the DeepSpeed install script:

$ sh install_deepspeed.sh

Next, install the triton package with pip:

$ pip install triton

If both steps succeeded, you can now train on long biosequences with CLASP.

Usage

import torch
from torch.optim import Adam

from clasp import CLASP, Transformer, tokenize

# instantiate the attention models for text and bioseq

text_enc = Transformer(
    num_tokens = 20000,
    dim = 512,
    depth = 6,
    seq_len = 1024
)

bioseq_enc = Transformer(
    num_tokens = 21,
    dim = 512,
    depth = 6,
    seq_len = 512,
    sparse_attn = True
)

# clasp (CLIP) trainer

clasp = CLASP(
    text_encoder = text_enc,
    bioseq_encoder = bioseq_enc
)

# data

text, text_mask = tokenize(['Spike protein S2: HAMAP-Rule:MF_04099'], context_length = 1024, return_mask = True)

bioseq = torch.randint(0, 21, (1, 511))         # when using sparse attention, should be 1 less than the sequence length
bioseq_mask = torch.ones_like(bioseq).bool()

# do the below with large batch sizes for many many iterations

opt = Adam(clasp.parameters(), lr = 3e-4)

loss = clasp(
    text,
    bioseq,
    text_mask = text_mask,
    bioseq_mask = bioseq_mask,
    return_loss = True               # set return loss to True
)

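loss.backward()
opt.step()        # apply the gradients with the optimizer defined above
opt.zero_grad()   # reset gradients before the next iteration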
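The example above fills bioseq with random integers. As a minimal sketch of how a real amino acid string might be turned into token ids and a mask, the helper below assumes index 0 is padding and indices 1 to 20 are the 20 standard amino acids in alphabetical one-letter order. That layout is consistent with num_tokens = 21, but it is an assumption, not the vocabulary CLASP necessarily uses, and encode_bioseq is a hypothetical name that is not part of the clasp package.

```python
# Hypothetical helper, not part of the clasp package.
# Assumption: id 0 is padding, ids 1..20 are the standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}
PAD_ID = 0

def encode_bioseq(seq, max_len):
    """Map an amino acid string to (token ids, mask), truncated or right-padded to max_len."""
    # Raises KeyError on residues outside the 20 standard letters.
    ids = [AA_TO_ID[aa] for aa in seq.upper()[:max_len]]
    mask = [True] * len(ids)
    pad = max_len - len(ids)
    return ids + [PAD_ID] * pad, mask + [False] * pad

# 511 follows the "1 less than the sequence length" note for sparse attention.
ids, mask = encode_bioseq("MFVFLVLLPLVSSQ", max_len=511)
```

The resulting lists could then be wrapped as torch.tensor([ids]) and torch.tensor([mask]) to stand in for bioseq and bioseq_mask.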

Once trained, call the model without return_loss to get similarity scores:

scores = clasp(
    text,                            # same variable names as in the training example
    bioseq,
    text_mask = text_mask,
    bioseq_mask = bioseq_mask
)
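Both the training loss and the scores are built on similarities between text and bioseq embeddings. The following is a dependency-free sketch of a CLIP-style symmetric contrastive objective, included only to illustrate the idea. It assumes cosine similarity divided by a fixed temperature, with matching pairs on the diagonal of the similarity matrix; the actual CLASP implementation may differ (for example, by using a learned temperature), and all function names here are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def similarity_matrix(text_emb, bioseq_emb, temperature=1.0):
    """Cosine similarity of every text embedding with every bioseq embedding."""
    t = [normalize(v) for v in text_emb]
    b = [normalize(v) for v in bioseq_emb]
    return [[sum(x * y for x, y in zip(tv, bv)) / temperature for bv in b] for tv in t]

def cross_entropy_diag(sim):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(sim):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_z - row[i]
    return total / len(sim)

def contrastive_loss(sim):
    """Average of the text-to-bioseq and bioseq-to-text cross-entropies."""
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))

# Toy embeddings: pair i of text_emb matches pair i of bioseq_emb.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
bioseq_emb = [[0.9, 0.1], [0.1, 0.9]]

aligned = contrastive_loss(similarity_matrix(text_emb, bioseq_emb, temperature=0.1))
shuffled = contrastive_loss(similarity_matrix(text_emb, bioseq_emb[::-1], temperature=0.1))
```

With these toy values the aligned pairing gives a lower loss than the shuffled one, which is the behaviour the objective rewards. It also shows why large batches help: every other item in the batch serves as a negative example.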

Resources

See interesting resources (feel free to add interesting material that could be useful).

Citations

@article{zhang2020contrastive,
  title={Contrastive learning of medical visual representations from paired images and text},
  author={Zhang, Yuhao and Jiang, Hang and Miura, Yasuhide and Manning, Christopher D and Langlotz, Curtis P},
  journal={arXiv preprint arXiv:2010.00747},
  year={2020}
}

OpenAI blog post "CLIP: Connecting Text and Images"

@article{radford2021learning,
  title={Learning transferable visual models from natural language supervision},
  author={Radford, Alec and Kim, Jong Wook and Hallacy, Chris and Ramesh, Aditya and Goh, Gabriel and Agarwal, Sandhini and Sastry, Girish and Askell, Amanda and Mishkin, Pamela and Clark, Jack and others},
  journal={arXiv preprint arXiv:2103.00020},
  year={2021}
}

@article{jia2021scaling,
  title={Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision},
  author={Jia, Chao and Yang, Yinfei and Xia, Ye and Chen, Yi-Ting and Parekh, Zarana and Pham, Hieu and Le, Quoc V and Sung, Yunhsuan and Li, Zhen and Duerig, Tom},
  journal={arXiv preprint arXiv:2102.05918},
  year={2021}
}


Owner

Michael Pieler
