Graph-based community clustering approach to extract protein domains from a predicted aligned error matrix

Last update: Nov 23, 2022

pae_to_domains

Overview

Given a predicted aligned error (PAE) matrix for an AlphaFold2 model (e.g. as downloaded from https://alphafold.ebi.ac.uk/), this tool returns a series of lists of residue indices, where each list is a set of residues that cluster together into a pseudo-rigid domain.
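As a rough illustration of the idea, the following is a minimal sketch rather than the code in pae_to_domains.py. It assumes residue pairs are linked when their PAE is below a cutoff, weights each edge as 1/pae**pae_power, and clusters the graph with NetworkX's greedy modularity communities. The function name `domains_from_pae`, the use of the larger of the two PAE directions, the 0-based indices, and the cutoff of 5.0 are all assumptions made for this example.

```python
import networkx as nx
from networkx.algorithms import community


def domains_from_pae(pae_matrix, pae_cutoff, pae_power=1.0, resolution=1.0):
    """Cluster residues from a square PAE matrix given as a list of lists."""
    n = len(pae_matrix)
    graph = nx.Graph()
    graph.add_nodes_from(range(n))  # residues without edges stay in the graph
    for i in range(n):
        for j in range(i + 1, n):
            # PAE is not symmetric; using the larger value is a choice made
            # for this sketch, not necessarily what the tool does.
            pae = max(pae_matrix[i][j], pae_matrix[j][i])
            if 0 < pae < pae_cutoff:
                graph.add_edge(i, j, weight=1.0 / pae ** pae_power)
    clusters = community.greedy_modularity_communities(
        graph, weight="weight", resolution=resolution
    )
    # Sort for a stable, readable result (0-based indices in this sketch).
    return sorted(sorted(cluster) for cluster in clusters)


# Synthetic 8-residue matrix: PAE of 1.0 inside each block of four residues,
# 20.0 between the two blocks, 0.0 on the diagonal.
pae = [
    [0.0 if i == j else (1.0 if (i < 4) == (j < 4) else 20.0) for j in range(8)]
    for i in range(8)
]
domains = domains_from_pae(pae, pae_cutoff=5.0)  # 5.0 is an illustrative value
print(domains)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

On this toy input the two low-PAE blocks come out as two separate clusters, because no edge crosses the cutoff between them.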

Requirements

Python >= 3.7
NetworkX >= 2.6.2

Known Issues

Due to an internal implementation issue in NetworkX (Issue #4992), some combinations of PAE matrix and resolution can lead to a KeyError. Solutions are being explored, and it will hopefully be fixed in the next NetworkX release.

Usage

While primarily intended as a code snippet to be incorporated into larger projects, this can also be called from the command line. At its simplest:

python pae_to_domains.py pae_file.json

... will yield a .csv file with each line providing the indices for one residue cluster. Full help for the command-line version:

positional arguments:
  pae_file              Name of the PAE JSON file.

optional arguments:
  -h, --help            show this help message and exit
  --output_file OUTPUT_FILE
                        Name of output file (comma-delimited text format).
                        Default: clusters.csv
  --pae_power PAE_POWER
                        Graph edges will be weighted as 1/pae**pae_power.
                        Default: 1.0
  --pae_cutoff PAE_CUTOFF
                        Graph edges will only be created for residue pairs
                        with pae
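To make the effect of --pae_power concrete, here is a small sketch of the stated weighting rule, 1/pae**pae_power. The helper name `edge_weight` and the PAE values 1.0 and 4.0 are hypothetical, chosen only for illustration.

```python
def edge_weight(pae, pae_power=1.0):
    """Weight for one graph edge, following the 1/pae**pae_power rule."""
    return 1.0 / pae ** pae_power


# Compare a confident pair (PAE 1.0) with a less confident one (PAE 4.0).
for power in (0.5, 1.0, 2.0):
    confident = edge_weight(1.0, power)
    uncertain = edge_weight(4.0, power)
    print(f"pae_power={power}: weight ratio = {confident / uncertain:.1f}")
# pae_power=0.5: weight ratio = 2.0
# pae_power=1.0: weight ratio = 4.0
# pae_power=2.0: weight ratio = 16.0
```

Larger values of pae_power therefore favour low-PAE pairs more strongly relative to pairs with higher PAE.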



Example
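Using https://alphafold.ebi.ac.uk/entry/Q9HBA0 as an example case, with clustering run at three resolution settings:
[Figure: clusters at resolution=0.5]
[Figure: clusters at resolution=1.0]
[Figure: clusters at resolution=2.0]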
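The .csv output described under Usage has one residue cluster per line, so it can be read back with the standard library. The sketch below assumes each line holds comma-separated integer indices. `parse_clusters` is a hypothetical helper, and the sample rows are made up rather than taken from the Q9HBA0 result.

```python
import csv


def parse_clusters(lines):
    """Turn CSV rows of residue indices into a list of integer lists."""
    clusters = []
    for row in csv.reader(lines):
        indices = [int(field) for field in row if field.strip()]
        if indices:  # skip blank lines
            clusters.append(indices)
    return clusters


# In-memory stand-in for the output file; for a real run, pass
# open("clusters.csv", newline="") instead of this list.
sample = ["1,2,3,4", "5,6,7"]
clusters = parse_clusters(sample)
for number, residues in enumerate(clusters, start=1):
    print(f"cluster {number}: {len(residues)} residues")
```

Each inner list can then be used, for example, to select or colour one domain in a molecular viewer.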

Owner

Tristan Croll
