A fast Protein Chain / Ligand Extractor and organizer.

Overview

Are you tired of using visualization software or full-blown suites just to separate protein chains and ligands? Are you tired of organizing the mess of molecules into separate folders?

PDBaser does this for you!

What does it do?

PDBaser reads raw .pdb and .ent files as downloaded from the PDB, extracts pure protein chains and heteroatoms (ligands and others), removes water molecules, and saves everything in a directory named after the original input file.
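The splitting step can be pictured with a minimal, stdlib-only sketch. It is not PDBaser's code (PDBaser builds on Biopython's Bio.PDB); it simply reads the fixed PDB columns, groups ATOM records by chain and HETATM records by ligand (residue name, chain, residue number), drops water, and writes every piece into a folder named after the input file. The names split_pdb, write_outputs and WATER_NAMES, as well as the output file naming scheme, are hypothetical.

```python
"""Stdlib-only sketch of chain/ligand splitting (illustrative, not PDBaser's code)."""
from collections import defaultdict
from pathlib import Path

# Assumption: residue names treated as water (HOH in PDB files; WAT/DOD also occur).
WATER_NAMES = {"HOH", "WAT", "DOD"}


def split_pdb(pdb_text):
    """Group ATOM records by chain and HETATM records by ligand, dropping water.

    Returns (chains, ligands):
      chains  -> {chain_id: [lines]}
      ligands -> {(resname, chain_id, resseq): [lines]}
    """
    chains = defaultdict(list)
    ligands = defaultdict(list)
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # skip HEADER, REMARK, CONECT and other records
        resname = line[17:20].strip()      # columns 18-20: residue name
        chain_id = line[21:22].strip() or "_"  # column 22: chain identifier
        resseq = line[22:26].strip()       # columns 23-26: residue number
        if record == "ATOM":
            chains[chain_id].append(line)
        elif resname not in WATER_NAMES:
            ligands[(resname, chain_id, resseq)].append(line)
    return dict(chains), dict(ligands)


def write_outputs(pdb_path, out_root="."):
    """Write each chain and ligand to its own file in a folder named after the input."""
    pdb_path = Path(pdb_path)
    out_dir = Path(out_root) / pdb_path.stem  # e.g. 1abc.pdb -> ./1abc/
    out_dir.mkdir(parents=True, exist_ok=True)
    chains, ligands = split_pdb(pdb_path.read_text())
    for chain_id, lines in chains.items():
        (out_dir / f"chain_{chain_id}.pdb").write_text("\n".join(lines + ["END"]) + "\n")
    for (resname, chain_id, resseq), lines in ligands.items():
        name = f"{resname}_{chain_id}_{resseq}.pdb"
        (out_dir / name).write_text("\n".join(lines + ["END"]) + "\n")
    return out_dir
```

Modified residues stored as HETATM records inside a chain (for example selenomethionine, MSE) would be treated as ligands by this sketch; how PDBaser classifies them is not described here.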

Who is this for?

This tool is well suited to preparing RMSD reliability tests, where a large number of proteins and their ligands are needed. It can also help people who are not comfortable with command-line interfaces and aren't willing to pay a (usually high) premium for other software.

Installation

Windows

For Windows users, a precompiled version of PDBaser is available in the Releases section. It can be installed on Windows 7 SP1 / 8 / 8.1 / 10 and only requires Microsoft Visual C++ 2015 x86.

Linux / macOS and other Unix / Unix-like systems

There are two ways to run PDBaser on these systems:

  1. Using Wine

    The quickest way to get PDBaser running on these systems is Wine (tested on Wine 6.0.1; for unknown reasons, it only works with a 64-bit prefix):

    • 1 - Download the Windows msi package and install it.
    • 2 - Open a terminal in the directory where you installed PDBaser and run wine PDBaser_GUI.exe.
  2. Building from source

    PDBaser is not OS-dependent and should run on any operating system, provided the environment is set up correctly. However, since software distribution on Linux is a nightmare and I do not have a Mac to package PDBaser on, you will have to either use Wine or set up the environment from scratch.

    • 1 - First, you need a working Python environment with Tkinter support (I'm looking at you, Arch Linux).
    • 2 - Install Biopython and pygubu from pip (pip install biopython / pip install pygubu).
    • 3 - Build Open Babel 3.1.1 from source with depiction support (Cairo) and Python bindings, then install it from pip (pip install openbabel==3.1.1).

    If everything is set up correctly, running GUI/Build/PDBaser_GUI.py should work.
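Before launching the GUI, a quick way to see which of the dependencies above are missing is a small import check. This is a hypothetical helper (missing_dependencies is not part of PDBaser), and it assumes the module names tkinter, Bio, pygubu and openbabel for the packages listed in the steps. It only checks that the modules can be found, not that Open Babel was built with Cairo depiction support.

```python
"""Pre-flight dependency check for a from-source setup (hypothetical helper)."""
import importlib.util

# Assumed import names for each dependency listed in the build steps.
REQUIRED_MODULES = {
    "tkinter": "Tkinter (shipped with most Python builds)",
    "Bio": "Biopython (pip install biopython)",
    "pygubu": "pygubu (pip install pygubu)",
    "openbabel": "Open Babel 3.1.1 bindings (pip install openbabel==3.1.1)",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the descriptions of modules that cannot be found by this interpreter."""
    return [desc for name, desc in modules.items()
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:")
        for desc in missing:
            print("  -", desc)
    else:
        print("All dependencies found; try running GUI/Build/PDBaser_GUI.py")
```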

Features

  • Folder organization (outputs are organized in a single folder named after the PDB file).
  • Support for compressed .pdb / .ent files.
  • 2D depiction and PNG/SVG output.
  • Outputs residues in the most popular formats (pdb, sdf, mol2, smiles).
  • Multiple residues can be extracted at once; chain-only extraction with no residues is also possible.
  • Hydrogen generation for extracted residues is available (except for the SMILES format).
  • Support for downloading proteins from the PDB directly.
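For the direct-download feature, a minimal sketch could fetch entries from the RCSB file service. The URL pattern https://files.rcsb.org/download/<ID>.pdb (optionally .pdb.gz) is an assumption about where files come from; PDBaser's own download code may use a different mirror or format, and the helper names pdb_url and download_pdb are hypothetical.

```python
"""Sketch: fetch a PDB entry from RCSB (assumed endpoint; not PDBaser's download code)."""
import re
import urllib.request
from pathlib import Path

RCSB_DOWNLOAD = "https://files.rcsb.org/download/{}.pdb{}"  # assumption: RCSB file service


def pdb_url(pdb_id, compressed=False):
    """Build the download URL for a classic 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id, ".gz" if compressed else "")


def download_pdb(pdb_id, out_dir=".", compressed=False):
    """Download one entry and return the local path (requires network access)."""
    url = pdb_url(pdb_id, compressed)
    target = Path(out_dir) / url.rsplit("/", 1)[1]  # e.g. 4HHB.pdb or 4HHB.pdb.gz
    with urllib.request.urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Only pdb_url is pure string handling; download_pdb needs network access. Entries too large for the legacy PDB format are distributed only as mmCIF, so a .pdb request for them is expected to fail with an HTTP error.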

Limitations

  • No metadata extraction (header, comments, etc.); only the 3D atom coordinates and the molecule's code in the PDB are kept.
  • Only .pdb / .ent inputs and their compressed (.gz) forms are supported. This is by design, as most proteins come in pdb and ent formats; residue outputs, however, can be written in other formats (pdb, mol2, sdf, smiles).
  • There is a known bug where extracting a ligand in SMILES format does not generate a name for it. I'll fix it as soon as I finish some work related to my studies.
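The accepted inputs (plain or gzip-compressed .pdb / .ent) can be read transparently with Python's gzip module. The sketch below illustrates that idea; is_supported, read_structure_text and SUPPORTED_SUFFIXES are hypothetical names, not PDBaser functions. Files from the wwPDB archive are typically named like pdb1abc.ent.gz, which this check accepts.

```python
"""Sketch: read plain or gzip-compressed .pdb/.ent inputs as text (illustrative only)."""
import gzip
from pathlib import Path

SUPPORTED_SUFFIXES = (".pdb", ".ent", ".pdb.gz", ".ent.gz")


def is_supported(path):
    """True if the file name ends with one of the supported input extensions."""
    return str(path).lower().endswith(SUPPORTED_SUFFIXES)


def read_structure_text(path):
    """Return the file contents as str, decompressing .gz inputs on the fly."""
    path = Path(path)
    if not is_supported(path):
        raise ValueError(f"unsupported input: {path.name}")
    if path.name.lower().endswith(".gz"):
        with gzip.open(path, "rt") as handle:  # "rt": decompress and decode as text
            return handle.read()
    return path.read_text()
```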

Downloads

For Windows x86/x64: a binary setup is available in the Releases section.

For Unix/Unix-like systems (Linux, macOS, etc.): the source is available in the Releases section, although I recommend installing the Windows version and running it through Wine.

Citations

PDBaser relies on Biopython's Bio.PDB module, Open Babel's pybel module, and OASA.

Bio.PDB: Hamelryck, T. and Manderick, B. (2003). PDB file parser and structure class implemented in Python. Bioinformatics, 19, 2308-2310.

Open Babel's pybel: O'Boyle, N.M., Morley, C. & Hutchison, G.R. Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chemistry Central Journal 2, 5 (2008).

If this software helped you produce a scientific publication, please cite it as follows:

M. A. Abdelaziz, “PDBaser, A python tool for fast protein - ligand extraction”, https://github.com/mimminou/PDBASER

Command line (Deprecated)

NOTE: The CLI version is a very early release, is now DEPRECATED, and will probably no longer be supported.

For this module to work, you need at least Python 3.6.5 and Biopython.

At the time of writing, I have been experiencing some issues with Biopython on Python 3.9, so I suggest using any Python version from 3.6.5 to 3.8.5 instead.

You can download and install Python from the official website (3.6.5 recommended).

Biopython can be installed from pip.

Usage

Usage is very straightforward: put the script in the folder containing the PDB files to be processed, run it from the command line / terminal, then follow the instructions for each iteration.

There are only 3 commands:

  • SKIP: skips the current step.
  • Entering data: normal usage.
  • Leaving a field blank: defaults either to chain A or to extracting all residues in the selected chain, depending on which input was left blank.
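The three inputs described above can be summarized as one small interpretation function per prompt. This is a hypothetical sketch, not the deprecated CLI's actual code: the function names, the exact-match SKIP keyword, and the comma/space-separated residue list are assumptions.

```python
"""Hypothetical sketch of how the CLI's three input kinds could be interpreted."""

SKIP = "SKIP"


def parse_chain_answer(answer):
    """Return None to skip the step, 'A' for a blank answer, else the typed chain ID."""
    answer = answer.strip()
    if answer == SKIP:
        return None
    return answer or "A"  # blank field defaults to chain A


def parse_residue_answer(answer, residues_in_chain):
    """Return None to skip, every residue for a blank answer, else the requested ones."""
    answer = answer.strip()
    if answer == SKIP:
        return None
    if not answer:
        return list(residues_in_chain)  # blank field: extract all residues in the chain
    # Assumption: residue names separated by commas and/or spaces.
    return answer.replace(",", " ").split()
```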

Comments
  • PDBaser 2.0 Patch 1

    PDBaser 2.0 Patch 1

    • Added multiprocessing support, brings huge benefits to batch extraction.
    • Fixed an issue where PDBaser would crash if it tries to protonate a chain in a pdb file that isn't of peptidic nature ( exp : oligosaccharides ), PDBaser will now ignore the protonation step of these chains, but will still extract them properly.
    • Removed PMW, raises a lot of issues with python 3.10, replaced with a custom tk widget class ( Credits in the file ).
    • Added QUIET option to the PDB parser, now only propka and pdb2pqr generate noise ( i'll try to remove that as well )
    • removed some unnecessary code and optimized some function calls.
    opened by mimminou 0
  • Hovering over items results in an Error when using python 3.10

    Hovering over items results in an Error when using python 3.10

    I'm aware of this problem, It turns out that the PMW library I'm using to display tooltips has dependencies or legacy code that were not updated to support python 3.10. Will be fixed in the next patch.

    bug 
    opened by mimminou 0
Releases (latest: 2.0.1)
  • 2.0.1(May 23, 2022)

    • Added multiprocessing support, which brings huge benefits to batch extraction.
    • Fixed an issue where PDBaser would crash when trying to protonate a chain in a pdb file that isn't peptidic in nature (e.g., oligosaccharides). PDBaser now skips the protonation step for these chains but still extracts them properly.
    • Removed PMW, which caused a lot of issues with Python 3.10, and replaced it with a custom Tk widget class (credits in the file).
    • Added the QUIET option to the PDB parser; now only PROPKA and PDB2PQR generate console noise (I'll try to remove that as well).
    • Removed some unnecessary code and optimized some function calls.
    PDBaser_2.0_SETUP.exe(37.24 MB)
  • 2.0(May 17, 2022)

    PDBaser 2.0 🔥

    PDBaser is now 1 year old! To celebrate PDBaser's first anniversary, new features have been implemented!

    • New powerful CLI added to bring automation and support batch workloads.
    • PDBaser can now predict titration states and protonate proteins (using PROPKA through PDB2PQR).
    • Huge performance increases for batch workloads, with multiprocessing support coming in the next patch.
    PDBaser_2.0_SETUP.exe(37.47 MB)
  • 1.9(Mar 2, 2022)

    Optimizations:

    • Optimized some functions.
    • Updated to Python 3.8.5.
    • Updated all dependencies.

    New features:

    • Added an option to generate and extract the binding site around a selected ligand position (the distance can vary between 1 and 10 Å).
    • Added an option to keep water molecules when extracting chains.
    • Greatly improved UI element placement on Linux, with some minor improvements on Windows.
    PDBaser_1.9_SETUP.zip(35.10 MB)
  • 1.8(Oct 31, 2021)

    New PDBaser update ❗ Changes:

    • Milestone change: PDBaser is now compiled with MSVC 14.0 instead of being interpreted, thanks to the Nuitka compiler. This provides much quicker startup times and better overall performance.
    • Switched from the Open Babel depiction API to pure OASA (available at https://gitlab.com/oasa/oasa, or install it from pip using pip install oasa3).
    • Added SVG output support, upscaled PNG output, and removed the default white background.
    • The setup is now built with Inno Setup instead of Advanced Installer; VCredist 2015 is embedded in the setup and its installation is optional.
    • The molecular weight of residues is now shown in the depiction.
    • New logo.

    Fixes:

    • Fixed interference issues when PDBaser was installed on a system where Open Babel 3.1.1 was installed with BABEL_DIR set in the environment variables.
    • Fixed a huge memory leak when selecting residues.
    PDBaser_Setup_1.8.exe(40.89 MB)
  • 1.6(Jun 9, 2021)

    The 1.6 release of PDBaser brings a few new features and some under-the-hood performance improvements, especially when dealing with very big molecules. New features:

    • Downloading from the PDB directly is now possible.
    • Hydrogens can now be added to output residues.
    • Added a new progress bar, useful for tracking progress when downloading a large number of entries from the PDB.
    • Names of output molecules have been fixed, especially for mol2 files (names were being generated as *****).
    • Some bug fixes and improvements, mainly on the UI side.
    PDBaser_Win_x86_1.6.msi(26.62 MB)
  • 1.5(May 29, 2021)

  • 1.2(May 13, 2021)

  • v1.0(May 6, 2021)
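The binding-site option from release 1.9 amounts to a distance cutoff around the selected ligand. A brute-force sketch, assuming residues and ligand atoms are already available as coordinate tuples, could look like this; binding_site and its 5 Å default are hypothetical, not PDBaser's implementation or defaults.

```python
"""Sketch of distance-cutoff binding-site selection (illustrative, not PDBaser's code)."""
import math

MIN_CUTOFF, MAX_CUTOFF = 1.0, 10.0  # range quoted in the 1.9 release notes, in Å


def binding_site(residues, ligand_coords, cutoff=5.0):
    """Return IDs of residues with at least one atom within `cutoff` Å of any ligand atom.

    residues      -> {residue_id: [(x, y, z), ...]}
    ligand_coords -> [(x, y, z), ...]
    """
    if not MIN_CUTOFF <= cutoff <= MAX_CUTOFF:
        raise ValueError("cutoff must be between 1 and 10 Å")
    selected = []
    for residue_id, atoms in residues.items():
        # Keep the residue if any of its atoms is close enough to any ligand atom.
        if any(math.dist(a, lig) <= cutoff for a in atoms for lig in ligand_coords):
            selected.append(residue_id)
    return selected
```

The nested loop checks every residue atom against every ligand atom, which is fine for a single ligand; for large structures, Biopython's Bio.PDB.NeighborSearch performs the same kind of radius query with a KD-tree.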

Owner
Amine Abdz
the product of a biochemist discovering smart rocks that think with lightning.