A modified version of DeepMind's AlphaFold2 that separates the CPU part (MSA and template search) from the GPU part (the prediction model)

Overview

ParallelFold

Author: Bozitao Zhong

This is a modified version of DeepMind's AlphaFold2 that splits the local AlphaFold2 pipeline into a CPU part (MSA and template search) and a GPU part (the prediction model).

How to install

First, install AlphaFold2. You can choose one of the following methods to install AlphaFold locally.

  • Use the official Docker-based version from DeepMind.
  • Use one of the community versions that install AlphaFold without Docker.
  • Or follow my guide, which is based on the non-Docker version and can be adapted to different CUDA versions (CUDA driver >= 10.1).

Then put these 4 files into your AlphaFold folder. This folder should already contain the original run_alphafold.py; I use a run_alphafold.sh script to run AlphaFold conveniently (an idea borrowed from the non-Docker version).

4 files:

  • run_alphafold.py: a modified version of the original run_alphafold.py that skips the featuring steps when feature.pkl already exists in the output folder (a sketch follows this list)
  • run_alphafold.sh: bash script to run run_alphafold.py
  • run_feature.py: a modified version of the original run_alphafold.py that exits the Python process after writing feature.pkl
  • run_feature.sh: bash script to run run_feature.py
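
Here is a minimal sketch of the skip idea in the modified run_alphafold.py, assuming a hypothetical get_features helper and an illustrative data_pipeline.process call; it is not the exact code from this repo:

    # Minimal sketch of the feature.pkl reuse idea (illustrative, not the repo's exact code).
    import os
    import pickle

    def get_features(fasta_path, output_dir, data_pipeline):
        features_path = os.path.join(output_dir, 'feature.pkl')
        if os.path.exists(features_path):
            # The CPU stage already ran: reuse feature.pkl and skip MSA/template search.
            with open(features_path, 'rb') as f:
                return pickle.load(f)
        # Otherwise run featurization once and cache the result for later GPU-only runs.
        feature_dict = data_pipeline.process(fasta_path, output_dir)  # hypothetical signature
        with open(features_path, 'wb') as f:
            pickle.dump(feature_dict, f)
        return feature_dict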

How to run

First, run run_feature.sh, which only needs CPUs:

./run_feature.sh -d data -o output -m model_1 -f input/test3.fasta -t 2021-07-27

8 CPUs are enough; in my tests, more CPUs did not improve speed.

A GPU can accelerate the hhblits step (but I assume you chose this repo because GPU time is expensive).

The featuring step writes feature.pkl and the MSA folder into your output folder: ./output/JOBNAME/
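
If you want to sanity-check the featurization output, feature.pkl is a plain pickle of a dict of NumPy feature arrays, so a few lines of Python can inspect it (JOBNAME is a placeholder):

    # List the feature arrays stored in feature.pkl.
    import pickle

    with open('output/JOBNAME/feature.pkl', 'rb') as f:
        features = pickle.load(f)
    for name, value in features.items():
        print(name, getattr(value, 'shape', type(value)))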

PS: I put my input files in an input folder to keep things organized; you can skip this.

Second, run run_alphafold.sh, which uses the GPU:

./run_alphafold.sh -d data -o output -m model_1,model_2,model_3,model_4,model_5 -f input/test.fasta -t 2021-07-27

If feature.pkl was written successfully in the first step, the featuring step here will be very fast (it is skipped).

I have also uploaded the scripts I use on SJTU HPC (with Slurm): sub_alphafold.slurm and sub_feature.slurm.

Other Files

In the ./alphafold folder, I modified some Python files (hhblits.py, hmmsearch.py, jackhmmer.py) to give these steps more CPUs for acceleration. However, testing shows these steps do not get faster when given more CPUs, probably because DeepMind wraps the external processes; I'm trying to improve this (work in progress).
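
For reference, the CPU count in these wrappers is an ordinary constructor argument, so the change itself is small; a sketch against the upstream wrapper API, with placeholder paths:

    # Sketch: raising the CPU count passed to the hhblits wrapper (paths are placeholders).
    from alphafold.data.tools import hhblits

    hhblits_runner = hhblits.HHBlits(
        binary_path='/usr/bin/hhblits',
        databases=['/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt'],
        n_cpu=8,  # the upstream default is 4; raising it did not help in my tests
    )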

If you have any questions, please open an issue.

Comments
  • After running the scripts, there are still problems.

    Hello Doctor! I found that after running the scripts, the CPU part runs normally, but the GPU part fails for both short sequences (200+ aa) and long sequences (1800+ aa). My script is as follows:

    #!/bin/bash
    module load anaconda/2020.11
    source activate /data/home/zhoujy/run/alphafold2
    ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27
    ./run_alphafold.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1,model_2,model_3,model_4,model_5 -f /data/home/zhoujy/run/input/Q9NYP9.fasta -t 2021-07-27

    The job was submitted with 1 GPU card.

    The error output is as follows:

    87 I0927 17:05:14.162350 139818804778816 xla_bridge.py:226] Unable to initialize backend 'tpu': Invalid argument: TpuPlatform is not available.
    88 I0927 17:05:23.883118 139818804778816 run_alphafold.py:272] Have 5 models: ['model_1', 'model_2', 'model_3', 'model_4', 'model_5']
    89 I0927 17:05:23.883379 139818804778816 run_alphafold.py:285] Using random seed 491376288278862761 for the data pipeline
    90 I0927 17:05:23.892619 139818804778816 run_alphafold.py:151] Running model model_1
    91 I0927 17:05:34.480318 139818804778816 model.py:131] Running predict with shape(feat) = {'aatype': (4, 233), 'residue_index': (4, 233), 'seq_length': (4,), 'template_aatype': (4, 4, 233), 'template_all_atom_masks': (4, 4, 233, 37), 'template_all_atom_positions': (4, 4, 233, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 233), 'msa_mask': (4, 508, 233), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 233, 3), 'template_pseudo_beta_mask': (4, 4, 233), 'atom14_atom_exists': (4, 233, 14), 'residx_atom14_to_atom37': (4, 233, 14), 'residx_atom37_to_atom14': (4, 233, 37), 'atom37_atom_exists': (4, 233, 37), 'extra_msa': (4, 5120, 233), 'extra_msa_mask': (4, 5120, 233), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 233), 'true_msa': (4, 508, 233), 'extra_has_deletion': (4, 5120, 233), 'extra_deletion_value': (4, 5120, 233), 'msa_feat': (4, 508, 233, 49), 'target_feat': (4, 233, 22)}
    92 2021-09-27 17:05:35.143686: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    93 2021-09-27 17:05:35.324896: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    94 Fatal Python error: Aborted
    95
    96 Thread 0x00007f2a1a311740 (most recent call first):
    97 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 387 in backend_compile
    98 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 324 in xla_primitive_callable
    99 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 188 in cached
    100 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/util.py", line 195 in wrapper
    101 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/interpreters/xla.py", line 275 in apply_primitive
    102 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 612 in process_primitive
    103 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 267 in bind
    104 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 388 in shift_right_logical
    105 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 229 in threefry_seed
    106 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/prng.py", line 191 in seed_with_impl
    107 File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/_src/random.py", line 105 in PRNGKey
    108 File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/model.py", line 133 in predict
    109 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 158 in predict_structure
    110 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 289 in main
    111 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    112 File "/data/home/zhoujy/.local/lib/python3.8/site-packages/absl/app.py", line 312 in run
    113 File "/data/run01/zhoujy/ParallelFold-main/run_alphafold.py", line 316 in <module>

    The error at log lines 92-113 appears regardless of sequence length. What is causing it? @Zuricho

    opened by zhoujingyu13687306871 16
  • Where can I find the protein sequence?

    After reading the article "AlphaFold Deployment and Optimization on HPC Platform", I want to run some experiments following the article, but I cannot find the protein sequence online. Can you tell me how to download the FASTA file used in the article?

    opened by yanchenmochen 4
  • How to run GPU part?

    How to run GPU part?

    How do I run model inference (the GPU part of the process) after the featurization step? Does the model inference step automatically find feature.pkl in some folder?

    opened by hrzolix 4
  • How to accelerate the HHBLITS step with GPU


    Hello! Thanks for your good work! I have some questions about this project:

    Q1: Do you know how to accelerate the HHblits step with GPU? [screenshot]

    Q2: I use --cpu 8 to run jackhmmer, but it always uses only 2 CPUs and I don't know why. [screenshot]

    opened by Licko0909 4
  • 2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: '  If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided. Fatal Python error: Aborted


    2022-01-11 09:19:02.638037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 28422 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:41:00.0, compute capability: 7.0)
    I0111 09:19:03.171788 47078973446272 model.py:165] Running predict with shape(feat) = {'aatype': (4, 45), 'residue_index': (4, 45), 'seq_length': (4,), 'template_aatype': (4, 4, 45), 'template_all_atom_masks': (4, 4, 45, 37), 'template_all_atom_positions': (4, 4, 45, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 45), 'msa_mask': (4, 508, 45), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 45, 3), 'template_pseudo_beta_mask': (4, 4, 45), 'atom14_atom_exists': (4, 45, 14), 'residx_atom14_to_atom37': (4, 45, 14), 'residx_atom37_to_atom14': (4, 45, 37), 'atom37_atom_exists': (4, 45, 37), 'extra_msa': (4, 5120, 45), 'extra_msa_mask': (4, 5120, 45), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 45), 'true_msa': (4, 508, 45), 'extra_has_deletion': (4, 5120, 45), 'extra_deletion_value': (4, 5120, 45), 'msa_feat': (4, 508, 45, 49), 'target_feat': (4, 45, 22)}
    2022-01-11 09:19:03.503247: W external/org_tensorflow/tensorflow/stream_executor/gpu/asm_compiler.cc:81] Couldn't get ptxas version string: Internal: Running ptxas --version returned 32512
    2022-01-11 09:19:03.536275: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:479] ptxas returned an error during compilation of ptx to sass: 'Internal: ptxas exited with non-zero error code 32512, output: ' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
    Fatal Python error: Aborted

    Thread 0x00002ad16d7d1880 (most recent call first):
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 360 in backend_compile
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 297 in xla_primitive_callable
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 179 in cached
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/util.py", line 186 in wrapper
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 248 in apply_primitive
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 603 in process_primitive
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 264 in bind
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 382 in shift_right_logical
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/jax/_src/random.py", line 75 in PRNGKey
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/alphafold/model/model.py", line 167 in predict
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 210 in predict_structure
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 429 in main
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258 in _run_main
    File "/public/software/.local/easybuild/software/Anaconda3/2020.02/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312 in run
    File "/public/software/.local/easybuild/software/ParallelFold/ParallelFold/run_alphafold.py", line 455 in <module>

    ./run_alphafold.sh: line 233: 7015 Aborted python $alphafold_script --fasta_paths=$fasta_path --model_names=$model_selection --data_dir=$data_dir --output_dir=$output_dir --jackhmmer_binary_path=$jackhmmer_binary_path --hhblits_binary_path=$hhblits_binary_path --hhsearch_binary_path=$hhsearch_binary_path --hmmsearch_binary_path=$hmmsearch_binary_path --hmmbuild_binary_path=$hmmbuild_binary_path --kalign_binary_path=$kalign_binary_path --uniref90_database_path=$uniref90_database_path --mgnify_database_path=$mgnify_database_path --bfd_database_path=$bfd_database_path --small_bfd_database_path=$small_bfd_database_path --uniclust30_database_path=$uniclust30_database_path --uniprot_database_path=$uniprot_database_path --pdb70_database_path=$pdb70_database_path --pdb_seqres_database_path=$pdb_seqres_database_path --template_mmcif_dir=$template_mmcif_dir --max_template_date=$max_template_date --obsolete_pdbs_path=$obsolete_pdbs_path --db_preset=$db_preset --model_preset=$model_preset --benchmark=$benchmark --amber_relaxation=$amber_relaxation --recycling=$recycling --run_feature=$run_feature --logtostderr

    opened by chenshixinnb 3
  • ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74.


    I installed the conda environment following your steps. Running import jax; print(jax.devices()) inside the environment raises: ValueError: jaxlib is version 0.1.69, but this version of jax requires version 0.1.74. How can I solve this? Thanks!

    opened by chenshixinnb 3
  • Something went wrong when I ran the job

    Hi, dear author. I installed the required modules according to the linked requirements, but the following error occurred when I ran the script. Can you help me find out what is causing it? My installation steps were:

    1. conda create --prefix=/data/home/zhoujy/run/alphafold2 python=3.8
    2. conda activate /data/home/zhoujy/run/alphafold2
    3. conda install cudatoolkit=10.1 cudnn
    4. pip install tensorflow==2.3.0
    5. pip install biopython==1.79 chex==0.0.7 dm-haiku==0.0.4 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.2.14 ml-collections==0.1.0
    6. pip install --upgrade jax jaxlib==0.1.69+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html

    and then , I run the script:

    #!/bin/bash
    module load anaconda/2020.11
    source activate /data/home/zhoujy/run/alphafold2
    ./run_feature.sh -d /data/public/alphafold2 -o /data/home/zhoujy/run/output -m model_1 -f /data/home/zhoujy/run/input/Tb927.10.2950.fasta -t 2021-07-27

    The result is as follows:

    Traceback (most recent call last):
      File "/data/run01/zhoujy/ParallelFold-main/run_feature.py", line 33, in <module>
        from alphafold.model import data
      File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/data.py", line 20, in <module>
        from alphafold.model import utils
      File "/data/run01/zhoujy/ParallelFold-main/alphafold/model/utils.py", line 21, in <module>
        import haiku as hk
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/__init__.py", line 17, in <module>
        from haiku import data_structures
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/data_structures.py", line 17, in <module>
        from haiku._src.data_structures import to_immutable_dict
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/data_structures.py", line 30, in <module>
        from haiku._src import utils
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/haiku/_src/utils.py", line 24, in <module>
        import jax
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/__init__.py", line 16, in <module>
        from .api import (
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/api.py", line 38, in <module>
        from . import core
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/core.py", line 31, in <module>
        from . import dtypes
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/dtypes.py", line 31, in <module>
        from .lib import xla_client
      File "/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jax/lib/__init__.py", line 51, in <module>
        from jaxlib import pytree
    ImportError: cannot import name 'pytree' from 'jaxlib' (/data/home/zhoujy/run/alphafold2/lib/python3.8/site-packages/jaxlib/__init__.py)

    Why? I need your help.

    opened by zhoujingyu13687306871 2
  • Limit RAM usage


    I'm trying to run a FASTA file 3643 residues in length. The MSA part finished, but the inference part tried to allocate 80 GB of VRAM, which I don't have access to; the graphics cards are NVIDIA Tesla V100 16 GB. Now I'm trying to run inference on CPU, which is very slow, and the job keeps using more and more RAM as time passes. Can I limit RAM usage somehow? Or can I run inference on more graphics cards, maybe as a parallel process?
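
    A possible mitigation (untested here): AlphaFold's own Docker runner enables CUDA unified memory for long targets so XLA can spill GPU allocations into host RAM. The variables must be set before JAX initializes, e.g. at the top of run_alphafold.py:

    # Untested mitigation: let XLA oversubscribe GPU memory into host RAM.
    # These must be set before jax/jaxlib initialize the XLA backend.
    import os
    os.environ['TF_FORCE_UNIFIED_MEMORY'] = '1'
    os.environ['XLA_PYTHON_CLIENT_MEM_FRACTION'] = '4.0'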

    opened by hrzolix 1
  • GPU utilization issue

    Hello Doctor! After many attempts yesterday it now runs, but I found that when running the run_alphafold.sh script, the GPU computation part stays in a CPU-running state for quite a long time, with GPU utilization at 0 for long stretches. I tried computing a protein with a sequence length of 2000 using 4 V100 cards, and it took 9 days. Is this speed normal? Also, for the earlier TensorFlow installation step, is it necessary to install the GPU version of TensorFlow?

    @Zuricho

    opened by zhoujingyu13687306871 1
  • Error after GPU part


    Hi, after installation the "CPU part" (jackhmmer and hhblits) works well. But when I start the GPU part, I get this error message: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    1st part: ./run_feature.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27
    2nd part: ./run_alphafold.sh -d data -o ./tmp -m model_1,model_2,model_3,model_4,model_5 -f ./query/1crn.fasta -t 2021-07-27

    Full error message:

      File "/softwares/alphafold/run_alphafold.py", line 316, in <module>
        app.run(main)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/softwares/alphafold/run_alphafold.py", line 289, in main
        predict_structure(
      File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
        relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
      File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
        out = amber_minimize.run_pipeline(
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
        ret.update(get_violation_metrics(prot))
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
        structural_violations, struct_metrics = find_violations(prot)
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
        violations = folding.find_structural_violations(
      File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
        atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
      File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
        return take_fn(params, indices)
      File "/softwares/alphafold/alphafold/model/utils.py", line 36, in <lambda>
        take_fn = lambda p, i: jnp.take(p, i, axis=axis)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
        return _take(a, indices, None if axis is None else operator.index(axis), out,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/traceback_util.py", line 162, in reraise_with_filtered_traceback
        return fun(*args, **kwargs)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/api.py", line 411, in cache_miss
        out_flat = xla.xla_call(
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1618, in bind
        return call_bind(self, fun, *args, **params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1609, in call_bind
        outs = primitive.process(top_trace, fun, tracers, params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 1621, in process
        return trace.process_call(self, fun, tracers, params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/core.py", line 615, in process_call
        return primitive.impl(f, *tracers, **params)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 622, in _xla_call_impl
        compiled_fun = _xla_callable(fun, device, backend, name, donated_invars,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 262, in memoized_fun
        ans = call(fun, *args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 694, in _xla_callable
        return lower_xla_callable(fun, device, backend, name, donated_invars, *arg_specs).compile().unsafe_call
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/xla.py", line 702, in lower_xla_callable
        jaxpr, out_avals, consts = pe.trace_to_jaxpr_final(
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1522, in trace_to_jaxpr_final
        jaxpr, out_avals, consts = trace_to_subjaxpr_dynamic(fun, main, in_avals)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/interpreters/partial_eval.py", line 1500, in trace_to_subjaxpr_dynamic
        ans = fun.call_wrapped(*in_tracers)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/linear_util.py", line 166, in call_wrapped
        ans = self.f(*args, **dict(self.params, **kwargs))
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
        _check_arraylike("take", a)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
        raise TypeError(msg.format(fun_name, type(arg), pos))
    jax._src.traceback_util.UnfilteredStackTrace: TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    The stack trace below excludes JAX-internal frames. The preceding is the original exception that occurred, unmodified.


    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/softwares/alphafold/run_alphafold.py", line 316, in <module>
        app.run(main)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 312, in run
        _run_main(main, args)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
        sys.exit(main(argv))
      File "/softwares/alphafold/run_alphafold.py", line 289, in main
        predict_structure(
      File "/softwares/alphafold/run_alphafold.py", line 188, in predict_structure
        relaxed_pdb_str, _, _ = amber_relaxer.process(prot=unrelaxed_protein)
      File "/softwares/alphafold/alphafold/relax/relax.py", line 58, in process
        out = amber_minimize.run_pipeline(
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 482, in run_pipeline
        ret.update(get_violation_metrics(prot))
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 356, in get_violation_metrics
        structural_violations, struct_metrics = find_violations(prot)
      File "/softwares/alphafold/alphafold/relax/amber_minimize.py", line 338, in find_violations
        violations = folding.find_structural_violations(
      File "/softwares/alphafold/alphafold/model/folding.py", line 757, in find_structural_violations
        atom14_atom_radius = batch['atom14_atom_exists'] * utils.batched_gather(
      File "/softwares/alphafold/alphafold/model/utils.py", line 39, in batched_gather
        return take_fn(params, indices)
      File "/softwares/alphafold/alphafold/model/utils.py", line 36, in <lambda>
        take_fn = lambda p, i: jnp.take(p, i, axis=axis)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5383, in take
        return _take(a, indices, None if axis is None else operator.index(axis), out,
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 5390, in _take
        _check_arraylike("take", a)
      File "/softwares/alphafold/envs/alphafold/lib/python3.8/site-packages/jax/_src/numpy/lax_numpy.py", line 559, in _check_arraylike
        raise TypeError(msg.format(fun_name, type(arg), pos))
    TypeError: take requires ndarray or scalar arguments, got <class 'list'> at position 0.

    opened by ebettler 1
  • Running ParallelFold on reduced database?


    Is it possible to run ParallelFold on reduced_dbs, or is that not yet supported? I tried using -c reduced_dbs but it did not work. Then I tried modifying the bfd_path set in run_alphafold.sh, but it threw a directory/file-not-found error (I'm pretty sure the path exists, because I can run AlphaFold with it). Thank you for your help in advance!

    opened by xinyu-g 0
  • Did CPU acceleration fail?

    Yesterday I ran some experiments on a server:

    ./run_alphafold.sh -d /dataset/ -o result -p monomer -m model_2 -i input/T1061.fasta

    I read the log and was confused (T1061 is 949 aa):

    I0822 07:33:00.806264 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpxbrk9wt6/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//uniref90/uniref90.fasta"
    I0822 07:33:01.157015 140553952322112 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0822 07:37:27.058227 140553952322112 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 265.901 seconds
    I0822 07:37:27.072012 140553952322112 jackhmmer.py:133] Launching subprocess "/opt/conda/bin/jackhmmer -o /dev/null -A /tmp/tmpnn6am537/output.sE 0.0001 -E 0.0001 --cpu 8 -N 1 input/T1061.fasta /dataset//mgnify/mgy_clusters_2018_12.fa"
    I0822 07:37:27.439405 140553952322112 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0822 07:42:42.192071 140553952322112 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 314.752 seconds
    I0822 07:42:42.364272 140553952322112 hhsearch.py:85] Launching subprocess "/opt/conda/bin/hhsearch -i /tmp/tmpog4q4684/query.a3m -o /tmp/tmpog40/pdb70"
    I0822 07:42:42.712445 140553952322112 utils.py:36] Started HHsearch query
    I0822 07:44:18.199999 140553952322112 utils.py:40] Finished HHsearch query in 95.487 seconds
    I0822 07:44:18.555797 140553952322112 hhblits.py:128] Launching subprocess "/opt/conda/bin/hhblits -i input/T1061.fasta -cpu 4 -oa3m /tmp/tmpz9oq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /dataset//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_optst30_2018_08"
    I0822 07:44:19.050110 140553952322112 utils.py:36] Started HHblits query
    I0822 09:01:02.278290 140553952322112 utils.py:40] Finished HHblits query in 4603.228 seconds

    feature extraction spend time: 5305.185729026794
    feature extraction completed successfully

    I printed the feature extraction time and found that 5305 s is almost exactly the sum of the individual database search times. But according to the article, I expected feature extraction to take about as long as the HHblits search alone. Can you explain this confusing result?

    opened by yanchenmochen 3
  • failed to alloc 2147483648 bytes on host: CUDA_ERROR_OPERATING_SYSTEM: OS call failed or operation not supported on this OS


    When I use the code to compute T1050.fasta, which is composed of 700 residues, the command line outputs this error. The environment is an A100 GPU on Ubuntu, but I use higher versions of jax and jaxlib; is that what is causing the problem?

    (parafold) [email protected]:~# pip list | grep jax
    jax 0.3.15
    jaxlib 0.3.15+cuda11.cudnn82
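
    As a first diagnostic, it may help to confirm that this jax/jaxlib pair can initialize the CUDA backend at all:

    # Check which backend JAX actually initialized.
    import jax
    print(jax.__version__)
    print(jax.devices())  # expect a GPU device here, not CPU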

    opened by yanchenmochen 3
  • Too many command-line arguments


    Hi,

    First of all, thanks for developing this tool, I'm looking forward to playing with it!

    I installed ParallelFold on an Ubuntu 18 machine, with the full AlphaFold database on an external drive.

    When running the command:

    $ ./run_alphafold.sh -d /media/qhr/"My Passport"/alphafold/AlphaFold_DB -o output -p monomer_ptm -i input/GA98.fasta -m model_1 -f

    I get the Error: Too many command-line arguments.

    I also get the same error when calling run_alphafold.py directly:

    $ python3 run_alphafold.py --fasta_paths=input/GA98.fasta --model_preset=monomer --data_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB --output_dir=output --uniref90_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/uniref90 --mgnify_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/mgnify --template_mmcif_dir=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif --obsolete_pdbs_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/pdb_mmcif/obsolete.dat --use_gpu_relax=True bfd_database_path=/media/qhr/"My Passport"/alphafold/AlphaFold_DB/bfd --max_template_date=2020-05-14

    Is it possible that the space in the name of the external drive, "My Passport", is causing this error?

    Thanks! Ana

    opened by AnaValero 1
  • Alphafold2 v/s Parafold timings


    I have a fundamental question about the difference between the AlphaFold2 and ParaFold running procedures: how can I tell whether ParaFold runs the first-stage searches (Jackhmmer, Jackhmmer, and HHblits) in parallel, rather than sequentially as AlphaFold2 does?

    Snippets of log files obtained from running Alphafold2 and Parafold

    Alphafold2 log:

    I0409 14:04:28.020900 139865793787712 run_alphafold.py:376] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0409 14:04:28.021180 139865793787712 run_alphafold.py:393] Using random seed 1420247507508611084 for the data pipeline
    I0409 14:04:28.021463 139865793787712 run_alphafold.py:161] Predicting seq1
    I0409 14:04:28.037414 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpm1u84thu/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1/fasta_files/seq1.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0409 14:04:28.111756 139865793787712 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0409 14:10:17.276236 139865793787712 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 349.164 seconds
    I0409 14:10:17.462168 139865793787712 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmpub1qi595/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq1.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0409 14:10:17.513182 139865793787712 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0409 14:16:32.112656 139865793787712 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 374.599 seconds
    I0409 14:16:33.369129 139865793787712 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpyot74k7r/query.a3m -o /tmp/tmpyot74k7r/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0409 14:16:33.466009 139865793787712 utils.py:36] Started HHsearch query
    I0409 14:22:32.148045 139865793787712 utils.py:40] Finished HHsearch query in 358.682 seconds
    I0409 14:22:32.838686 139865793787712 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq1.fasta -cpu 4 -oa3m /tmp/tmpedyoxta1/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0409 14:22:32.926801 139865793787712 utils.py:36] Started HHblits query
    I0409 18:56:30.223437 139865793787712 utils.py:40] Finished HHblits query in 16437.296 seconds
    

    Parafold log:

    I0427 21:17:27.915049 140305630689088 run_alphafold.py:397] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
    I0427 21:17:27.915312 140305630689088 run_alphafold.py:414] Using random seed 1534697036303804749 for the data pipeline
    I0427 21:17:27.915629 140305630689088 run_alphafold.py:165] Predicting seq2
    I0427 21:17:27.925500 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmp5fo28348/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//uniref90/uniref90.fasta"
    I0427 21:17:27.996705 140305630689088 utils.py:36] Started Jackhmmer (uniref90.fasta) query
    I0427 21:23:54.643056 140305630689088 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 386.646 seconds
    I0427 21:23:54.829476 140305630689088 jackhmmer.py:133] Launching subprocess "/.conda/envs/alphafold/bin/jackhmmer -o /dev/null -A /tmp/tmprs3za6w_/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /fasta_files/seq2.fasta /alphafold_data//mgnify/mgy_clusters_2018_12.fa"
    I0427 21:23:54.875119 140305630689088 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
    I0427 21:31:38.409492 140305630689088 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 463.534 seconds
    I0427 21:31:39.768360 140305630689088 hhsearch.py:85] Launching subprocess "/.conda/envs/alphafold/bin/hhsearch -i /tmp/tmpjgr58ebb/query.a3m -o /tmp/tmpjgr58ebb/output.hhr -maxseq 1000000 -d /alphafold_data//pdb70/pdb70"
    I0427 21:31:39.850885 140305630689088 utils.py:36] Started HHsearch query
    I0427 21:39:23.420352 140305630689088 utils.py:40] Finished HHsearch query in 463.569 seconds
    I0427 21:39:24.173583 140305630689088 hhblits.py:128] Launching subprocess "/.conda/envs/alphafold/bin/hhblits -i /fasta_files/seq2.fasta -cpu 4 -oa3m /tmp/tmpmzl5arhr/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /alphafold_data//bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /alphafold_data//uniclust30/uniclust30_2018_08/uniclust30_2018_08"
    I0427 21:39:24.259592 140305630689088 utils.py:36] Started HHblits query
    I0428 01:34:31.302148 140305630689088 utils.py:40] Finished HHblits query in 14107.042 seconds
    

    They look similar to me, and both use 8 CPUs, 8 CPUs, and 4 CPUs for the three searches, respectively. Please clarify this for me.

    Thank you Aditi
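
    For reference: in both logs each search starts only after the previous one finishes, so both runs executed the first stage sequentially; a parallel first stage would interleave the "Started ..." lines. As an illustration only (abbreviated placeholder command lines, not this repo's code), independent searches can be overlapped like this:

    # Illustrative sketch: overlap the independent database searches.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def run_tool(cmd):
        # Each search is an external process, so threads are enough to overlap them.
        subprocess.run(cmd, check=True)

    searches = [
        ['jackhmmer', '--cpu', '8', '-N', '1', 'query.fasta', 'uniref90.fasta'],
        ['jackhmmer', '--cpu', '8', '-N', '1', 'query.fasta', 'mgy_clusters.fa'],
        ['hhblits', '-cpu', '4', '-i', 'query.fasta', '-d', 'bfd', '-oa3m', 'out.a3m'],
    ]
    with ThreadPoolExecutor(max_workers=len(searches)) as pool:
        list(pool.map(run_tool, searches))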

    opened by adi1bioinfo 0
  • An error in feature generation


    Hi, when I used your new version to make feature.pkl, this error occurred. Could you give any advice on how to solve it?

    FATAL Flags parsing error: Unknown command line flag 'model_names'. Did you mean: model_preset ? Pass --helpshort or --helpfull to see help on flags.

    opened by YiningWang2 1
Releases: v1.1

Owner: Bozitao Zhong (Protein Design)