minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

Overview

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly

rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) implementation, geared towards the assembly of long and accurate reads such as PacBio HiFi.

Rationale

rust-mdbg performs mdBG construction of a 52x human genome HiFi data in around 10 minutes on 8 threads, with 10GB of maximum RAM usage.

rust-mdbg is fast because it operates in minimizer-space, meaning that the reads, the assembly graph, and the final assembly, are all represented as ordered lists of minimizers, instead of strings of nucleotides. A conversion step then yields a classical base-space representation.

Limitations

However, this high speed comes at a cost! :)

  • rust-mdbg gives good-quality results but still of lower contiguity and completeness than state-of-the-art assemblers such as HiCanu and hifiasm.
  • rust-mdbg performs best with at least 40x to 50x of coverage.
  • No polishing step is implemented; so, assemblies will have around the same accuracy as the reads.

Installation

Clone the repository (make sure you have a working Rust environment), and run

cargo build --release

For performing graph simplifications, gfatools is required.

Quick start

cargo build --release
target/release/rust-mdbg reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example
utils/magic_simplify example

Multi-k assembly

For better contiguity, try the provided multi-k assembly script. It performs assembly iteratively, starting with k= 10, up to an automatically-determined largest k. This comes at the expense of ~7x longer running time.

utils/multik <reads.fq.gz> <some_output_prefix> <nb_threads>

Overview

rust-mdbg is a modular assembler. It consists of three components:

  1. rust-mdbg, to perform assembly in minimizer-space
  2. gfatools (external component), to perform graph simplifications
  3. to_basespace, to convert a minimizer-space assembly to base-space

For convenience, components 2 and 3 are wrapped into a script called magic_simplify.

Input

rust-mdbg takes a single FASTA/FASTQ input (gzip-compressed or not). Multi-line sequences, and sequences with lowercase characters, are not supported.

If you have seqtk installed, you can use

seqtk seq -A reads.unformatted.fq > reads.fa

to format reads accordingly.

Output data

The output of rust-mdbg consists of:

  • A .gfa file containing the minimizer-space de Bruijn graph, without sequences,
  • Several .sequences files containing the sequences of the nodes of the graph.

The executable to_basespace allows to combine both outputs and produce a .gfa file, with sequences.

Running an example

A sample set of reads is provided in the example/ folder. Run

target/release/rust-mdbg reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example

which will create an example.gfa file.

In order to populate the .gfa file with base-space sequences and perform graph simplification, run

utils/magic_simplify example

which will create example.msimpl.gfa and example.msimpl.fa files.

Parameters

The main parameters of rust-mdbg are the k-min-mer value k, the minimizer length l, and the minimizer density d (delta in the paper). Another parameter is --presimp, set by default to 0.01, which performs a graph simplification: a neighbor node is deleted if its abundance is below 1% that of min(max(abundance of other neighbors), abundance of current node). For better results, and also without the need to set any parameter, try the multi-k strategy (see Multi-k assembly section). This section explains how parameters are set in single-k assembly.

All three parameters k, l, and d significantly impact the quality of results. One can think of them as a generalization of the k parameter in classical de Bruijn graphs. When you run rust-mdbg without specifying parameters, it sets them to:

d = 0.003

l = 12

k = 0.75 * average_readlen * d

These parameters will give reasonable, but far from optimal, draft assemblies. We experimentally found that the best results are often obtained with k values within 20-40, l within 10-14, and d within 0.001-0.005. Setting k and d such that the ratio k/d is slightly below the read length appears to be an effective strategy.

For further information on usage and parameters, run

target/release/rust-mdbg -h

for a one-line summary of each flag, or run

target/release/rust-mdbg --help

for a lengthy explanation of each flag.

Performance

Dataset Genome size (HPC) Coverage
Parameters
N50 Runtime Memory
D. melanogaster HiFi 98Mbp 100x auto
multi-k
k=35,l=12,d=0.002
2.5Mbp
2.5Mbp
6.0Mbp
2m15s
15m
1m9s
2.5GB
1.8GB
1.5GB
Strawberry HiFi 0.7Gbp 36x auto
multi-k
k=38,l=14,d=0.003
0.5Mbp
1Mbp
0.7Mbp
6m12s
40m
5m31s
12GB
11GB
10GB
H. sapiens (HG002) HiFi 2.2Gbp 52x auto
multi-k
k=21,l=14,d=0.003
1.0Mbp
16.9Mbp
13.9Mbp
27m30s
3h15m
10m23s
16.9GB
20GB
10.1GB

Runtime breakdown:
H. sapiens: 10m23s = 6m51s rust-mdbg + 1m48s gfatools + 1m44s to_basespace

The runs with custom parameters (from the paper) were made with commit b99d938, and unlike in the paper, we did not use robust minimizers which requires additional l-mer counting beforehand. For historical reasons, reads and assemblies were homopolymer-compressed in those experiments and the homopolymer-compressed genome size is reported. So beware that these numbers are not directly comparable to the output of other assemblers. In addition to the parameters shown in the table, the rust-mdbg command line also contained --bf --no-error-correct --threads 8.

Running rust-mdbg without graph simplifications

To convert an assembly to base-space without performing any graph simplifications, there are two ways:

  • with gfatools
gfatools asm -u  example.gfa > example.unitigs.gfa
target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
  • without gfatools (slower, but the code is more straightforward to understand)

utils/complete_gfa.py example.sequences example.gfa

In both cases, this will create an example.complete.gfa file that you can convert to FASTA with

bash utils/gfa2fasta.sh example.complete

License

rust-mdbg is freely available under the MIT License.

Developers

  • Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
  • Rayan Chikhi at the Department of Computational Biology at Institut Pasteur

Citation

Minimizer-space de Bruijn graphs (2021) BiorXiv

@article {mdbg,
	author = {Ekim, Bar{\i}{\c s} and Berger, Bonnie and Chikhi, Rayan},
	title = {Minimizer-space de Bruijn graphs},
	year = {2021},
	doi = {10.1101/2021.06.09.447586},
	publisher = {Cold Spring Harbor Laboratory},
	journal = {bioRxiv}
}

Contact

Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.

Comments
  • m1 arm support

    m1 arm support

    Hello rust-mdbg team,

    It seems that there is no support for ARM structure yet, I have the following error when compiling on ARM64:

    The following warnings were emitted during compilation:

    warning: cc: error: unrecognized command-line option '-msse4.2' warning: cc: error: unrecognized command-line option '-maes' warning: cc: error: unrecognized command-line option '-mavx' warning: cc: error: unrecognized command-line option '-mavx2'

    error: failed to run custom build command for fasthash-sys v0.3.2

    Caused by: process didn't exit successfully: /Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-17495dcf061597dc/build-script-build (signal: 6, SIGABRT: process abort signal) --- stdout TARGET = Some("aarch64-apple-darwin") OPT_LEVEL = Some("3") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") CC_aarch64-apple-darwin = None CC_aarch64_apple_darwin = None HOST_CC = None CC = None HOST = Some("aarch64-apple-darwin") TARGET = Some("aarch64-apple-darwin") HOST = Some("aarch64-apple-darwin") CFLAGS_aarch64-apple-darwin = None CFLAGS_aarch64_apple_darwin = None HOST_CFLAGS = None CFLAGS = None DEBUG = Some("false") running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" cargo:warning=cc: error: unrecognized command-line option '-msse4.2' cargo:warning=cc: error: unrecognized command-line option '-maes' cargo:warning=cc: error: unrecognized command-line option '-mavx' cargo:warning=cc: error: unrecognized command-line option '-mavx2' exit status: 1

    --- stderr thread 'main' panicked at '

    Internal error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" with args "cc" did not execute successfully (status code exit status: 1).

    ', /Users/jianshuzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.55/src/lib.rs:1672:5 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace fatal runtime error: failed to initiate panic, error 5 warning: build failed, waiting for other jobs to finish... error: build failed

    Any possibilities to provide support?

    Thanks,

    Jianshu

    opened by jianshu93 6
  • Recommended parameters for metagenome assembly and a related question

    Recommended parameters for metagenome assembly and a related question

    Hi,

    I want to try mdBG on real metagenome samples. I wonder if you could suggest a parameter combo to use (or combos to try out). And should I do the multi-k mode?

    For the real samples, I could crudely guess the number of species in the library, and perhaps an exaggerated total genome size from it as well. I'm not sure if these could be useful.

    Another question is: could mdBG output contig coverage estimates?

    Thank you!

    question 
    opened by xfengnefx 5
  • Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    thread 'main' panicked at 'called Result::unwrap() on an Err value: Error { kind: BufferLimit }', src/main.rs:187:33

    Reads taken from: https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-17/SRR1023860/SRR10238607.1

    Command: utils/multik <reads> <output prefix> 56

    Happens both with and without homopolymer compression.

    bug 
    opened by sebschmi 5
  • multik executes run with k < l

    multik executes run with k < l

    When assembling E.coli with the multik script, it runs mdbg with k = 10 and l = 12, resulting in mdbg panicking with "Non-ACGTN nucleotide encountered!"

    The multik script then continues silently.

    output
    thread '<unnamed>' panicked at 'Non-ACGTN nucleotide encountered!', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/nthash-0.5.0/src/lib.rs:43:9
    stack backtrace:
       0: std::panicking::begin_panic
       1: <nthash::NtHashIterator as core::iter::traits::iterator::Iterator>::next
       2: rust_mdbg::read::Read::extract
       3: rust_mdbg::main::{{closure}}
       4: rust_mdbg::main::{{closure}}
       5: <F as scoped_threadpool::FnBox>::call_box
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
       1: core::panicking::panic_fmt
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
       2: core::result::unwrap_failed
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
       3: scoped_threadpool::Pool::scoped
       4: core::ops::function::FnOnce::call_once{{vtable.shim}}
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x55e87265b2ec - std::backtrace_rs::backtrace::libunwind::trace::h09f7e4e089375279
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
       1:     0x55e87265b2ec - std::backtrace_rs::backtrace::trace_unsynchronized::h1ec96f1c7087094e
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x55e87265b2ec - std::sys_common::backtrace::_print_fmt::h317b71fc9a5cf964
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x55e87265b2ec - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he3555b48e7dfe7f0
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x55e87267d4fc - core::fmt::write::h513b07ca38f4fb1b
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17
       5:     0x55e872657995 - std::io::Write::write_fmt::haf8c932b52111354
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15
       6:     0x55e87265cec0 - std::sys_common::backtrace::_print::h195c38364780a303
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x55e87265cec0 - std::sys_common::backtrace::print::hc09dfdea923b6730
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x55e87265cec0 - std::panicking::default_hook::{{closure}}::hb2e38ec0d91046a3
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50
       9:     0x55e87265ca75 - std::panicking::default_hook::h60284635b0ad54a8
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9
      10:     0x55e87265d574 - std::panicking::rust_panic_with_hook::ha677a669fb275654
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17
      11:     0x55e87265d050 - std::panicking::begin_panic_handler::{{closure}}::h976246fb95d93c31
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
      12:     0x55e87265b794 - std::sys_common::backtrace::__rust_end_short_backtrace::h38077ee5b7b9f99a
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
      13:     0x55e87265cfb9 - rust_begin_unwind
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
      14:     0x55e872545651 - core::panicking::panic_fmt::h35f3a62252ba0fd2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
      15:     0x55e872545743 - core::result::unwrap_failed::hb53671404b9e33c2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
      16:     0x55e8725e8f9f - scoped_threadpool::Scope::join_all::hcb532061605ab1b0
      17:     0x55e87255ee33 - scoped_threadpool::Pool::scoped::hb64980f16173dad1
      18:     0x55e87255b128 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf2fa39940289df70
      19:     0x55e87255ee5e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6bd664fd6d7bb829
      20:     0x55e87259a883 - core::ops::function::FnOnce::call_once{{vtable.shim}}::ha5de8d6fee3bff3e
      21:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hcbc6d2d80772be64
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      22:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h9bffa2ca65a1d6e6
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      23:     0x55e872660893 - std::sys::unix::thread::Thread::new::thread_start::ha678a8b0caec8f55
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17
      24:     0x7f16121a96db - start_thread
                                   at /build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c:463
      25:     0x7f161193071f - __GI___clone
                                   at /build/glibc-S9d2JN/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      26:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    625.22user 29.59system 1:35.45elapsed 685%CPU (0avgtext+0avgdata 579660maxresident)k
    
    bug 
    opened by sebschmi 5
  • Missing assembly-final.msimpl.fa in multik mode

    Missing assembly-final.msimpl.fa in multik mode

    Hello,

    Thank you for this tool.

    I ran mdbg with the following command line: multik reads.fastq.gz assembly 56 10 1000

    I get the files assembly-k*.gfa, assembly-k*.msimpl.gfa and assembly-k*.msimpl.fa with k from 10 to 1000, but I do not get the final output assembly-final.msimpl.fa.

    bug 
    opened by nadegeguiglielmoni 5
  • example.sequences file

    example.sequences file

    Sorry if I'm being slow but when I create the gfa file multiple .sequences files are created but when then in the readme to_basespace takes only a single example.sequences file. Where does this come from? Do you combine the .sequences files in some way or..?

    Thanks!

    question 
    opened by samlipworth 4
  • KSizeOutOfRange errors during rust-mdbg run

    KSizeOutOfRange errors during rust-mdbg run

    Hi there,

    Trying out multik with some Nanopore metagenomics reads (seqtk-formatted) and I'm currently getting the errors below as it iteratively goes through the different -k values. Any ideas on what might be going wrong and how I might fix it?

    So far, the run hasn't fully aborted and I'm letting it run until I get some output - will let y

    $ ../rust-mdbg/utils/multik sup.fastq.gz std_sipp 10
    avg readlen: 6147875, max k: 17521
    assembly with k=10
        Finished release [optimized] target(s) in 0.09s
         Running `sup.fastq.gz -k 10 -l 12 --density 0.003 --minabund 2 --threads 10 --prefix std_sipp-k10 --bf`
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 11 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 5 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 3 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x5591d2438050 - std::backtrace_rs::backtrace::libunwind::trace::h63b7a90188ab5fb3
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
       1:     0x5591d2438050 - std::backtrace_rs::backtrace::trace_unsynchronized::h80aefbf9b851eca7
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x5591d2438050 - std::sys_common::backtrace::_print_fmt::hbef05ae4237a4d72
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x5591d2438050 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h28abce2fdb9884c2
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x5591d245670f - core::fmt::write::h3b84512577ca38a8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/fmt/mod.rs:1092:17
       5:     0x5591d24352b2 - std::io::Write::write_fmt::h465f8feea02e2aa1
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/io/mod.rs:1572:15
       6:     0x5591d243a185 - std::sys_common::backtrace::_print::h525280ee0d29bdde
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x5591d243a185 - std::sys_common::backtrace::print::h1f0f5b9f3ef8fb78
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x5591d243a185 - std::panicking::default_hook::{{closure}}::ha5838f6faa4a5a8f
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:208:50
       9:     0x5591d2439c33 - std::panicking::default_hook::hfb9fe98acb0dcb3b
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:225:9
      10:     0x5591d243a78d - std::panicking::rust_panic_with_hook::hb89f5f19036e6af8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:591:17
      11:     0x5591d243a327 - std::panicking::begin_panic_handler::{{closure}}::h119e7951427f41da
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:497:13
      12:     0x5591d243850c - std::sys_common::backtrace::__rust_end_short_backtrace::hce386c44bf47a128
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:141:18
      13:     0x5591d243a289 - rust_begin_unwind
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
      14:     0x5591d2323341 - core::panicking::panic_fmt::h2242888e8769cd33
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
      15:     0x5591d2323233 - core::option::expect_none_failed::hb1edf11f73e63728
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1329:5
      16:     0x5591d23c108f - scoped_threadpool::Scope::join_all::hd6132fc8a04c2f8d
      17:     0x5591d233fcbb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h198262ef865dc7ad
      18:     0x5591d2391412 - std::sys_common::backtrace::__rust_begin_short_backtrace::he7799c2fe1d42088
      19:     0x5591d234f443 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc12e7712db099355
      20:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc444a77f8dd8d825
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      21:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8b68a0a9a2093dfc
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      22:     0x5591d243d28a - std::sys::unix::thread::Thread::new::thread_start::hb95464447f61f48d
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/thread.rs:71:17
      23:     0x7f00c05ae6ba - start_thread
      24:     0x7f00bfdd751d - clone
      25:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    537.76user 35.81system 3:28.70elapsed 274%CPU (0avgtext+0avgdata 855256maxresident)k
    0inputs+1781280outputs (0major+10608356minor)pagefaults 0swaps
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.gfa -t 10,50000 -t 10,50000 -b 100000 -b 100000 -t 10,50000 -b 100000 -b 100000 -b 100000 -t 10,50000 -b 100000 -t 10,50000 -b 1000000 -t 10,150000 -b 1000000 -u
    ERROR: failed to read the graph
    Command exited with non-zero status 2
    0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1572maxresident)k
    0inputs+0outputs (0major+68minor)pagefaults 0swaps
    + python /home/andre/rust-mdbg/utils/gfa_break_loops.py std_sipp-k10.tmp1.gfa
    + [[ ! std_sipp-k10 == *--old-behavior* ]]
    + cargo run --manifest-path /home/andre/rust-mdbg/utils/../Cargo.toml --release --bin to_basespace -- --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10
        Finished release [optimized] target(s) in 0.07s
         Running `/home/andre/rust-mdbg/target/release/to_basespace --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10`
    + mv std_sipp-k10.tmp2.gfa.complete.gfa std_sipp-k10.tmp2.gfa
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.tmp2.gfa -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u
    [M::main] Version: 0.4-r214-dirty
    [M::main] CMD: /home/andre/gfatools/gfatools asm -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u std_sipp-k10.tmp2.gfa
    [M::main] Real time: 0.000 sec; CPU: 0.000 sec
    0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1752maxresident)k
    0inputs+0outputs (0major+74minor)pagefaults 0swaps
    ++ stat -c%s std_sipp-k10.tmp2.gfa
    + filesize=9
    + ((  filesize > 100000000 ))
    + mv std_sipp-k10.tmp3.gfa std_sipp-k10.msimpl.gfa
    + [[ std_sipp-k10 != *\-\-\k\e\e\p* ]]
    + rm -rf std_sipp-k10.tmp1.gfa std_sipp-k10.tmp2.gfa
    + bash /home/andre/rust-mdbg/utils/gfa2fasta.sh std_sipp-k10.msimpl
    2.19user 0.28system 0:02.50elapsed 99%CPU (0avgtext+0avgdata 24008maxresident)k
    0inputs+8outputs (0major+9763minor)pagefaults 0swaps
    
    bug 
    opened by GeoMicroSoares 4
  • Nanopore metagenome assembly parameters

    Nanopore metagenome assembly parameters

    Hi there,

    Congratulations, this tool seems amazing and I can't wait to use it with my data! Are there specific parameters that I can use/optimize with rust-mdbg to assemble Nanopore metagenomes?

    Thanks.

    enhancement 
    opened by GeoMicroSoares 4
  • Problems in ruinning `rust-mdbg` without graph simplifications

    Problems in ruinning `rust-mdbg` without graph simplifications

    Hi,

    I've installed rust-mdbg

    git clone --recursive https://github.com/ekimb/rust-mdbg.git
    cd rust-mdbg
    cargo build --release
    

    I've run it

    ~/git/rust-mdbg/target/release/rust-mdbg ~/git/rust-mdbg/example/reads-0.00.fa.gz -k 7 --threads 1 --density 0.0008 -l 10 --minabund 2 --prefix example
    ls example*
    
    example.140646999713344.sequences  example.gfa
    

    and finally tried both approaches to go in base-space

    gfatools asm -u  example.gfa > example.unitigs.gfa
    ~/git/rust-mdbg/target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
    
    [M::main] Version: 0.5-r250-dirty
    [M::main] CMD: gfatools asm -u example.gfa
    [M::main] Real time: 0.001 sec; CPU: 0.003 sec
    Done parsing unitigs GFA, got 1 unitigs.
    Done parsing original GFA, with 0 k-min-mers.
    Done parsing .sequences file, recorded 0 sequences.
    thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/to_basespace.rs:258:55
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    and

    python3 ~/git/rust-mdbg/utils/complete_gfa.py example.*.sequences example.gfa
    
    Traceback (most recent call last):
      File "/home/guarracino/git/rust-mdbg/utils/complete_gfa.py", line 32, in <module>
        source_minims = node_minims[spl[1]]
    KeyError: '7'
    

    Am I doing silly errors somewhere?

    opened by AndreaGuarracino 3
  • Differences between uncompressed & compressed fastq files

    Differences between uncompressed & compressed fastq files

    This seems like a bug, but maybe I'm just misunderstanding something with how mdbg works.

    I discovered this after trying to run several human assemblies of varying input coverage (20x,30x,40x,50x) starting from hifi_reads.fq.gz files.

    The contiguity (n50) of all of the assemblies was in the same ballpark as the read n50 and there appeared to be no benefit to increased coverage. This coupled with the poor results in general had me scratching my head so I tried a different test dataset that was an uncompressed hifi_reads.fq and I got a great assembly.

    Curiosity piqued, I went back an unzipped the 20x coverage point I had tried earlier and got a much better assembly.

    See attached logs for logs from both the 20x assemblies starting from both hifi_reads.fq and hifi_reads.fq.gz

    Is this an actual bug, or is it just user error?

    hifi_reads_gzipped.log hifi_reads.log

    bug 
    opened by gconcepcion 3
  • magic_simplify crashes while running in Docker container on HPC cluster

    magic_simplify crashes while running in Docker container on HPC cluster

    Hey! I'm currently trying to run rust-mdbg as part of a fungi genome assembly pipeline using nextflow and docker containers on an HPC cluster and I'm running into these issues where the magic_simplify script crashes with os error 30 : read-only file system. I already checked the docker container and made sure the rust-mdbg dir is not read-only so I'm not sure what exactly is happening here. Maybe someone knows whats up? I'm using singularity to run the docker containers on the HPC cluster I just hope this is not some compatibility issue with singularity/rust..

    command.log

    bug 
    opened by fischer-hub 3
Releases(v1.0.1)
Owner
Barış Ekim
PhD student in Berger Group at @mit.
Barış Ekim
This repository stores the code to reproduce the results published in "TiWS-iForest: Isolation Forest in Weakly Supervised and Tiny ML scenarios"

TinyWeaklyIsolationForest This repository stores the code to reproduce the results published in "TiWS-iForest: Isolation Forest in Weakly Supervised a

2 Mar 21, 2022
VSR-Transformer - This paper proposes a new Transformer for video super-resolution (called VSR-Transformer).

VSR-Transformer By Jiezhang Cao, Yawei Li, Kai Zhang, Luc Van Gool This paper proposes a new Transformer for video super-resolution (called VSR-Transf

Jiezhang Cao 225 Nov 13, 2022
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt. This is done by

Mehdi Cherti 135 Dec 30, 2022
[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

CLNER The code is for our ACL-IJCNLP 2021 paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning CLNER is a

71 Dec 08, 2022
Learning Lightweight Low-Light Enhancement Network using Pseudo Well-Exposed Images

Learning Lightweight Low-Light Enhancement Network using Pseudo Well-Exposed Images This repository contains the implementation of the following paper

Seonggwan Ko 9 Jul 30, 2022
[CVPR2021] DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

DoDNet This repo holds the pytorch implementation of DoDNet: DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datase

116 Dec 12, 2022
Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

Finetuner allows one to tune the weights of any deep neural network for better embeddings on search tasks

Jina AI 794 Dec 31, 2022
[ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin

CPT: Efficient Deep Neural Network Training via Cyclic Precision Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vikas Chandra, Yingyan Lin Accep

26 Oct 25, 2022
Generate images from texts. In Russian. In PaddlePaddle

ruDALL-E PaddlePaddle ruDALL-E in PaddlePaddle. Install: pip install rudalle_paddle==0.0.1rc1 Run with free v100 on AI Studio. Original Pytorch versi

AgentMaker 20 Oct 18, 2022
🐦 Quickly annotate data from the comfort of your Jupyter notebook

🐦 pigeon - Quickly annotate data on Jupyter Pigeon is a simple widget that lets you quickly annotate a dataset of unlabeled examples from the comfort

Anastasis Germanidis 647 Jan 05, 2023
UniLM AI - Large-scale Self-supervised Pre-training across Tasks, Languages, and Modalities

Pre-trained (foundation) models across tasks (understanding, generation and translation), languages (100+ languages), and modalities (language, image, audio, vision + language, audio + language, etc.

Microsoft 7.6k Jan 01, 2023
NeuPy is a Tensorflow based python library for prototyping and building neural networks

NeuPy v0.8.2 NeuPy is a python library for prototyping and building neural networks. NeuPy uses Tensorflow as a computational backend for deep learnin

Yurii Shevchuk 729 Jan 03, 2023
Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization"

LongDocSum Code for NAACL 2021 paper "Efficient Attentions for Long Document Summarization" This repository contains data and models needed to reprodu

56 Jan 02, 2023
Unofficial pytorch implementation of the paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution"

DFSA Unofficial pytorch implementation of the ICCV 2021 paper "Dynamic High-Pass Filtering and Multi-Spectral Attention for Image Super-Resolution" (p

2 Nov 15, 2021
LEAP: Learning Articulated Occupancy of People

LEAP: Learning Articulated Occupancy of People Paper | Video | Project Page This is the official implementation of the CVPR 2021 submission LEAP: Lear

Neural Bodies 60 Nov 18, 2022
这是一个unet-pytorch的源码,可以训练自己的模型

Unet:U-Net: Convolutional Networks for Biomedical Image Segmentation目标检测模型在Pytorch当中的实现 目录 性能情况 Performance 所需环境 Environment 注意事项 Attention 文件下载 Downl

Bubbliiiing 567 Jan 05, 2023
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
The Environment I built to study Reinforcement Learning + Pokemon Showdown

pokemon-showdown-rl-environment The Environment I built to study Reinforcement Learning + Pokemon Showdown Been a while since I ran this. Think it is

3 Jan 16, 2022
A naive ROS interface for visualDet3D.

YOLO3D ROS Node This repo contains a Monocular 3D detection Ros node. Base on https://github.com/Owen-Liuyuxuan/visualDet3D All parameters are exposed

Yuxuan Liu 19 Oct 08, 2022
Generates all variables from your .tf files into a variables.tf file.

tfvg Generates all variables from your .tf files into a variables.tf file. It searches for every var.variable_name in your .tf files and generates a v

1 Dec 01, 2022