rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly

rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) implementation, geared towards the assembly of long and accurate reads such as PacBio HiFi.

Rationale

rust-mdbg constructs the mdBG of a 52x human genome HiFi dataset in around 10 minutes on 8 threads, with a peak RAM usage of 10GB.

rust-mdbg is fast because it operates in minimizer-space: the reads, the assembly graph, and the final assembly are all represented as ordered lists of minimizers instead of strings of nucleotides. A conversion step then yields a classical base-space representation.
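
To make the idea concrete, here is a minimal Python sketch of turning a read into an ordered list of minimizers and then into k-min-mers. It assumes a density-based selection rule (an l-mer is kept when its hash falls below a fraction d of the hash range) and uses a truncated BLAKE2 digest as a stand-in hash. The function names and the hash are illustrative assumptions, not rust-mdbg's actual code.

```python
import hashlib
from typing import List, Tuple

HASH_BITS = 64
HASH_MAX = 2 ** HASH_BITS  # size of the hash range


def lmer_hash(lmer: str) -> int:
    """Stand-in 64-bit hash of an l-mer (assumption: not the hash rust-mdbg uses)."""
    digest = hashlib.blake2b(lmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def to_minimizer_space(read: str, l: int, density: float) -> List[str]:
    """Return the l-mers of `read`, in order, whose hash is below density * HASH_MAX."""
    threshold = density * HASH_MAX
    return [
        read[i:i + l]
        for i in range(len(read) - l + 1)
        if lmer_hash(read[i:i + l]) < threshold
    ]


def kmin_mers(minimizers: List[str], k: int) -> List[Tuple[str, ...]]:
    """Slide a window of k consecutive minimizers over the list (k-min-mers)."""
    return [tuple(minimizers[i:i + k]) for i in range(len(minimizers) - k + 1)]
```

With a low density, only a small fraction of l-mers survives, so each read shrinks to a short list of minimizers. That is the intuition behind the speed-up described above.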

Limitations

However, this high speed comes at a cost! :)

  • rust-mdbg gives good-quality results, but with lower contiguity and completeness than state-of-the-art assemblers such as HiCanu and hifiasm.
  • rust-mdbg performs best with at least 40x to 50x of coverage.
  • No polishing step is implemented, so assemblies will have around the same accuracy as the reads.

Installation

Clone the repository (make sure you have a working Rust environment), and run

cargo build --release

For performing graph simplifications, gfatools is required.

Quick start

cargo build --release
target/release/rust-mdbg example/reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example
utils/magic_simplify example

Multi-k assembly

For better contiguity, try the provided multi-k assembly script. It performs assembly iteratively, starting with k=10, up to an automatically determined largest k. This comes at the expense of a ~7x longer running time.

utils/multik <reads.fq.gz> <some_output_prefix> <nb_threads>

Overview

rust-mdbg is a modular assembler. It consists of three components:

  1. rust-mdbg, to perform assembly in minimizer-space
  2. gfatools (external component), to perform graph simplifications
  3. to_basespace, to convert a minimizer-space assembly to base-space

For convenience, components 2 and 3 are wrapped into a script called magic_simplify.

Input

rust-mdbg takes a single FASTA/FASTQ input (gzip-compressed or not). Multi-line sequences, and sequences with lowercase characters, are not supported.

If you have seqtk installed, you can use

seqtk seq -A reads.unformatted.fq > reads.fa

to format reads accordingly.
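
If seqtk is not available, the same clean-up can be sketched in a few lines of Python. The helper names below are hypothetical and not part of rust-mdbg. The sketch handles FASTA only (FASTQ would need a different parser): it joins multi-line sequences and converts them to uppercase.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalize_fasta(src: TextIO, dst: TextIO) -> None:
    """Write each record as one header line and one uppercase sequence line."""
    for header, seq in read_fasta(src):
        dst.write(f">{header}\n{seq.upper()}\n")


def normalize_text(text: str) -> str:
    """Convenience wrapper for in-memory strings."""
    src, dst = io.StringIO(text), io.StringIO()
    normalize_fasta(src, dst)
    return dst.getvalue()
```

For real files, pass open file handles (for example from `open(...)` or `gzip.open(..., "rt")`) to `normalize_fasta`.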

Output data

The output of rust-mdbg consists of:

  • A .gfa file containing the minimizer-space de Bruijn graph, without sequences,
  • Several .sequences files containing the sequences of the nodes of the graph.

The to_basespace executable combines both outputs to produce a .gfa file with sequences.

Running an example

A sample set of reads is provided in the example/ folder. Run

target/release/rust-mdbg example/reads-0.00.fa.gz -k 7 --density 0.0008 -l 10 --minabund 2 --prefix example

which will create an example.gfa file.

In order to populate the .gfa file with base-space sequences and perform graph simplification, run

utils/magic_simplify example

which will create example.msimpl.gfa and example.msimpl.fa files.

Parameters

The main parameters of rust-mdbg are the k-min-mer value k, the minimizer length l, and the minimizer density d (delta in the paper). Another parameter is --presimp, set by default to 0.01, which performs a graph simplification: a neighbor node is deleted if its abundance is below 1% of min(max(abundance of other neighbors), abundance of current node). For better results without having to set any parameter, try the multi-k strategy (see the Multi-k assembly section). The rest of this section explains how parameters are set in single-k assembly.
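
The --presimp rule above can be written out as a small predicate. This is a literal transcription of the sentence, not rust-mdbg's code: the function name is hypothetical, and the behaviour when the current node has no other neighbors is an assumption.

```python
from typing import Sequence


def is_deleted_by_presimp(
    neighbor_abundance: float,
    other_neighbor_abundances: Sequence[float],
    current_abundance: float,
    presimp: float = 0.01,
) -> bool:
    """Return True if the neighbor falls below the presimp abundance threshold."""
    if other_neighbor_abundances:
        reference = min(max(other_neighbor_abundances), current_abundance)
    else:
        # Assumption: with no other neighbors, use the current node's abundance alone.
        reference = current_abundance
    return neighbor_abundance < presimp * reference
```

For example, with other neighbors of abundance 500 and 300 and a current node of abundance 400, the reference value is 400, so a neighbor is removed only if its abundance is below 4.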

All three parameters k, l, and d significantly impact the quality of results. One can think of them as a generalization of the k parameter in classical de Bruijn graphs. When you run rust-mdbg without specifying parameters, it sets them to:

d = 0.003

l = 12

k = 0.75 * average_readlen * d

These parameters will give reasonable, but far from optimal, draft assemblies. We experimentally found that the best results are often obtained with k values within 20-40, l within 10-14, and d within 0.001-0.005. Setting k and d such that the ratio k/d is slightly below the read length appears to be an effective strategy.
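
As a worked example of the default formula, the sketch below computes k from an average read length and reports the k/d ratio discussed above. The function names are hypothetical, and truncating k to an integer is an assumption; the text does not say how rust-mdbg rounds.

```python
from typing import Dict, Union


def default_parameters(average_readlen: float) -> Dict[str, Union[int, float]]:
    """Default single-k parameters as stated in this section."""
    d = 0.003
    l = 12
    k = 0.75 * average_readlen * d
    return {"k": int(k), "l": l, "d": d}  # assumption: k is truncated to an integer


def k_over_d(k: int, d: float) -> float:
    """Ratio k/d, which the text suggests keeping slightly below the read length."""
    return k / d


params = default_parameters(10_000)
ratio = k_over_d(params["k"], params["d"])
```

For 10 kbp reads this gives k = 22, and the ratio k/d is about 7,300 bases, which stays below the read length as recommended.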

For further information on usage and parameters, run

target/release/rust-mdbg -h

for a one-line summary of each flag, or run

target/release/rust-mdbg --help

for a lengthy explanation of each flag.

Performance

Dataset                   Genome size (HPC)  Coverage  Parameters         N50      Runtime  Memory
D. melanogaster HiFi      98Mbp              100x      auto               2.5Mbp   2m15s    2.5GB
                                                       multi-k            2.5Mbp   15m      1.8GB
                                                       k=35,l=12,d=0.002  6.0Mbp   1m9s     1.5GB
Strawberry HiFi           0.7Gbp             36x       auto               0.5Mbp   6m12s    12GB
                                                       multi-k            1Mbp     40m      11GB
                                                       k=38,l=14,d=0.003  0.7Mbp   5m31s    10GB
H. sapiens (HG002) HiFi   2.2Gbp             52x       auto               1.0Mbp   27m30s   16.9GB
                                                       multi-k            16.9Mbp  3h15m    20GB
                                                       k=21,l=14,d=0.003  13.9Mbp  10m23s   10.1GB

Runtime breakdown:
H. sapiens: 10m23s = 6m51s rust-mdbg + 1m48s gfatools + 1m44s to_basespace

The runs with custom parameters (from the paper) were made with commit b99d938 and, unlike in the paper, did not use robust minimizers, which require additional l-mer counting beforehand. For historical reasons, reads and assemblies were homopolymer-compressed in those experiments, and the homopolymer-compressed genome size is reported, so these numbers are not directly comparable to the output of other assemblers. In addition to the parameters shown in the table, the rust-mdbg command line also contained --bf --no-error-correct --threads 8.
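
For readers unfamiliar with the N50 column, here is a short sketch of the standard definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the assembly size. It is a generic illustration, not the script used to produce the table, and the function name is hypothetical.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths (0 for an empty input)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # reached at least half of the total length
            return length
    return 0
```

Because the experiments above were homopolymer-compressed, an N50 computed this way on an uncompressed assembly would not be directly comparable to the table.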

Running rust-mdbg without graph simplifications

To convert an assembly to base-space without performing any graph simplifications, there are two ways:

  • with gfatools
gfatools asm -u example.gfa > example.unitigs.gfa
target/release/to_basespace --gfa example.unitigs.gfa --sequences example
  • without gfatools (slower, but the code is more straightforward to understand)

utils/complete_gfa.py example.sequences example.gfa

In both cases, this will create an example.complete.gfa file that you can convert to FASTA with

bash utils/gfa2fasta.sh example.complete
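
The conversion itself is simple enough to sketch in Python. In GFA, each segment is a tab-separated line starting with S, followed by the segment name and its sequence (or * when no sequence is stored). The function below is a hypothetical illustration of that idea and is not claimed to match what utils/gfa2fasta.sh does.

```python
from typing import Iterable, Iterator


def gfa_to_fasta(gfa_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTA record per GFA segment line that carries a sequence."""
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        # Segment lines look like: S <name> <sequence> [optional tags...]
        if len(fields) >= 3 and fields[0] == "S" and fields[2] != "*":
            yield f">{fields[1]}\n{fields[2]}\n"


example_gfa = [
    "H\tVN:Z:1.0\n",
    "S\tutg1\tACGT\tLN:i:4\n",
    "S\tutg2\t*\n",
    "L\tutg1\t+\tutg2\t-\t0M\n",
]
fasta_text = "".join(gfa_to_fasta(example_gfa))
```

Segments without a stored sequence are skipped, which is why a minimizer-space .gfa must first be completed with base-space sequences.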

License

rust-mdbg is freely available under the MIT License.

Developers

  • Barış Ekim, supervised by Bonnie Berger at the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT)
  • Rayan Chikhi at the Department of Computational Biology at Institut Pasteur

Citation

Minimizer-space de Bruijn graphs (2021) bioRxiv

@article {mdbg,
	author = {Ekim, Bar{\i}{\c s} and Berger, Bonnie and Chikhi, Rayan},
	title = {Minimizer-space de Bruijn graphs},
	year = {2021},
	doi = {10.1101/2021.06.09.447586},
	publisher = {Cold Spring Harbor Laboratory},
	journal = {bioRxiv}
}

Contact

Should you have any inquiries, please contact Barış Ekim at baris [at] mit [dot] edu, or Rayan Chikhi at rchikhi [at] pasteur [dot] fr.

Comments
  • m1 arm support

    Hello rust-mdbg team,

    It seems that there is no support for the ARM architecture yet; I get the following error when compiling on ARM64:

    The following warnings were emitted during compilation:

    warning: cc: error: unrecognized command-line option '-msse4.2'
    warning: cc: error: unrecognized command-line option '-maes'
    warning: cc: error: unrecognized command-line option '-mavx'
    warning: cc: error: unrecognized command-line option '-mavx2'

    error: failed to run custom build command for fasthash-sys v0.3.2

    Caused by: process didn't exit successfully: /Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-17495dcf061597dc/build-script-build (signal: 6, SIGABRT: process abort signal)
    --- stdout
    TARGET = Some("aarch64-apple-darwin")
    OPT_LEVEL = Some("3")
    TARGET = Some("aarch64-apple-darwin")
    HOST = Some("aarch64-apple-darwin")
    TARGET = Some("aarch64-apple-darwin")
    TARGET = Some("aarch64-apple-darwin")
    HOST = Some("aarch64-apple-darwin")
    CC_aarch64-apple-darwin = None
    CC_aarch64_apple_darwin = None
    HOST_CC = None
    CC = None
    HOST = Some("aarch64-apple-darwin")
    TARGET = Some("aarch64-apple-darwin")
    HOST = Some("aarch64-apple-darwin")
    CFLAGS_aarch64-apple-darwin = None
    CFLAGS_aarch64_apple_darwin = None
    HOST_CFLAGS = None
    CFLAGS = None
    DEBUG = Some("false")
    running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp"
    cargo:warning=cc: error: unrecognized command-line option '-msse4.2'
    cargo:warning=cc: error: unrecognized command-line option '-maes'
    cargo:warning=cc: error: unrecognized command-line option '-mavx'
    cargo:warning=cc: error: unrecognized command-line option '-mavx2'
    exit status: 1

    --- stderr
    thread 'main' panicked at '

    Internal error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-Wno-implicit-fallthrough" "-Wno-unknown-attributes" "-msse4.2" "-maes" "-mavx" "-mavx2" "-DT1HA0_RUNTIME_SELECT=1" "-DT1HA0_AESNI_AVAILABLE=1" "-Wall" "-Wextra" "-o" "/Users/jianshuzhao/Github/rust-mdbg/target/release/build/fasthash-sys-d509c7de4ba60bc4/out/src/fasthash.o" "-c" "src/fasthash.cpp" with args "cc" did not execute successfully (status code exit status: 1).

    ', /Users/jianshuzhao/.cargo/registry/src/github.com-1ecc6299db9ec823/gcc-0.3.55/src/lib.rs:1672:5
    note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
    fatal runtime error: failed to initiate panic, error 5
    warning: build failed, waiting for other jobs to finish...
    error: build failed

    Any possibilities to provide support?

    Thanks,

    Jianshu

    opened by jianshu93 6
  • Recommended parameters for metagenome assembly and a related question

    Hi,

    I want to try mdBG on real metagenome samples. I wonder if you could suggest a parameter combo to use (or combos to try out). And should I do the multi-k mode?

    For the real samples, I could crudely guess the number of species in the library, and perhaps an exaggerated total genome size from it as well. I'm not sure if these could be useful.

    Another question is: could mdBG output contig coverage estimates?

    Thank you!

    question 
    opened by xfengnefx 5
  • Unable to assemble the D.melanogaster genome from 24kb HiFi reads

    thread 'main' panicked at 'called Result::unwrap() on an Err value: Error { kind: BufferLimit }', src/main.rs:187:33

    Reads taken from: https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-17/SRR1023860/SRR10238607.1

    Command: utils/multik <reads> <output prefix> 56

    Happens both with and without homopolymer compression.

    bug 
    opened by sebschmi 5
  • multik executes run with k < l

    When assembling E.coli with the multik script, it runs mdbg with k = 10 and l = 12, resulting in mdbg panicking with "Non-ACGTN nucleotide encountered!"

    The multik script then continues silently.

    output
    thread '<unnamed>' panicked at 'Non-ACGTN nucleotide encountered!', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/nthash-0.5.0/src/lib.rs:43:9
    stack backtrace:
       0: std::panicking::begin_panic
       1: <nthash::NtHashIterator as core::iter::traits::iterator::Iterator>::next
       2: rust_mdbg::read::Read::extract
       3: rust_mdbg::main::{{closure}}
       4: rust_mdbg::main::{{closure}}
       5: <F as scoped_threadpool::FnBox>::call_box
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    stack backtrace:
       0: rust_begin_unwind
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
       1: core::panicking::panic_fmt
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
       2: core::result::unwrap_failed
                 at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
       3: scoped_threadpool::Pool::scoped
       4: core::ops::function::FnOnce::call_once{{vtable.shim}}
    note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', /home/sebschmi/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x55e87265b2ec - std::backtrace_rs::backtrace::libunwind::trace::h09f7e4e089375279
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
       1:     0x55e87265b2ec - std::backtrace_rs::backtrace::trace_unsynchronized::h1ec96f1c7087094e
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x55e87265b2ec - std::sys_common::backtrace::_print_fmt::h317b71fc9a5cf964
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x55e87265b2ec - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::he3555b48e7dfe7f0
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x55e87267d4fc - core::fmt::write::h513b07ca38f4fb1b
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/fmt/mod.rs:1149:17
       5:     0x55e872657995 - std::io::Write::write_fmt::haf8c932b52111354
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/io/mod.rs:1697:15
       6:     0x55e87265cec0 - std::sys_common::backtrace::_print::h195c38364780a303
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x55e87265cec0 - std::sys_common::backtrace::print::hc09dfdea923b6730
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x55e87265cec0 - std::panicking::default_hook::{{closure}}::hb2e38ec0d91046a3
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:211:50
       9:     0x55e87265ca75 - std::panicking::default_hook::h60284635b0ad54a8
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:228:9
      10:     0x55e87265d574 - std::panicking::rust_panic_with_hook::ha677a669fb275654
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:606:17
      11:     0x55e87265d050 - std::panicking::begin_panic_handler::{{closure}}::h976246fb95d93c31
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
      12:     0x55e87265b794 - std::sys_common::backtrace::__rust_end_short_backtrace::h38077ee5b7b9f99a
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
      13:     0x55e87265cfb9 - rust_begin_unwind
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
      14:     0x55e872545651 - core::panicking::panic_fmt::h35f3a62252ba0fd2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
      15:     0x55e872545743 - core::result::unwrap_failed::hb53671404b9e33c2
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
      16:     0x55e8725e8f9f - scoped_threadpool::Scope::join_all::hcb532061605ab1b0
      17:     0x55e87255ee33 - scoped_threadpool::Pool::scoped::hb64980f16173dad1
      18:     0x55e87255b128 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hf2fa39940289df70
      19:     0x55e87255ee5e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6bd664fd6d7bb829
      20:     0x55e87259a883 - core::ops::function::FnOnce::call_once{{vtable.shim}}::ha5de8d6fee3bff3e
      21:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hcbc6d2d80772be64
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      22:     0x55e872660893 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h9bffa2ca65a1d6e6
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/boxed.rs:1694:9
      23:     0x55e872660893 - std::sys::unix::thread::Thread::new::thread_start::ha678a8b0caec8f55
                                   at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys/unix/thread.rs:106:17
      24:     0x7f16121a96db - start_thread
                                   at /build/glibc-S9d2JN/glibc-2.27/nptl/pthread_create.c:463
      25:     0x7f161193071f - __GI___clone
                                   at /build/glibc-S9d2JN/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      26:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    625.22user 29.59system 1:35.45elapsed 685%CPU (0avgtext+0avgdata 579660maxresident)k
    
    bug 
    opened by sebschmi 5
  • Missing assembly-final.msimpl.fa in multik mode

    Hello,

    Thank you for this tool.

    I ran mdbg with the following command line: multik reads.fastq.gz assembly 56 10 1000

    I get the files assembly-k*.gfa, assembly-k*.msimpl.gfa and assembly-k*.msimpl.fa with k from 10 to 1000, but I do not get the final output assembly-final.msimpl.fa.

    bug 
    opened by nadegeguiglielmoni 5
  • example.sequences file

    Sorry if I'm being slow, but when I create the .gfa file, multiple .sequences files are created, whereas in the README to_basespace takes only a single example.sequences file. Where does this file come from? Do you combine the .sequences files in some way?

    Thanks!

    question 
    opened by samlipworth 4
  • KSizeOutOfRange errors during rust-mdbg run

    Hi there,

    Trying out multik with some Nanopore metagenomics reads (seqtk-formatted) and I'm currently getting the errors below as it iteratively goes through the different -k values. Any ideas on what might be going wrong and how I might fix it?

    So far, the run hasn't fully aborted and I'm letting it run until I get some output - will let y

    $ ../rust-mdbg/utils/multik sup.fastq.gz std_sipp 10
    avg readlen: 6147875, max k: 17521
    assembly with k=10
        Finished release [optimized] target(s) in 0.09s
         Running `sup.fastq.gz -k 10 -l 12 --density 0.003 --minabund 2 --threads 10 --prefix std_sipp-k10 --bf`
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 11 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 4 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 5 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 3 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 1 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: KSizeOutOfRange { ksize: 12, seq_size: 10 }', src/read.rs:148:63
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:213:73
    thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "SendError(..)"', /home/andre/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped_threadpool-0.1.9/src/lib.rs:219:72
    stack backtrace:
       0:     0x5591d2438050 - std::backtrace_rs::backtrace::libunwind::trace::h63b7a90188ab5fb3
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
       1:     0x5591d2438050 - std::backtrace_rs::backtrace::trace_unsynchronized::h80aefbf9b851eca7
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
       2:     0x5591d2438050 - std::sys_common::backtrace::_print_fmt::hbef05ae4237a4d72
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:67:5
       3:     0x5591d2438050 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h28abce2fdb9884c2
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:46:22
       4:     0x5591d245670f - core::fmt::write::h3b84512577ca38a8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/fmt/mod.rs:1092:17
       5:     0x5591d24352b2 - std::io::Write::write_fmt::h465f8feea02e2aa1
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/io/mod.rs:1572:15
       6:     0x5591d243a185 - std::sys_common::backtrace::_print::h525280ee0d29bdde
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:49:5
       7:     0x5591d243a185 - std::sys_common::backtrace::print::h1f0f5b9f3ef8fb78
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:36:9
       8:     0x5591d243a185 - std::panicking::default_hook::{{closure}}::ha5838f6faa4a5a8f
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:208:50
       9:     0x5591d2439c33 - std::panicking::default_hook::hfb9fe98acb0dcb3b
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:225:9
      10:     0x5591d243a78d - std::panicking::rust_panic_with_hook::hb89f5f19036e6af8
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:591:17
      11:     0x5591d243a327 - std::panicking::begin_panic_handler::{{closure}}::h119e7951427f41da
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:497:13
      12:     0x5591d243850c - std::sys_common::backtrace::__rust_end_short_backtrace::hce386c44bf47a128
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys_common/backtrace.rs:141:18
      13:     0x5591d243a289 - rust_begin_unwind
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:493:5
      14:     0x5591d2323341 - core::panicking::panic_fmt::h2242888e8769cd33
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/panicking.rs:92:14
      15:     0x5591d2323233 - core::option::expect_none_failed::hb1edf11f73e63728
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/option.rs:1329:5
      16:     0x5591d23c108f - scoped_threadpool::Scope::join_all::hd6132fc8a04c2f8d
      17:     0x5591d233fcbb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h198262ef865dc7ad
      18:     0x5591d2391412 - std::sys_common::backtrace::__rust_begin_short_backtrace::he7799c2fe1d42088
      19:     0x5591d234f443 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hc12e7712db099355
      20:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::hc444a77f8dd8d825
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      21:     0x5591d243d28a - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h8b68a0a9a2093dfc
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/alloc/src/boxed.rs:1546:9
      22:     0x5591d243d28a - std::sys::unix::thread::Thread::new::thread_start::hb95464447f61f48d
                                   at /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/sys/unix/thread.rs:71:17
      23:     0x7f00c05ae6ba - start_thread
      24:     0x7f00bfdd751d - clone
      25:                0x0 - <unknown>
    thread panicked while panicking. aborting.
    Command terminated by signal 4
    537.76user 35.81system 3:28.70elapsed 274%CPU (0avgtext+0avgdata 855256maxresident)k
    0inputs+1781280outputs (0major+10608356minor)pagefaults 0swaps
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.gfa -t 10,50000 -t 10,50000 -b 100000 -b 100000 -t 10,50000 -b 100000 -b 100000 -b 100000 -t 10,50000 -b 100000 -t 10,50000 -b 1000000 -t 10,150000 -b 1000000 -u
    ERROR: failed to read the graph
    Command exited with non-zero status 2
    0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 1572maxresident)k
    0inputs+0outputs (0major+68minor)pagefaults 0swaps
    + python /home/andre/rust-mdbg/utils/gfa_break_loops.py std_sipp-k10.tmp1.gfa
    + [[ ! std_sipp-k10 == *--old-behavior* ]]
    + cargo run --manifest-path /home/andre/rust-mdbg/utils/../Cargo.toml --release --bin to_basespace -- --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10
        Finished release [optimized] target(s) in 0.07s
         Running `/home/andre/rust-mdbg/target/release/to_basespace --gfa std_sipp-k10.tmp2.gfa --sequences std_sipp-k10`
    + mv std_sipp-k10.tmp2.gfa.complete.gfa std_sipp-k10.tmp2.gfa
    + /usr/bin/time /home/andre/gfatools/gfatools asm std_sipp-k10.tmp2.gfa -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u
    [M::main] Version: 0.4-r214-dirty
    [M::main] CMD: /home/andre/gfatools/gfatools asm -t 10,50000 -b 100000 -t 10,100000 -b 1000000 -t 10,150000 -b 1000000 -u std_sipp-k10.tmp2.gfa
    [M::main] Real time: 0.000 sec; CPU: 0.000 sec
    0.00user 0.00system 0:00.00elapsed 0%CPU (0avgtext+0avgdata 1752maxresident)k
    0inputs+0outputs (0major+74minor)pagefaults 0swaps
    ++ stat -c%s std_sipp-k10.tmp2.gfa
    + filesize=9
    + ((  filesize > 100000000 ))
    + mv std_sipp-k10.tmp3.gfa std_sipp-k10.msimpl.gfa
    + [[ std_sipp-k10 != *\-\-\k\e\e\p* ]]
    + rm -rf std_sipp-k10.tmp1.gfa std_sipp-k10.tmp2.gfa
    + bash /home/andre/rust-mdbg/utils/gfa2fasta.sh std_sipp-k10.msimpl
    2.19user 0.28system 0:02.50elapsed 99%CPU (0avgtext+0avgdata 24008maxresident)k
    0inputs+8outputs (0major+9763minor)pagefaults 0swaps
    
    bug 
    opened by GeoMicroSoares 4
  • Nanopore metagenome assembly parameters

    Hi there,

    Congratulations, this tool seems amazing and I can't wait to use it with my data! Are there specific parameters that I can use/optimize with rust-mdbg to assemble Nanopore metagenomes?

    Thanks.

    enhancement 
    opened by GeoMicroSoares 4
  • Problems in running `rust-mdbg` without graph simplifications

    Hi,

    I've installed rust-mdbg

    git clone --recursive https://github.com/ekimb/rust-mdbg.git
    cd rust-mdbg
    cargo build --release
    

    I've run it

    ~/git/rust-mdbg/target/release/rust-mdbg ~/git/rust-mdbg/example/reads-0.00.fa.gz -k 7 --threads 1 --density 0.0008 -l 10 --minabund 2 --prefix example
    ls example*
    
    example.140646999713344.sequences  example.gfa
    

    and finally tried both approaches to go in base-space

    gfatools asm -u  example.gfa > example.unitigs.gfa
    ~/git/rust-mdbg/target/release/to_basespace --gfa example.unitigs.gfa --sequences example.sequences
    
    [M::main] Version: 0.5-r250-dirty
    [M::main] CMD: gfatools asm -u example.gfa
    [M::main] Real time: 0.001 sec; CPU: 0.003 sec
    Done parsing unitigs GFA, got 1 unitigs.
    Done parsing original GFA, with 0 k-min-mers.
    Done parsing .sequences file, recorded 0 sequences.
    thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/to_basespace.rs:258:55
    note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
    

    and

    python3 ~/git/rust-mdbg/utils/complete_gfa.py example.*.sequences example.gfa
    
    Traceback (most recent call last):
      File "/home/guarracino/git/rust-mdbg/utils/complete_gfa.py", line 32, in <module>
        source_minims = node_minims[spl[1]]
    KeyError: '7'
    

    Am I making a silly mistake somewhere?

    opened by AndreaGuarracino 3
  • Differences between uncompressed & compressed fastq files

    This seems like a bug, but maybe I'm just misunderstanding something with how mdbg works.

    I discovered this after trying to run several human assemblies of varying input coverage (20x,30x,40x,50x) starting from hifi_reads.fq.gz files.

    The contiguity (n50) of all of the assemblies was in the same ballpark as the read n50 and there appeared to be no benefit to increased coverage. This coupled with the poor results in general had me scratching my head so I tried a different test dataset that was an uncompressed hifi_reads.fq and I got a great assembly.

    Curiosity piqued, I went back and unzipped the 20x coverage point I had tried earlier and got a much better assembly.

    See the attached logs from the two 20x assemblies, one starting from hifi_reads.fq and one from hifi_reads.fq.gz.

    Is this an actual bug, or is it just user error?

    hifi_reads_gzipped.log hifi_reads.log

    bug 
    opened by gconcepcion 3
  • magic_simplify crashes while running in Docker container on HPC cluster

    Hey! I'm currently trying to run rust-mdbg as part of a fungal genome assembly pipeline using Nextflow and Docker containers on an HPC cluster, and the magic_simplify script crashes with os error 30: read-only file system. I already checked the Docker container and made sure the rust-mdbg directory is not read-only, so I'm not sure what exactly is happening here. Maybe someone knows what's up? I'm using Singularity to run the Docker containers on the HPC cluster; I just hope this is not a compatibility issue between Singularity and Rust.

    command.log

    bug 
    opened by fischer-hub 3
Releases(v1.0.1)
Owner
Barış Ekim
PhD student in Berger Group at @mit.