Pipeline to convert a haploid assembly into diploid

Related tags

Data Analysishapdup
Overview

HapDup

HapDup (haplotype duplicator) is a pipeline to convert a haploid long read assembly into a dual diploid assembly. The reconstructed haplotypes preserve heterozygous structural variants (in addition to small variants) and are locally phased.

Version 0.4

Input requirements

HapDup takes as input a haploid long-read assembly, such as produced with Flye or Shasta. Currenty, ONT reads (Guppy 5+ recommended) and PacBio HiFi reads are supported.

HapDup is currently designed for low-heterozygosity genomes (such as human). The expectation is that the assembly has most of the diploid genome collapsed into a single haplotype. For assemblies with partially resolved haplotypes, alternative alleles could be removed prior to running the pipeline using purge_dups. We expect to add a better support of highly heterozygous genomes in the future.

The first stage is to realign the original long reads on the assembly using minimap2. We recommend to use the latest minimap2 release.

minimap2 -ax map-ont -t 30 assembly.fasta reads.fastq | samtools sort -@ 4 -m 4G > lr_mapping.bam
samtools index -@ 4 assembly_lr_mapping.bam

Quick start using Docker

HapDup is available on the Docker Hub.

If Docker is not installed in your system, you need to set it up first following this guide.

Next steps assume that your assembly.fasta and lr_mapping.bam are in the same directory, which will also be used for HapDup output. If it is not the case, you might need to bind additional directories using the Docker's -v / --volume argument. The number of threads (-t argument) should be adjusted according to the available resources. For PacBio HiFi input, use --rtype hifi instead of --rtype ont.

cd directory_with_assembly_and_alignment
HD_DIR=`pwd`
docker run -v $HD_DIR:$HD_DIR -u `id -u`:`id -g` mkolmogo/hapdup:0.4 \
  hapdup --assembly $HD_DIR/assembly.fasta --bam $HD_DIR/lr_mapping.bam --out-dir $HD_DIR/hapdup -t 64 --rtype ont

Quick start using Singularity

Alternatively, you can use Singularity. First, you will need install the client as descibed in the manual. One way to do it is through conda:

conda install singularity

Next steps assume that your assembly.fasta and lr_mapping.bam are in the same directory, which will also be used for HapDup output. If it is not the case, you might need to bind additional directories using the --bind argument. The number of threads (-t argument) should be adjusted according to the available resources. For PacBio HiFi input, use --rtype hifi instead of --rtype ont.

singularity pull docker://mkolmogo/hapdup:0.4
HD_DIR=`pwd`
singularity exec --bind $HD_DIR hapdup_0.4.sif \
  hapdup --assembly $HD_DIR/assembly.fasta --bam $HD_DIR/lr_mapping.bam --out-dir $HD_DIR/hapdup -t 64 --rtype ont

Output files

The output directory will contain:

  • haplotype_{1,2}.fasta - final assembled haplotypes
  • phased_blocks_hp{1,2}.bed - phased blocks coordinates

Haplotypes generated by the pipeline contain homozogous and heterozygous varinats (small and structural). Becuase the pipeline is only using long-read (ONT) data, it does not achieve chromosome-level phasing. Fully-phased blocks are given in the the phased_blocks* files.

Pipeline overview

  1. HapDup starts with filtering alignments that are likely originating from the unassembled parts of the genome. Such alignments may later create false haplotypes if not removed (e.g. if reads from a segmental duplication with two copies can create four haplotypes).

  2. Afterwards, PEPPER is used to call SNPs from the filtered alignment file

  3. Then we use Margin to phase SNPs and haplotype reads

  4. We then use Flye to polish the initiall assembly with the reads from each of the two haplotypes independently

  5. Finally, we find (heterozygous) breakpoints in long-read alignments and apply the corresponding structural changes to the corresponding polished haplotypes. Currently, it allows to recover large heterozygous inversions.

Benchmarks

We evaluated HapDup haplotypes in terms of reconstructed structural variants signatures (heterozygous & homozygous) using the HG002 for which the curated set of SVs is available. We used the recent ONT data basecalled with Guppy 5.

Given HapDup haplotypes, we called SV using dipdiff. We also compare SV set against hifiasm assemblies, even though they were produced from HiFi, rather than ONT reads. Evaluated using truvari with -r 2000 option. GT refers to genotype-considered benchmarks.

Method Precision Recall F1-score GT Precision GT Recall GT F1-score
Shasta+HapDup 0.9500 0.9551 0.9525 0.934 0.9543 0.9405
Sniffles 0.9294 0.9143 0.9219 0.8284 0.9051 0.8605
CuteSV 0.9324 0.9428 0.9376 0.9119 0.9416 0.9265
hifiasm 0.9512 0.9734 0.9622 0.9129 0.9723 0.9417

Yak k-mer based evaluations:

Hap QV Switch err Hamming err
1 35 0.0389 0.1862
2 35 0.0385 0.1845

Given a minimap2 alignment, HapDup runs in ~400 CPUh and uses ~80 Gb of RAM.

Source installation

If you prefer, you can install from source as follows:

#create a new conda environemnt and activate it
conda create -n hapdup python=3.8
conda activate hapdup

#get HapDup source
git clone https://github.com/fenderglass/hapdup
cd hapdup
git submodule update --init --recursive

#build and install Flye
pushd submodules/Flye/ && python setup.py install && popd

#build and install Margin
pushd submodules/margin/ && mkdir build && cd build && cmake .. && make && cp ./margin $CONDA_PREFIX/bin/ && popd

#build and install PEPPER and its dependencies
pushd submodules/pepper/ && python -m pip install . && popd

To run, ensure that the conda environemnt is activated and then execute:

conda activate hapdup
./hapdup.py --assembly assembly.fasta --bam lr_mapping.bam --out-dir hapdup -t 64 --rtype ont

Acknowledgements

The major parts of the HapDup pipeline are:

Authors

The pipeline was developed at UC Santa Cruz genomics institute, Benedict Paten's lab.

Pipeline code contributors:

  • Mikhail Kolmogorov

PEPPER/Margin/Shasta support:

  • Kishwar Shafin
  • Trevor Pesout
  • Paolo Carnevali

Citation

If you use HapDup in your research, the most relevant papers to cite are:

Kishwar Shafin, Trevor Pesout, Pi-Chuan Chang, Maria Nattestad, Alexey Kolesnikov, Sidharth Goel, Gunjan Baid et al. "Haplotype-aware variant calling enables high accuracy in nanopore long-reads using deep neural networks." bioRxiv (2021). doi:10.1101/2021.03.04.433952

Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019 doi:10.1038/s41587-019-0072-8

License

HapDup is distributed under a BSD license. See the LICENSE file for details. Other software included in this discrubution is released under either MIT or BSD licenses.

How to get help

A preferred way report any problems or ask questions is the issue tracker.

In case you prefer personal communication, please contact Mikhail at [email protected].

Comments
  • pthread_setaffinity_np failed Error while running pepper

    pthread_setaffinity_np failed Error while running pepper

    Hi,

    I'm trying to run HapDup on my assembly from Flye. However, an error has occurred while running Pepper: RuntimeError: /onnxruntime_src/onnxruntime/core/platform/posix/env.cc:173 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 0 error msg: there's also a warning before this runtime error: /usr/local/lib/python3.8/dist-packages/torch/onnx/symbolic_opset9.py:2095: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model. warnings.warn("Exporting a model to ONNX with a batch_size other than 1, " + Do you have any idea why this happens?

    The commands that I use are like:

    reads=NA24385_ONT_Promethion.fastq
    outdir=`pwd`
    assembly=${outdir}/assembly.fasta
    hapdup_sif=../HapDup/hapdup_0.4.sif
    
    time minimap2 -ax map-ont -t 30 ${assembly} ${reads} | samtools sort -@ 4 -m 4G > assembly_lr_mapping.bam
    samtools index -@ 4 assembly_lr_mapping.bam
    
    time singularity exec --bind ${outdir} ${hapdup_sif}\
    	hapdup --assembly ${assembly} --bam ${outdir}/assembly_lr_mapping.bam --out-dir ${outdir}/hapdup -t 64 --rtype ont
    

    Thank you

    bug 
    opened by LYC-vio 11
  • Docker fails to mount HD_DIR

    Docker fails to mount HD_DIR

    Hi! The following error occurs when I try to run:

    sudo docker run -v $HD_DIR:$HD_DIR -u `id -u`:`id -g` mkolmogo/hapdup:0.2   hapdup --assembly $HD_DIR/barcode11CBS1878_v2.fasta --bam $HD_DIR/lr_mapping.bam --out-dir $HD_DIR/hapdup
    
    docker: Error response from daemon: error while creating mount source path '/home/user/data/volume_2/hapdup_results': mkdir /home/user/data: file exists.
    ERRO[0000] error waiting for container: context canceled``
    
    opened by alkminion1 8
  • Incorrect genotype for large deletion

    Incorrect genotype for large deletion

    Hi,

    I have used Hapdup to make a haplotype-resolved assembly from Illumina-corrected ONT reads (haploid assembly made with Flye 2.9) and I am particularly interested in a large 32kb deletion. Here is a screenshot of IGV (from top to bottom: HAP1, HAP2 and haploid assembly):

    hapdup

    I believe the position and size of the deletion are near correct. However, the deletion is homozygous while it should be heterozygous. I have assembled with Hifiasm this proband and its parents using 30x PacBio HiFi: the 3 assemblies support an heterozygous call in the proband. I can also see from the corrected ONT that there is support for a heterozygous call. Finally, we can see this additional contig in the haploid assembly which I guess also support a heterozygous call.

    Hence, my question is: even if MARGIN manages to correctly separate reads with the deletion from reads without the deletion, can the polishing of Flye actually "fix" such a large event in one of the haplotype assembly?

    Thanks, Guillaume

    opened by GuillaumeHolley 7
  • invalid contig

    invalid contig

    Hi, I got an error when I ran the third step, and here is the error reporting information. Skipped filtering phase Skipped pepper phase Skipped margin phase Skipped Flye phase Finding breakpoints Parsed 304552 reads 14590 split reads Running: flye-minimap2 -ax asm5 -t 64 -K 5G /usr_storage/zyl/SY_haplotype/ZSP192L/ZSP192L.fasta /usr_storage/zyl/SY_haplotype/ZSP192L/hapdup/flye_hap_1/polished_1.fasta 2>/dev/null | flye-samtools sort -m 4G [email protected] > /usr_storage/zyl/SY_haplotype/ZSP192L/hapdup/structural/liftover_hp1.bam [bam_sort_core] merging from 0 files and 4 in-memory blocks... Traceback (most recent call last): File "/usr/local/bin/hapdup", line 8, in sys.exit(main()) File "/usr/local/lib/python3.8/dist-packages/hapdup/main.py", line 173, in main bed_liftover(inversions_bed, minimap_out, open(inversions_hp, "w")) File "/usr/local/lib/python3.8/dist-packages/hapdup/bed_liftover.py", line 76, in bed_liftover proj_start_chr, proj_start_pos, proj_start_sign = project(bam_file, chr_id, chr_start) File "/usr/local/lib/python3.8/dist-packages/hapdup/bed_liftover.py", line 9, in project name, pos, sign = project_flank(bam_path, ref_seq, ref_pos, 1) File "/usr/local/lib/python3.8/dist-packages/hapdup/bed_liftover.py", line 23, in project_flank for pileup_col in samfile.pileup(ref_seq, max(0, ref_pos - flank), ref_pos + flank, truncate=True, File "pysam/libcalignmentfile.pyx", line 1335, in pysam.libcalignmentfile.AlignmentFile.pileup File "pysam/libchtslib.pyx", line 685, in pysam.libchtslib.HTSFile.parse_region ValueError: invalid contig Contig125

    bug 
    opened by tongyin121 7
  • hapdup fails with multiple primary alignments

    hapdup fails with multiple primary alignments

    I've got one sample in a series which is failing in hapdup with the following error.

    Any suggestions?

    Thanks

    `

    Starting merge Expected three tokens in header line, got 2 This usually means you have multiple primary alignments with the same read ID. You can identify whether this is the case with this command:

        samtools view -F 0x904 YOUR.bam | cut -f 1 | sort | uniq -c | awk '$1 > 1'
    

    Expected three tokens in header line, got 2 This usually means you have multiple primary alignments with the same read ID. You can identify whether this is the case with this command:

        samtools view -F 0x904 YOUR.bam | cut -f 1 | sort | uniq -c | awk '$1 > 1'
    

    [2022-12-13 17:15:57] ERROR: Missing output: hapdup/margin/MARGIN_PHASED.haplotagged.bam Traceback (most recent call last): File "/data/test_data/GIT/hapdup/hapdup.py", line 24, in sys.exit(main()) File "/data/test_data/GIT/hapdup/hapdup/main.py", line 206, in main file_check(haplotagged_bam) File "/data/test_data/GIT/hapdup/hapdup/main.py", line 114, in file_check raise Exception("Missing output") Exception: Missing output `

    opened by mattloose 4
  • ZeroDivisionError

    ZeroDivisionError

    Hi,

    I'm running hapdup version 0.8 on a number of human genomes.

    It appears to fail fairly regulary with an error in flye:

    [2022-10-20 15:33:48] INFO: Running Flye polisher [2022-10-20 15:33:48] INFO: Polishing genome (1/1) [2022-10-20 15:33:48] INFO: Polishing with provided bam [2022-10-20 15:33:48] INFO: Separating alignment into bubbles [2022-10-20 15:37:12] ERROR: Thread exception [2022-10-20 15:37:12] ERROR: Traceback (most recent call last): File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/flye/polishing/bubbles.py", line 79, in _thread_worker indels_profile = _get_indel_clusters(ctg_aln, profile, ctg_region.start) File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/flye/polishing/bubbles.py", line 419, in _get_indel_clusters get_clusters(deletions, add_last=True) File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/flye/polishing/bubbles.py", line 410, in get_clusters support = len(reads) / region_coverage ZeroDivisionError: division by zero

    The flye version is 2.9-b1778

    Has anyone else seen this and have any suggestions on how to fix?

    Thanks.

    bug 
    opened by mattloose 4
  • Super cool tool! Maybe in the README mention that FASTQ files are required if using the container

    Super cool tool! Maybe in the README mention that FASTQ files are required if using the container

    As far as know, inputting FASTA reads aligned to the reference as the bam file will result in Pepper failing to find any variants using the default settings of the Docker/Singularity container as the config for Pepper requires a minimum base quality setting.

    opened by jelber2 3
  • Option to set minimap2 -I flag

    Option to set minimap2 -I flag

    Hi,

    Ran into this error while trying to run hapdup:

    [2022-03-08 10:23:36] INFO: Running: flye-minimap2 -ax asm5 -t 10 -K 5G <PATH>/assembly.fasta <PATH>/hapdup/flye_hap_1/polished_1.fasta 2>/dev/null | flye-samtools sort -m 4G [email protected] > <PATH>/hapdup/structural/liftover_hp1.bam
    [E::sam_parse1] missing SAM header
    [W::sam_read1] Parse error at line 2
    samtools sort: truncated file. Aborting
    Traceback (most recent call last):
      File "/usr/local/bin/hapdup", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/hapdup/main.py", line 245, in main
        subprocess.check_call(" ".join(minimap_cmd), shell=True)
      File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command 'flye-minimap2 -ax asm5 -t 10 -K 5G <PATH>/assembly.fasta <PATH>/hapdup/flye_hap_1/polished_1.fasta 2>/dev/null | flye-samtools sort -m 4G -@4 > <PATH>/hapdup/structural/liftover_hp1.bam' returned non-zero exit status 1.
    

    I suspect it could be because of the default minimap2 -I flag being too small (4G)? If this is the case, maybe an option to specify this could be added, or adjust it automatically depending on genome size?

    Thanks!

    bug 
    opened by fellen31 3
  • Using Pepper-MARGIN r0.7?

    Using Pepper-MARGIN r0.7?

    Hi,

    Would it be possible to include the latest version of Pepper-MARGIN (r0.7) in hapdup? I haven't been able to run hapdup on my data so far because of some issues in Pepper-MARGIN r0.6 (now solved in r0.7).

    Thank you!

    Guillaume

    enhancement 
    opened by GuillaumeHolley 3
  • Please add CLI option to specify location of BAM index

    Please add CLI option to specify location of BAM index

    Could you add an additional command line parameter, allowing the specification of the BAM index location? Eg,

    hapdup --bam /some/where/abam.bam --bam-index /other/location/a_bam_index.bam.csi

    And then pass that optional value to pysam.AlignmentFile() argument filepath_index?

    Motivation: I'm wrapping hapdup and some other steps in a WDL script, and need to pass each file separately (ie, they are localized as individual files, and there is no guarantee they end up in the same directory when hapdup is invoked). The current hapdup assumes the index and bam are in the same directory, and fails.

    Thanks!

    CC: @0seastar0

    enhancement 
    opened by bkmartinjr 3
  • Singularity

    Singularity

    Hi,

    Thank you for this tool, I am really excited to try it! Would it be possible to have hapdup available as a Singularity image or to have the Docker image available online in a container repo (such that it can be converted to a Singularity image with a singularity pull)?

    Thanks, Guillaume

    opened by GuillaumeHolley 3
  • --overwrite fails

    --overwrite fails

    [2022-11-03 20:16:22] INFO: Filtering alignments Traceback (most recent call last): File "/data/test_data/GIT/hapdup/hapdup.py", line 24, in sys.exit(main()) File "/data/test_data/GIT/hapdup/hapdup/main.py", line 153, in main filter_alignments_parallel(args.bam, filtered_bam, min(args.threads, 30), File "/data/test_data/GIT/hapdup/hapdup/filter_misplaced_alignments.py", line 188, in filter_alignments_parallel pysam.merge("-@", str(num_threads), bam_out, *bams_to_merge) File "/home/plzmwl/anaconda3/envs/hapdup/lib/python3.8/site-packages/pysam/utils.py", line 69, in call raise SamtoolsError( pysam.utils.SamtoolsError: "samtools returned with error 1: stdout=, stderr=[bam_merge] File 'hapdup/filtered.bam' exists. Please apply '-f' to overwrite. Abort.\n"

    Looks as though when you run with --overwrite the command is not being correctly passed through to sub processes.

    bug 
    opened by mattloose 2
  • phase block number compared to Whatshap

    phase block number compared to Whatshap

    Hello, thank you for the great tool!

    I was just testing HapDup v0.7 on our fish genome. Comparing the output with phasing done with WhatsHap (WH), I wondered why there is such a big difference in phased block size and block number between HapDup and the WH pipeline?

    For the fish chromosomes, WH was generating 679 blocks using 2'689'114 phased SNPs. Margin (HapDup pipeline) was generating 5352 blocks using 3'862'108 phased SNPs.

    The main difference seems to be the prior read filtering and usage of MarginPhase for the phasing in HapDup, but does this explain such a big difference?

    I was wondering if phase blocks of HapDup could be concatenated using whatshap SNP and block information to increase continuity? I imagine it would be a straightforward approach overlapping SNP positions between Margin and WH with phase block ids and lift-over phase ids from WH. I will do some visual inspections and scripting to test if there is overlap of called SNPs and agreement on block boarders.

    Cheers, Michel

    opened by MichelMoser 3
Releases(0.10)
  • 0.10(Oct 29, 2022)

  • 0.6(Mar 4, 2022)

    • Fixed issue with PEPPER model quantization causing the pipeline to hang on some systems
    • Speed-up of the last breakpoint analysis part of the pipeline, causing bottlenecks on some datasets
    Source code(tar.gz)
    Source code(zip)
  • 0.5(Feb 6, 2022)

    • Update to PEPPER 0.7
    • Added new option --pepper-model for custom pepper models
    • Added new option --bam-index to provide a non-standard path to alignment index file
    Source code(tar.gz)
    Source code(zip)
  • 0.4(Nov 19, 2021)

Owner
Mikhail Kolmogorov
Postdoc @ UCSC CGL, Paten lab. I work on building algorithms for computational genomics.
Mikhail Kolmogorov
Reading streams of Twitter data, save them to Kafka, then process with Kafka Stream API and Spark Streaming

Using Streaming Twitter Data with Kafka and Spark Reading streams of Twitter data, publishing them to Kafka topic, process message using Kafka Stream

Rustam Zokirov 1 Dec 06, 2021
PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams

PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams Motivation When dataset freshness is critical, the annotating of high speed

4 Aug 02, 2022
CS50 pset9: Using flask API to create a web application to exchange stocks' shares.

C$50 Finance In this guide we want to implement a website via which users can “register”, “login” “buy” and “sell” stocks, like below: Background If y

1 Jan 24, 2022
PrimaryBid - Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift

Transform application Lifecycle Data and Design and ETL pipeline architecture for ingesting data from multiple sources to redshift This project is composed of two parts: Part1 and Part2

Emmanuel Boateng Sifah 1 Jan 19, 2022
A Python and R autograding solution

Otter-Grader Otter Grader is a light-weight, modular open-source autograder developed by the Data Science Education Program at UC Berkeley. It is desi

Infrastructure Team 93 Jan 03, 2023
The official pytorch implementation of ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias

ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias Introduction | Updates | Usage | Results&Pretrained Models | Statement | Intr

104 Nov 27, 2022
Fancy data functions that will make your life as a data scientist easier.

WhiteBox Utilities Toolkit: Tools to make your life easier Fancy data functions that will make your life as a data scientist easier. Installing To ins

WhiteBox 3 Oct 03, 2022
Analyse the limit order book in seconds. Zoom to tick level or get yourself an overview of the trading day.

Analyse the limit order book in seconds. Zoom to tick level or get yourself an overview of the trading day. Correlate the market activity with the Apple Keynote presentations.

2 Jan 04, 2022
A probabilistic programming library for Bayesian deep learning, generative models, based on Tensorflow

ZhuSuan is a Python probabilistic programming library for Bayesian deep learning, which conjoins the complimentary advantages of Bayesian methods and

Tsinghua Machine Learning Group 2.2k Dec 28, 2022
Analyze the Gravitational wave data stored at LIGO/VIRGO observatories

Gravitational-Wave-Analysis This project showcases how to analyze the Gravitational wave data stored at LIGO/VIRGO observatories, using Python program

1 Jan 23, 2022
A forecasting system dedicated to smart city data

smart-city-predictions System prognostyczny dedykowany dla danych inteligentnych miast Praca inżynierska realizowana przez Michała Stawikowskiego and

Kevin Lai 1 Nov 08, 2021
MoRecon - A tool for reconstructing missing frames in motion capture data.

MoRecon - A tool for reconstructing missing frames in motion capture data.

Yuki Nishidate 38 Dec 03, 2022
Methylation/modified base calling separated from basecalling.

Remora Methylation/modified base calling separated from basecalling. Remora primarily provides an API to call modified bases for basecaller programs s

Oxford Nanopore Technologies 72 Jan 05, 2023
VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

VHub - An API that permits uploading of vulnerability datasets and return of the serialized data

André Rodrigues 2 Feb 14, 2022
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Jan 09, 2023
Learn machine learning the fun way, with Oracle and RedBull Racing

Red Bull Racing Analytics Hands-On Labs Introduction Are you interested in learning machine learning (ML)? How about doing this in the context of the

Oracle DevRel 55 Oct 24, 2022
A collection of learning outcomes data analysis using Python and SQL, from DQLab.

Data Analyst with PYTHON Data Analyst berperan dalam menghasilkan analisa data serta mempresentasikan insight untuk membantu proses pengambilan keputu

6 Oct 11, 2022
Toolchest provides APIs for scientific and bioinformatic data analysis.

Toolchest Python Client Toolchest provides APIs for scientific and bioinformatic data analysis. It allows you to abstract away the costliness of runni

Toolchest 11 Jun 30, 2022
collect training and calibration data for gaze tracking

Collect Training and Calibration Data for Gaze Tracking This tool allows collecting gaze data necessary for personal calibration or training of eye-tr

Pascal 5 Dec 17, 2022
songplays datamart provide details about the musical taste of our customers and can help us to improve our recomendation system

Songplays User activity datamart The following document describes the model used to build the songplays datamart table and the respective ETL process.

Leandro Kellermann de Oliveira 1 Jul 13, 2021