Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Bringing Ethereum Virtual Machine to StarkNet at warp speed!

Warp Warp brings EVM compatible languages to StarkNet, making it possible to transpile Ethereum smart contracts to Cairo, and use them on StarkNet. Ta

Nethermind 700 Dec 26, 2022
🚧 finCLI's own News API. No more limited API calls. Unlimited credible and latest information on BTC, Ethereum, Indian and Global Finance.

🚧 finCLI's own News API. No more limited API calls. Unlimited credible and latest information on BTC, Ethereum, Indian and Global Finance.

finCLI 5 Jun 16, 2022
a Disqus alternative

Isso – a commenting server similar to Disqus Isso – Ich schrei sonst – is a lightweight commenting server written in Python and JavaScript. It aims to

Martin Zimmermann 4.7k Jan 02, 2023
quote is a python wrapper for the Goodreads Quote API, powered by gazpacho.

About quote is a python wrapper for the Goodreads Quote API, powered by gazpacho.

Max Humber 11 Nov 10, 2022
A modular bot running on python3 with anime theme and have a lot features

STINKY ROBOT Emiko Robot is a modular bot running on python3 with anime theme and have a lot features. Easiest Way To Deploy On Heroku This Bot is Cre

Riyan.rz 3 Jan 21, 2022
An NFTGenerator to generate NFTs and send them to nft.storage

NFTGenerator Table of Contents Overview Installation Introduction Features Reflection Issues & bug reports Show your support Credits Overview The NFTG

3 Mar 14, 2022
Rotten Tomatoes API for Python

rottentomatoes.py rottentomatoes offers an easy-to-use Python wrapper to interact with the Rotten Tomatoes API. Before you try and use the API, make s

Zach Williams 88 Dec 15, 2022
A Discord bot that may save your day by predicting it.

Sage A Discord bot that may save your day by predicting it.

1 Nov 17, 2022
Personal Discord Python Bot based on Discord.py

Personal Discord bot using the discord.py library by Rapptz

2 Dec 14, 2022
A hyper-user friendly bot framework built on hikari

Framework A hyper-user friendly bot framework built on hikari. Framework is based off the blocking discord library disco, In both modularity and struc

Vincent 1 Jan 10, 2022
This Instagram app created as a clone of instagram.Developed during Moringa Core.

Instagram This Instagram app created as a clone of instagram.Developed during Moringa Core. AUTHOR By: Nyagah Isaac Description This web-app allows a

Nyagah Isaac 1 Nov 01, 2021
The elegance of Airflow + the power of AWS

Orkestra The elegance of Airflow + the power of AWS

Stephan Fitzpatrick 42 Nov 01, 2022
Signs API calls to SberCloud.Advanced with AK/SK

sbercloud-api-aksk Signs API calls to SberCloud.Advanced with AK/SK This script is a courtesy of @sadpdtchr Description Sometimes there is a need to m

Peter Predtechensky 1 Nov 30, 2021
🔎 Hunt down social media accounts by username across social networks

Hunt down social media accounts by username across social networks Installation | Usage | Docker Notes | Contributing Installation # clone the repo $

Sherlock 38.2k Jan 01, 2023
A Script to automate fowarding all new messages from one/many channel(s) to another channel(s), without the forwarded tag.

Channel Auto Message Post A script to automate fowarding all new messages from one/many channel(s) to another channel(s), without the forwarded tag. C

16 Oct 21, 2022
Neofetch/pfetch, but for weather

Wfetch Neofetch/pfetch, but for weather Features Information about the weather outside: Weather condition Temperature Humidity Pressure Wind Sunrise-s

G_cat 72 Nov 18, 2022
Fix Twitter video embeds in Discord

TwitFix very basic flask server that fixes twitter embeds in discord by using youtube-dl to grab the direct link to the MP4 file and embeds the link t

Robin Universe 682 Dec 28, 2022
AutomaTik is an automation system for MikroTik devices with simplicity and security in mind.

AutomaTik Installation AutomaTik is an automation system for MikroTik devices with simplicity and security in mind. Winbox is the main tool for MikroT

Osman Kazdal 4 Dec 05, 2022
Diablo II Resurrected helper

Diablo II Resurrected 快捷施法辅助 功能: + 创建守护进程,注册全局热键 alt+/ 启用和关闭功能 (todo: 播放声音提示) + 按 x 强制移动 + 按 1 ~ 0 快捷施法到鼠标区域 使用 编辑配置 settings.py 技能信息做如下定义: SKILLS:

Wan 2 Nov 06, 2022
Automatically send commands to send Twitch followers to any Twitch account.

Automatically send commands to send Twitch followers to any Twitch account. You just need to be in a Twitch follow bot Discord server!

Thomas Keig 6 Nov 27, 2022