Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
Lumberjack-bot - A game bot written for Lumberjack game at Telegram platform

This is a game bot written for Lumberjack game at Telegram platform. It is devel

Uğur Uysal 6 Apr 07, 2022
A Telegram Calculator to calculate your maths sums

CalculatorBot A Telegram Calculator to calculate your maths sums! Made by /Team

TeamOctave 2 Dec 31, 2021
Source code of BobuxAdmin bot from Bobux Bot Development server.

BobuxAdmin Source code of BobuxAdmin bot from Bobux Bot Development server. The bot is written with usage of disnake and SQLite database. Functionalit

Bobux Bot Developers 3 Dec 29, 2022
This project uses Youtube data API's to do youtube tags analysis based on viewCount, comments etc.

Youtube video details analyser Steps to run this project Please set the AuthKey which you can fetch from google developer console and paste it in the

1 Nov 21, 2021
Ubuntu env build; Nginx build; DB build;

Deploy 介绍 Deploy related scripts bitnami Dependencies Ubuntu openssl envsubst docker v18.06.3 docker-compose init base env upload https://gitlab-runn

Colin(liuji) 10 Dec 01, 2021
An iCal file to transport you to a new place every day until you die

everydayvirtualvacation An iCal file to transport you to a new place every day until you die The library is closed 😔 😔 including a video of the plac

Jacob Chapman 33 Apr 19, 2022
:lock: Python 2.7/3.X client for HashiCorp Vault

hvac HashiCorp Vault API client for Python 3.x Tested against the latest release, HEAD ref, and 3 previous minor versions (counting back from the late

hvac 1k Dec 29, 2022
Red-mail - Advanced email sending library for Python

Red Mail Next generation email sender What is it? Red Mail is an advanced email

Mikael Koli 313 Jan 08, 2023
Azure Neural Speech Service TTS

Written in Python using the Azure Speech SDK. App.py provides an easy way to create an Text-To-Speech request to Azure Speech and download the wav file.

Rodney 1 Oct 11, 2021
Easy-apply-bot - A LinkedIn Easy Apply bot to help with my job search.

easy-apply-bot A LinkedIn Easy Apply bot to help with my job search. Getting Started First, clone the repository somewhere onto your computer, or down

Matthew Alunni 5 Dec 09, 2022
A light wrapper around FedEx's SOAP API.

Python FedEx SOAP API Module Author: Greg Taylor, Radek Wojcik Maintainer: Python FedEx Developers License: BSD Status: Stable What is it? A light wra

155 Dec 16, 2022
Lazy airdrop based on private temporary ids

LobsterDAO This uses a modified MerkleDistributor, which allows to issue a lazy airdrop using temporary IDs. In this example it uses Telegram chat_id

41 Sep 10, 2022
A lightweight, dependency-free Python library (and command-line utility) for downloading YouTube Videos.

24 July 2020 Actively soliciting contributers! Ping @ronncc if you would like to help out! pytube pytube is a very serious, lightweight, dependency-fr

pytube 7.9k Jan 02, 2023
Official API documentation for Highrise

Highrise API The Highrise API is implemented as vanilla XML over HTTP using all four verbs (GET/POST/PUT/DELETE). Every resource, like Person, Deal, o

Basecamp 128 Dec 06, 2022
An example of matrix addition, demonstrating the basic method of Python calling C library functions

Example for Python call C functions An example of matrix addition, demonstrating the basic method of Python calling C library functions. How to run Bu

Quantum LIu 2 Dec 21, 2021
Trellox Tool is written in Python3 and designed to pull and list Trello boards.

TrelloX Trellox Tool is written in Python3 and designed to list and pull Trello boards. It can be used by penetration testers/bug bounty hunters to de

Ali Fathi Ali Sawehli 1 Dec 05, 2021
Companion "receiver" to matrix-appservice-webhooks for [matrix].

Matrix Webhook Receiver Companion "receiver" to matrix-appservice-webhooks for [matrix]. The purpose of this app is to listen for generic webhook mess

Kim Brose 13 Sep 29, 2022
Modified Version of mega.py package for Pyrogram Bots

Pyro Mega.py Python library for the Mega.co.nz API, currently supporting: login uploading downloading deleting searching sharing renaming moving files

I'm Not A Bot #Left_TG 10 Aug 03, 2022
A simple waybar module to display the status of the ICE you are currently in using the ICE Portals JSON API.

waybar-iceportal A simple waybar module to display the status of the ICE you are currently in using the ICE Portals JSON API. Installation Ensure pyth

Moritz 7 Aug 26, 2022