Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

Overview

find_te_ins

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2)

Install

$ git clone https://github.com/bakerwm/find_te_ins.git
$ cd find_te_ins

Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11;

$ bash run_pipe.sh
run_pipe.sh 
    
    

    
   

Prerequisite

  • minimap2 - 2.17-r974-dirty, align long reads to reference genome
  • featureCounts - v2.0.0, quantification
  • samtools - v1.12, working with BAM files
  • python 3.8+
  • pysam 0.16.0.1, python module, working with BAM files

Getting Started

1 Prepare input files

  • genome_fa - reference genome in fasta format, in script run_pipe.sh, line-10
  • te_fa - TE consensus sequence in fasta format, in script run_pipe.sh, line-11
  • long reads - Long reads from NanoPore or Pacbio, in fasta or fastq format

2 Run pipe

$ cd ~/work/te_ins
# specify the path of long reads data: 
   
    /
   
$ git clone https://github.com/bakerwm/find_te_ins.git 
$ bash find_te_ins/run_pipe.sh <path-to-long-reads>/ results

[1/9] align to reference genome
[2/9] extract raw insertions from BAM, by CIGAR
[3/9] convert raw insertions to fasta format
[4/9] align raw_insertion to transposon
[5/9] extract transposon name for insertions
[6/9] merge raw_insertions by window=100
[7/9] count reads for each insertion
[8/9] save final insertions to file
[9/9] Done!

3 Output

The following files listed below are the output of the pipeline, the TE insertions saved in file *.te_ins.final.bed

$ tree -L 2 results/ONT_sample-1
.
├── ONT_sample-1
│   ├── ONT_sample-1.bam
│   ├── ONT_sample-1.bam.bai
│   ├── ONT_sample-1.raw_ins.bed
│   ├── ONT_sample-1.raw_ins.fa
│   ├── ONT_sample-1.raw_ins.fa.bam
│   ├── ONT_sample-1.raw_ins.fa.bam.bai
│   ├── ONT_sample-1.te_ins.bed
│   ├── ONT_sample-1.te_ins.final.bed
│   ├── ONT_sample-1.te_ins.final.bed6
│   ├── ONT_sample-1.te_ins.gtf
│   ├── ONT_sample-1.te_ins.quant.stderr
│   ├── ONT_sample-1.te_ins.quant.stdout
│   ├── ONT_sample-1.te_ins.quant.txt
│   ├── ONT_sample-1.te_ins.quant.txt.summary
│   ├── ONT_sample-1.te_ins.raw.txt
│   ├── run_minimap2.dm6.stderr
│   └── run_minimap2.dm6_transposon.stderr
...

{sample_name}.te_ins.final.bed

column 1. chr name of reference 
column 2. start pos of Insertion 
column 3. end pos of Insertion 
column 4. insertion name 
column 5. a fixed integer [255]  
column 6. strand # in current version, not consider the dirction of TE insertions !!!
column 7. name of TE consensus 
column 8. length of TE consensus  
column 9. proportion of the TE consensus identified  
column 10. number of supported reads for the insertion 
column 11. number of all reads cover the insertion 
column 12. proportion TE supported reads 
column 13. type of the TE insertions [full, p3, p5]

{sample_name}.te_ins.raw.txt

column 16 (last column), is the type of TE insertions: [full, p3, p5]

  • full, more then cutoff [60%] of the TE consensus were detected
  • p3, only the 3' end of the TE consensus were detected
  • p5, only the 5' end of the TE consensus were detected

In the .final.bed file, ONLY full TE insertions were saved for further analysis

Change criteria

TE types were defined in run_pipe.sh by anno_te.py, the criteria -c 0.6 could be changed to [0-1] float number based on your condition. see line-100 in file run_pipe.sh

# line-100 of run_pipe.sh
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# change criteria to 0.7
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} -c 0.7 ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# remove te_ins files, and run the command again
$ rm results/ONT_sample-1.te_ins*
$ bash find_te_ins/run_pipe.sh 
   
    / results

   

How it works?

  1. extract INSERTIONS
Owner
Ming Wang
Ming Wang
An app to help people apply for admissions on schools/hostels

Admission-helper About An app to help people apply for admissions on schools/hostels This app is a rewrite of Admission-helper-beta-v5.8.9 and I impor

Advik 3 Apr 24, 2022
Providing a working, flexible, easier and faster installer than the one officially provided by Arch Linux

Purpose The purpose is to bring more people to Arch Linux by providing a working, flexible, easier and faster installer than the one officially provid

André Luís 0 Nov 09, 2022
Dockernized ZeroTierOne controller with zero-ui web interface.

docker-zerotier-controller Dockernized ZeroTierOne controller with zero-ui web interface. 中文讨论 Customize ZeroTierOne's controller planets Modify patch

sbilly 209 Jan 04, 2023
Custom Python code for calculating the Probability of Profit (POP) for options trading strategies using Monte Carlo Simulations.

Custom Python code for calculating the Probability of Profit (POP) for options trading strategies using Monte Carlo Simulations.

35 Nov 22, 2022
Up to date simple useragent faker with real world database

fake-useragent info: Up to date simple useragent faker with real world database Features grabs up to date useragent from useragentstring.com randomize

Victor K. 2.9k Jan 04, 2023
dragmap-meth: Fast and accurate aligner for bisulfite sequencing reads using dragmap

dragmap_meth (dragmap_meth.py) Alignment of BS-Seq reads using dragmap. Intro This works for single-end reads and for paired-end reads from the direct

Shaojun Xie 3 Jul 14, 2022
Programa que organiza pastas automaticamente

📂 Folder Organizer 📂 Programa que organiza pastas automaticamente Requisitos • Como usar • Melhorias futuras • Capturas de Tela Requisitos Antes de

João Victor Vilela dos Santos 1 Nov 02, 2021
Prometheus exporter for Spigot accounts

SpigotExporter Prometheus exporter for Spigot accounts What it provides SpigotExporter will output metrics for each of your plugins and a cumulative d

Jacob Bashista 5 Dec 20, 2021
Datasets with Softcatalà website content

softcatala-web-dataset This repository contains Sofcatalà web site content (articles and programs descriptions). Dataset are available in the dataset

Softcatalà 2 Dec 26, 2021
Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.

Python for downloading model data (HRRR, RAP, GFS, NBM, etc.) from NOMADS, NOAA's Big Data Program partners (Amazon, Google, Microsoft), and the University of Utah Pando Archive System.

Brian Blaylock 194 Jan 02, 2023
Creates infinite amount of guilded accounts in seconds.

Guilded Cookie Creator [fuck guilded i quit working on this, they patch like every fucking method after 2/3 days i release shit] Optimizations Asynchr

scripted 7 Feb 28, 2022
It really seems like Trump is trying to get his own social media started. Not a huge fan tbh.

FuckTruthSocial It really seems like Trump is trying to get his own social media started. Not a huge fan tbh. (When TruthSocial actually releases, I'l

0 Jul 18, 2022
A website to collect vintage 4 tracks cassette recorders.

Vintage 4tk cassette recorders A website to collect vintage 4 tracks cassette recorders. Local development setup Copy and customize Django settings (e

1 May 01, 2022
Minutaria is a basic educational Python timer used to learn python and software testing libraries.

minutaria minutaria is a basic educational Python timer. The project is educational, it aims to teach myself programming, python programming, python's

1 Jul 16, 2021
Mangá downloader (para leitura offline) voltado para sites e scans brasileiros.

yonde! yonde! (読んで!) é um mangá downloader (para leitura offline) voltado para sites e scans brasileiros. Também permite que você converta os capítulo

Yonde 8 Nov 28, 2021
Simple web application, which has a single endpoint, dedicated to annotation parsing and convertion.

Simple web application, which has a single endpoint, dedicated to annotation parsing and conversion.

Pavel Paranin 1 Nov 01, 2021
Reference management solution using Python and Notion.

notion-scholar Reference management solution using Python and Notion. The main idea of this app is to allow to furnish a Notion database using a BibTe

Thomas Hirtz 69 Dec 21, 2022
Neogex is a human readable parser standard, being implemented in Python

Neogex (New Expressions) Parsing Standard Much like Regex, Neogex allows for string parsing and validation based on a set of requirements. Unlike Rege

Seamus Donnellan 1 Dec 17, 2021
A Desktop application for the signalum python library

Signalum Desktop A Desktop application on the Signalum Python Library/CLI Tool. The Signalum Desktop application is an attempt to develop a single too

BISOHNS 35 Feb 15, 2021
Today I Commit (1일 1커밋) 챌린지 알림 봇

Today I Commit Challenge 1일1커밋 챌린지를 위한 알림 봇 config.py github_token = "github private access key" slack_token = "slack authorization token" channel = "

sunho 4 Nov 08, 2021