Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

Overview

find_te_ins

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2)

Install

$ git clone https://github.com/bakerwm/find_te_ins.git
$ cd find_te_ins

Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11;

$ bash run_pipe.sh
run_pipe.sh 
    
    

    
   

Prerequisite

  • minimap2 - 2.17-r974-dirty, align long reads to reference genome
  • featureCounts - v2.0.0, quantification
  • samtools - v1.12, working with BAM files
  • python 3.8+
  • pysam 0.16.0.1, python module, working with BAM files

Getting Started

1 Prepare input files

  • genome_fa - reference genome in fasta format, in script run_pipe.sh, line-10
  • te_fa - TE consensus sequence in fasta format, in script run_pipe.sh, line-11
  • long reads - Long reads from NanoPore or Pacbio, in fasta or fastq format

2 Run pipe

$ cd ~/work/te_ins
# specify the path of long reads data: 
   
    /
   
$ git clone https://github.com/bakerwm/find_te_ins.git 
$ bash find_te_ins/run_pipe.sh <path-to-long-reads>/ results

[1/9] align to reference genome
[2/9] extract raw insertions from BAM, by CIGAR
[3/9] convert raw insertions to fasta format
[4/9] align raw_insertion to transposon
[5/9] extract transposon name for insertions
[6/9] merge raw_insertions by window=100
[7/9] count reads for each insertion
[8/9] save final insertions to file
[9/9] Done!

3 Output

The following files listed below are the output of the pipeline, the TE insertions saved in file *.te_ins.final.bed

$ tree -L 2 results/ONT_sample-1
.
├── ONT_sample-1
│   ├── ONT_sample-1.bam
│   ├── ONT_sample-1.bam.bai
│   ├── ONT_sample-1.raw_ins.bed
│   ├── ONT_sample-1.raw_ins.fa
│   ├── ONT_sample-1.raw_ins.fa.bam
│   ├── ONT_sample-1.raw_ins.fa.bam.bai
│   ├── ONT_sample-1.te_ins.bed
│   ├── ONT_sample-1.te_ins.final.bed
│   ├── ONT_sample-1.te_ins.final.bed6
│   ├── ONT_sample-1.te_ins.gtf
│   ├── ONT_sample-1.te_ins.quant.stderr
│   ├── ONT_sample-1.te_ins.quant.stdout
│   ├── ONT_sample-1.te_ins.quant.txt
│   ├── ONT_sample-1.te_ins.quant.txt.summary
│   ├── ONT_sample-1.te_ins.raw.txt
│   ├── run_minimap2.dm6.stderr
│   └── run_minimap2.dm6_transposon.stderr
...

{sample_name}.te_ins.final.bed

column 1. chr name of reference 
column 2. start pos of Insertion 
column 3. end pos of Insertion 
column 4. insertion name 
column 5. a fixed integer [255]  
column 6. strand # in current version, not consider the dirction of TE insertions !!!
column 7. name of TE consensus 
column 8. length of TE consensus  
column 9. proportion of the TE consensus identified  
column 10. number of supported reads for the insertion 
column 11. number of all reads cover the insertion 
column 12. proportion TE supported reads 
column 13. type of the TE insertions [full, p3, p5]

{sample_name}.te_ins.raw.txt

column 16 (last column), is the type of TE insertions: [full, p3, p5]

  • full, more then cutoff [60%] of the TE consensus were detected
  • p3, only the 3' end of the TE consensus were detected
  • p5, only the 5' end of the TE consensus were detected

In the .final.bed file, ONLY full TE insertions were saved for further analysis

Change criteria

TE types were defined in run_pipe.sh by anno_te.py, the criteria -c 0.6 could be changed to [0-1] float number based on your condition. see line-100 in file run_pipe.sh

# line-100 of run_pipe.sh
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# change criteria to 0.7
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} -c 0.7 ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# remove te_ins files, and run the command again
$ rm results/ONT_sample-1.te_ins*
$ bash find_te_ins/run_pipe.sh 
   
    / results

   

How it works?

  1. extract INSERTIONS
Owner
Ming Wang
Ming Wang
A simple PID tuner and simulator.

PIDtuner-V0.1 PlantPy PID tuner version 0.1 Features Supports first order and ramp process models. Supports Proportional action on PV or error or a sp

3 Jun 23, 2022
A dashboard for your code. A build system.

NOTICE: THIS REPO IS NO LONGER UPDATED Changes Changes is a build coordinator and reporting solution written in Python. The project is primarily built

Dropbox 763 Sep 09, 2022
🤞 Website-Survival-Detection

- 🤞 Website-Survival-Detection It can help you to detect the survival status of the website in batches and return the status code! - 📜 Instructions

B0kd1 4 Nov 14, 2022
Desenvolvendo as habilidades básicas de programação visando a construção de aplicativos por meio de bibliotecas apropriadas à Ciência de Dados.

Algoritmos e Introdução à Computação Ementa: Conceitos básicos sobre algoritmos e métodos para sua construção. Tipos de dados e variáveis. Estruturas

Dyanna Cruz 1 Jan 06, 2022
《practical python programming》的中文翻译

欢迎光临 大约 25 年前,当我第一次学习 Python 时,发现 Python 竟然可以被高效地应用到各种混乱的工作项目上,我立即被震惊了。15 年前,我自己也将这种乐趣教授给别人。教学的结果就是本课程——一门实用的学习 Python的课程。

编程人 125 Dec 17, 2022
The code for 2021 MGTV AI Challenge Anti Stealing Link, and the online result ranks 10th.

赛题介绍 芒果TV-第二届“马栏山杯”国际音视频算法大赛-防盗链 随着业务的发展,芒果的视频内容也深受网友的喜欢,不少视频网站和应用开始盗播芒果的视频内容,盗链网站不经过芒果TV的前端系统,跳过广告播放,且消耗大量的服务器、带宽资源,直接给公司带来了巨大的经济损失,因此防盗链在日常运营中显得尤为重要

tongji40 16 Jun 17, 2022
Context-free grammar to Sublime-syntax file

Generate a sublime-syntax file from a non-left-recursive, follow-determined, context-free grammar

Haggai Nuchi 8 Nov 17, 2022
A python library what works with numbers.

pynum A python library what works with numbers. Prime Prime class have everithing you want about prime numbers. check_prime The check_prime method is

Mohammad Mahdi Paydar Puya 1 Jan 07, 2022
A lightweight Python module to interact with the Mitre Att&ck Enterprise dataset.

enterpriseattack - Mitre's Enterprise Att&ck A lightweight Python module to interact with the Mitre Att&ck Enterprise dataset. Built to be used in pro

xakepnz 7 Jan 01, 2023
Runtime profiler for Streamlit, powered by pyinstrument

streamlit-profiler 🏄🏼 Runtime profiler for Streamlit, powered by pyinstrument. streamlit-profiler is a Streamlit component that helps you find out w

Johannes Rieke 23 Nov 30, 2022
Mengzhan (John) code for Closed Loop Control system of Sharp Wave Ripples in Hippocampus CA3 region

ClosedLoopControl_Yu Mengzhan (John) code for Closed Loop Control system of Sharp Wave Ripples in Hippocampus CA3 region Creating Python Virtual Envir

Mengzhan (John) Liufu 1 Jan 22, 2022
Think DSP: Digital Signal Processing in Python, by Allen B. Downey.

ThinkDSP LaTeX source and Python code for Think DSP: Digital Signal Processing in Python, by Allen B. Downey. The premise of this book (and the other

Allen Downey 3.2k Jan 08, 2023
JurjenLang, an interpreted programming language

JurjenLang An interpreted programming language Getting started Follow these three steps on your computer to get started git clone https://github.com/J

JVerbruggen 5 May 03, 2022
Wordle-solve - Attempting to solve wordle

Wordle Solver Run with python wordle_beater.py. This hardmode wordle solver take

Tom Lockwood 42 Oct 11, 2022
mypy plugin for PynamoDB

pynamodb-mypy A plugin for mypy which gives it deeper understanding of PynamoDB (beyond what's possible through type stubs). Usage Add it to the plugi

1 Oct 21, 2022
Calculadora-basica - Calculator with basic operators

Calculadora básica Calculadora com operadores básicos; O programa solicitará a d

Vitor Antoni 2 Apr 26, 2022
The most hackable keyboard in all the land

MiRage Modular Keyboard © 2021 Zack Freedman of Voidstar Lab Licensed Creative Commons 4.0 Attribution Noncommercial Share-Alike The MiRage is a 60% o

Zack Freedman 558 Dec 30, 2022
My collection of mini-projects in various languages

Mini-Projects My collection of mini-projects in various languages About: This repository consists of a number of small projects. Most of these "mini-p

Siddhant Attavar 1 Jul 11, 2022
A chain of stores wants a 3-month demand forecast for its 10 different stores and 50 different products.

Demand Forecasting Objective A chain store wants a machine learning project for a 3-month demand forecast for 10 different stores and 50 different pro

2 Jan 06, 2022