Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python

Overview

Full Spectrum Bioinformatics

DOI

NSF-1942647.

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, which allow you to try out and modify example code and analyses.

In addition to explanations of concepts, Full Spectrum Bioinformatics also includes Bioinformatics Vignettes written by readers of the text. Each vignette is focused around a particular core concept, and show how readers have applied that concepts to their research projects.

If you happen to already be familiar with GitHub and Jupyter Notebooks, you can download the entire project and run it interactively, or click the 'Open in Colab' links to open interactive versions of each section in Google Colab (you will need to 'Save as' your own copy in order to change code). You can also view a static version of each section using the nbviewer links. If using the direct GitHub links, you may sometimes get a GitHub error message. Usually hitting reload page or using the nbviewer link avoids this issue.

licensebuttons by-nc-sa
Lead Author: Jesse Zaneveld1
Vignette Authors: Nia Prabhu*1, Aziz Bajouri*1,2, Ayomikun Akinrinade*1,3

* Vignette authors contributed equally and are listed in chronological order of first contribution.
1 Division of Biological Sciences, School of STEM, University of Washington, Bothell, Washington, USA
2 Division of Computer and Software Systems, School of STEM, University of Washington, Bothell, Washington, USA
3 Division of Health Studies, School of Nursing and Health Studies, University of Washington, Bothell, Washington, USA

The text is currently in prototype status. Chapters with content you can preview are linked below:

This project is being developed with support from NSF Integrative and Organismal Systems award NSF-1942647.

Feedback

You can submit feedback about completed chapters at the following link

Comments
  • Bump nokogiri from 1.10.9 to 1.11.1

    Bump nokogiri from 1.10.9 to 1.11.1

    Bumps nokogiri from 1.10.9 to 1.11.1.

    Release notes

    Sourced from nokogiri's releases.

    v1.11.1 / 2021-01-06

    Fixed

    • [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

    SHA-256 Checksums of published gems

    a41091292992cb99be1b53927e1de4abe5912742ded956b0ba3383ce4f29711c  nokogiri-1.11.1-arm64-darwin.gem
    d44fccb8475394eb71f29dfa7bb3ac32ee50795972c4557ffe54122ce486479d  nokogiri-1.11.1-java.gem
    f760285e3db732ee0d6e06370f89407f656d5181a55329271760e82658b4c3fc  nokogiri-1.11.1-x64-mingw32.gem
    dd48343bc4628936d371ba7256c4f74513b6fa642e553ad7401ce0d9b8d26e1f  nokogiri-1.11.1-x86-linux.gem
    7f49138821d714fe2c5d040dda4af24199ae207960bf6aad4a61483f896bb046  nokogiri-1.11.1-x86-mingw32.gem
    5c26111f7f26831508cc5234e273afd93f43fbbfd0dcae5394490038b88d28e7  nokogiri-1.11.1-x86_64-darwin.gem
    c3617c0680af1dd9fda5c0fd7d72a0da68b422c0c0b4cebcd7c45ff5082ea6d2  nokogiri-1.11.1-x86_64-linux.gem
    42c2a54dd3ef03ef2543177bee3b5308313214e99f0d1aa85f984324329e5caa  nokogiri-1.11.1.gem
    

    v1.11.0 / 2021-01-03

    Notes

    Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

    "Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. This results in much faster installation and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.

    We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

    • Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
    • OSX/Darwin: x86_64-darwin and arm64-darwin

    We'd appreciate your thoughts and feedback on this work at #2075.

    Dependencies

    Ruby

    This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

    This release ends support for:

    Gems

    ... (truncated)

    Changelog

    Sourced from nokogiri's changelog.

    v1.11.1 / 2021-01-06

    Fixed

    • [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

    v1.11.0 / 2021-01-03

    Notes

    Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

    "Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. This results in much faster installation and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.

    We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

    • Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
    • OSX/Darwin: x86_64-darwin and arm64-darwin

    We'd appreciate your thoughts and feedback on this work at #2075.

    Dependencies

    Ruby

    This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

    This release ends support for:

    Gems

    • Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
    • [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

    Security

    See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema parsing treats input as untrusted by default".

    Added

    • Add Node methods for manipulating "keyword attributes" (for example, class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]

    ... (truncated)

    Commits
    • 7be6f04 version bump to v1.11.1
    • aa0c399 dev: overhaul .gitignore
    • 3d90c6d Merge pull request #2169 from sparklemotion/2168-active-support-test-failure
    • bbf850c changelog: update for #2168
    • ee69772 ci: another valgrind suppression
    • f9a2c4e fix: restore proper error handling in the SAX push parser
    • 35aa88b fix(cruby): reset libxml2's error handler in sax and push parsers
    • 07459fd fix(test): clobber libxml2's global error handler before every test
    • b682ac5 ci: ensure all tests are running setup
    • 007662f github: update "installation difficulty" issue template
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • Missing reading response link on

    Missing reading response link on "Error Messages in Python"

    opened by LucaOnline 1
  • Add discussion of HISAT2 & transcriptomics

    Add discussion of HISAT2 & transcriptomics

    HiSat2 https://anaconda.org/bioconda/hisat2

    Salmon intro (another alternative that interoperates well with DESeq2) https://combine-lab.github.io/salmon/getting_started/

    opened by zaneveld 0
  • Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

    Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

    In addition we found a more recent study that showed that [research finding] (cite1;cite2). --> [research finding]

    In a 2016 study it was shown that [finding])(cite1) --> finding

    opened by zaneveld 0
  • More database links:  https://www.cbioportal.org/  (Cancer research database)  https://www.idigbio.org/  (Integrated digitized biocollections)  https://www.gbif.org/ (biodiversity data)  https://bceenetwork.org/cure-summaries/  https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    More database links: https://www.cbioportal.org/ (Cancer research database) https://www.idigbio.org/ (Integrated digitized biocollections) https://www.gbif.org/ (biodiversity data) https://bceenetwork.org/cure-summaries/ https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    Open resources shared in the 2022 AACU Talks (CUREing Cancer: How a Virtual Cancer Genomics CURE Made Research Accessible to Students During COVID and another was on Expanding Access to Undergraduate Research Through BCEENET Cures Using Digitized Collections Data) on CUREs (shared by Robin Angotti):

    https://www.cbioportal.org/ (Cancer research database)
    https://www.idigbio.org/ (Integrated digitized biocollections) https://www.gbif.org/ (biodiversity data) https://bceenetwork.org/cure-summaries/ https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    opened by zaneveld 0
Releases(release-2022.3.1)
  • release-2022.3.1(Mar 2, 2022)

    What's Changed

    The 2022.3.1 Release of Full Spectrum Bioinformatics greatly expands the scope and maturity of the text, including contributions from 3 undergraduate co-authors. This text has now been used to support multiple classes, and has 35 sections that are linked from the table of content and ready for classroom use.

    Here are some of the major changes:

    The text has several new sections: -- An overview of python syntax now overviews how to recognize python syntax before we dive into studying the details -- A first chapter on sequence alignment now covers Needleman-Wunsch alignment, both as worked by hand using a simple example, and an implementation in numpy. -- The text now discusses linear models, with accompanying illustrations as well as figures -- An Error Bingo exercise now encourages students to intentionally trigger and learn from errors
    -- An extensive section has been added discussing common errors in python, why they most commonly occur, and how to fix them.

    -- 3 undergraduate contributors have added Bioinformatics Vignettes showing how to apply the principles in the text to biological problems: - Nia Prabhu (nucleotide composition) - Aziz Bajouri (set analysis) - Ayomikun Akinrinade (machine learning)

    -- A section has been added on revising writing about statistical results -- An initial draft section on visualizing correlation has been added showing how a scatterplot can be revised to add linear regression results, 95% confidence intervals, and to better meet recommendations for data visualization. -- The Data Sources page has been greatly updated, and now includes logos for linked resources

    New Draft Sections: -- A draft section on student activism and fighting for an inclusive workplace has been added. -- A draft section on network analysis has several in-progress code commits (not yet linked from main table of contents)

    Other changes: -- Full Spectrum Bioinformatics has now adopted a code of conduct -- Many minor fixes -- Exercises have been added to many sections that previously lacked them -- The exercise on calculating CG content in the human genome has been updated -- Several chapters have been updated to include Feedback links that were previously missing -- Unused Jupyter Book files have been removed

    Full Changelog: https://github.com/zaneveld/full_spectrum_bioinformatics/compare/release-2020.12.1...release-2022.3.1

    Source code(tar.gz)
    Source code(zip)
    full_spectrum_bioinformatics_2022.3.0.zip(182.17 MB)
  • release-2020.12.1(Dec 8, 2020)

    This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

    Some current features include: -- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics. -- Links to Google Colab to allow students to run notebooks in a browser without installing software -- An outline table of contents shows planned sections, with sections that are in beta status available as live links. -- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

    Foreword The Command Line Using the Command Line Exercise: Little Brother is Missing Exploring Python Exploring Python A Tour of Python Data Types Project Design Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses Biological Sequences An introduction to Biological Sequences Representing and Manipulating Biological Sequences as Python Strings Analyzing Biological Sequences with For Loops and If Statements Reading and writing FASTA files using Python 'Omics An Introduction to 'Omics Working with Tabular 'Omic data in Python using Pandas Phylogenetic Trees Representing Phylogenetic Trees with Python Classes Generating Trees Using Birth-Death Models Simulation Simulating the Population Genetics of Natural Selection and Genetic Drift Statistics Rank Transformations Monte Carlo simulation of Effect Size, Sample Size, and Significance Dealing with Multiple Comparisons Exercise: Revising your writing about statistical results Polishing and Publishing Presenting Research Careers that draw on Bioinformatics Applying for Grants

    NOTE: this is very similar to release-2020.12.0, other than minor edits to the readme but I need to re-release to trigger Zenodo to generate a DOI.

    Source code(tar.gz)
    Source code(zip)
  • release-2020.12.0(Dec 7, 2020)

    This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

    Some current features include: -- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics. -- Links to Google Colab to allow students to run notebooks in a browser without installing software -- An outline table of contents shows planned sections, with sections that are in beta status available as live links. -- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

    Foreword The Command Line Using the Command Line Exercise: Little Brother is Missing Exploring Python Exploring Python A Tour of Python Data Types Project Design Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses Biological Sequences An introduction to Biological Sequences Representing and Manipulating Biological Sequences as Python Strings Analyzing Biological Sequences with For Loops and If Statements Reading and writing FASTA files using Python 'Omics An Introduction to 'Omics Working with Tabular 'Omic data in Python using Pandas Phylogenetic Trees Representing Phylogenetic Trees with Python Classes Generating Trees Using Birth-Death Models Simulation Simulating the Population Genetics of Natural Selection and Genetic Drift Statistics Rank Transformations Monte Carlo simulation of Effect Size, Sample Size, and Significance Dealing with Multiple Comparisons Exercise: Revising your writing about statistical results Polishing and Publishing Presenting Research Careers that draw on Bioinformatics Applying for Grants

    Source code(tar.gz)
    Source code(zip)
    full_spectrum_bioinformatics.zip(84.89 MB)
Owner
Jesse Zaneveld
Jesse Zaneveld
ASCEND Chinese-English code-switching dataset

ASCEND (A Spontaneous Chinese-English Dataset) introduces a high-quality resource of spontaneous multi-turn conversational dialogue Chinese-English code-switching corpus collected in Hong Kong.

CAiRE 11 Dec 09, 2022
Course project of [email protected]

NaiveMT Prepare Clone this repository git clone [email protected]:Poeroz/NaiveMT.git

Poeroz 2 Apr 24, 2022
The guide to tackle with the Text Summarization

The guide to tackle with the Text Summarization

Takahiro Kubo 1.2k Dec 30, 2022
Reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer: Self-Attention with Linear Complexity)

Linear Multihead Attention (Linformer) PyTorch Implementation of reproducing the Linear Multihead Attention introduced in Linformer paper (Linformer:

Kui Xu 58 Dec 23, 2022
Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization

Th2En & Th2Zh: The large-scale datasets for Thai text cross-lingual summarization 📥 Download Datasets 📥 Download Trained Models INTRODUCTION TH2ZH (

Nakhun Chumpolsathien 5 Jan 03, 2022
Chatbot with Pytorch, Python & Nextjs

Installation Instructions Make sure that you have Python 3, gcc, venv, and pip installed. Clone the repository $ git clone https://github.com/sahr

Rohit Sah 0 Dec 11, 2022
Multilingual text (NLP) processing toolkit

polyglot Polyglot is a natural language pipeline that supports massive multilingual applications. Free software: GPLv3 license Documentation: http://p

RAMI ALRFOU 2.1k Jan 07, 2023
Official implementation of Meta-StyleSpeech and StyleSpeech

Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dongchan Min, Dong Bok Lee, Eunho Yang, and Sung Ju Hwang This is an official code

min95 169 Jan 05, 2023
Code for hyperboloid embeddings for knowledge graph entities

Implementation for the papers: Self-Supervised Hyperboloid Representations from Logical Queries over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao,

30 Dec 10, 2022
Application to help find best train itinerary, uses speech to text, has a spam filter to segregate invalid inputs, NLP and Pathfinding algos.

T-IAI-901-MSC2022 - GROUP 18 Gestion de projet Notre travail a été organisé et réparti dans un Trello. https://trello.com/b/X3s2fpPJ/ia-projet Install

1 Feb 05, 2022
用Resnet101+GPT搭建一个玩王者荣耀的AI

基于pytorch框架用resnet101加GPT搭建AI玩王者荣耀 本源码模型主要用了SamLynnEvans Transformer 的源码的解码部分。以及pytorch自带的预训练模型"resnet101-5d3b4d8f.pth"

冯泉荔 2.2k Jan 03, 2023
Random Directed Acyclic Graph Generator

DAG_Generator Random Directed Acyclic Graph Generator verison1.0 简介 工作流通常由DAG(有向无环图)来定义,其中每个计算任务$T_i$由一个顶点(node,task,vertex)表示。同时,任务之间的每个数据或控制依赖性由一条加权

Livion 17 Dec 27, 2022
Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model

JHJu 2 Jan 18, 2022
Common Voice Dataset explorer

Common Voice Dataset Explorer Common Voice Dataset is by Mozilla Made during huggingface finetuning week Usage pip install -r requirements.txt streaml

Ceyda Cinarel 22 Nov 16, 2022
Syntax-aware Multi-spans Generation for Reading Comprehension (TASLP 2022)

SyntaxGen Syntax-aware Multi-spans Generation for Reading Comprehension (TASLP 2022) In this repo, we upload all the scripts for this work. Due to siz

Zhuosheng Zhang 3 Jun 13, 2022
Prithivida 690 Jan 04, 2023
Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Chenyang Huang 37 Jan 04, 2023
pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

297 Dec 29, 2022
This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems

Proteno This is the data release associated with the corresponding NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deploymen

37 Dec 04, 2022
Biterm Topic Model (BTM): modeling topics in short texts

Biterm Topic Model Bitermplus implements Biterm topic model for short texts introduced by Xiaohui Yan, Jiafeng Guo, Yanyan Lan, and Xueqi Cheng. Actua

Maksim Terpilowski 49 Dec 30, 2022