Full Spectrum Bioinformatics - a free online text designed to introduce key topics in Bioinformatics using the Python

Overview

Full Spectrum Bioinformatics

DOI

NSF-1942647.

Full Spectrum Bioinformatics is a free online text designed to introduce key topics in Bioinformatics using the Python programming language. The text is written in interactive Jupyter Notebooks, which allow you to try out and modify example code and analyses.

In addition to explanations of concepts, Full Spectrum Bioinformatics also includes Bioinformatics Vignettes written by readers of the text. Each vignette is focused around a particular core concept, and show how readers have applied that concepts to their research projects.

If you happen to already be familiar with GitHub and Jupyter Notebooks, you can download the entire project and run it interactively, or click the 'Open in Colab' links to open interactive versions of each section in Google Colab (you will need to 'Save as' your own copy in order to change code). You can also view a static version of each section using the nbviewer links. If using the direct GitHub links, you may sometimes get a GitHub error message. Usually hitting reload page or using the nbviewer link avoids this issue.

licensebuttons by-nc-sa
Lead Author: Jesse Zaneveld1
Vignette Authors: Nia Prabhu*1, Aziz Bajouri*1,2, Ayomikun Akinrinade*1,3

* Vignette authors contributed equally and are listed in chronological order of first contribution.
1 Division of Biological Sciences, School of STEM, University of Washington, Bothell, Washington, USA
2 Division of Computer and Software Systems, School of STEM, University of Washington, Bothell, Washington, USA
3 Division of Health Studies, School of Nursing and Health Studies, University of Washington, Bothell, Washington, USA

The text is currently in prototype status. Chapters with content you can preview are linked below:

This project is being developed with support from NSF Integrative and Organismal Systems award NSF-1942647.

Feedback

You can submit feedback about completed chapters at the following link

Comments
  • Bump nokogiri from 1.10.9 to 1.11.1

    Bump nokogiri from 1.10.9 to 1.11.1

    Bumps nokogiri from 1.10.9 to 1.11.1.

    Release notes

    Sourced from nokogiri's releases.

    v1.11.1 / 2021-01-06

    Fixed

    • [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

    SHA-256 Checksums of published gems

    a41091292992cb99be1b53927e1de4abe5912742ded956b0ba3383ce4f29711c  nokogiri-1.11.1-arm64-darwin.gem
    d44fccb8475394eb71f29dfa7bb3ac32ee50795972c4557ffe54122ce486479d  nokogiri-1.11.1-java.gem
    f760285e3db732ee0d6e06370f89407f656d5181a55329271760e82658b4c3fc  nokogiri-1.11.1-x64-mingw32.gem
    dd48343bc4628936d371ba7256c4f74513b6fa642e553ad7401ce0d9b8d26e1f  nokogiri-1.11.1-x86-linux.gem
    7f49138821d714fe2c5d040dda4af24199ae207960bf6aad4a61483f896bb046  nokogiri-1.11.1-x86-mingw32.gem
    5c26111f7f26831508cc5234e273afd93f43fbbfd0dcae5394490038b88d28e7  nokogiri-1.11.1-x86_64-darwin.gem
    c3617c0680af1dd9fda5c0fd7d72a0da68b422c0c0b4cebcd7c45ff5082ea6d2  nokogiri-1.11.1-x86_64-linux.gem
    42c2a54dd3ef03ef2543177bee3b5308313214e99f0d1aa85f984324329e5caa  nokogiri-1.11.1.gem
    

    v1.11.0 / 2021-01-03

    Notes

    Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

    "Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. This results in much faster installation and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.

    We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

    • Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
    • OSX/Darwin: x86_64-darwin and arm64-darwin

    We'd appreciate your thoughts and feedback on this work at #2075.

    Dependencies

    Ruby

    This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

    This release ends support for:

    Gems

    ... (truncated)

    Changelog

    Sourced from nokogiri's changelog.

    v1.11.1 / 2021-01-06

    Fixed

    • [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

    v1.11.0 / 2021-01-03

    Notes

    Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

    "Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need for compiling the C extension and the packaged libraries. This results in much faster installation and more reliable installation, which as you probably know are the biggest headaches for Nokogiri users.

    We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

    • Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
    • OSX/Darwin: x86_64-darwin and arm64-darwin

    We'd appreciate your thoughts and feedback on this work at #2075.

    Dependencies

    Ruby

    This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

    This release ends support for:

    Gems

    • Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
    • [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

    Security

    See note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema parsing treats input as untrusted by default".

    Added

    • Add Node methods for manipulating "keyword attributes" (for example, class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]

    ... (truncated)

    Commits
    • 7be6f04 version bump to v1.11.1
    • aa0c399 dev: overhaul .gitignore
    • 3d90c6d Merge pull request #2169 from sparklemotion/2168-active-support-test-failure
    • bbf850c changelog: update for #2168
    • ee69772 ci: another valgrind suppression
    • f9a2c4e fix: restore proper error handling in the SAX push parser
    • 35aa88b fix(cruby): reset libxml2's error handler in sax and push parsers
    • 07459fd fix(test): clobber libxml2's global error handler before every test
    • b682ac5 ci: ensure all tests are running setup
    • 007662f github: update "installation difficulty" issue template
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 2
  • Missing reading response link on

    Missing reading response link on "Error Messages in Python"

    opened by LucaOnline 1
  • Add discussion of HISAT2 & transcriptomics

    Add discussion of HISAT2 & transcriptomics

    HiSat2 https://anaconda.org/bioconda/hisat2

    Salmon intro (another alternative that interoperates well with DESeq2) https://combine-lab.github.io/salmon/getting_started/

    opened by zaneveld 0
  • Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

    Literature Synthesis section -- discuss cutting extra phrases that don't add meaning in literature

    In addition we found a more recent study that showed that [research finding] (cite1;cite2). --> [research finding]

    In a 2016 study it was shown that [finding])(cite1) --> finding

    opened by zaneveld 0
  • More database links:  https://www.cbioportal.org/  (Cancer research database)  https://www.idigbio.org/  (Integrated digitized biocollections)  https://www.gbif.org/ (biodiversity data)  https://bceenetwork.org/cure-summaries/  https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    More database links: https://www.cbioportal.org/ (Cancer research database) https://www.idigbio.org/ (Integrated digitized biocollections) https://www.gbif.org/ (biodiversity data) https://bceenetwork.org/cure-summaries/ https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    Open resources shared in the 2022 AACU Talks (CUREing Cancer: How a Virtual Cancer Genomics CURE Made Research Accessible to Students During COVID and another was on Expanding Access to Undergraduate Research Through BCEENET Cures Using Digitized Collections Data) on CUREs (shared by Robin Angotti):

    https://www.cbioportal.org/ (Cancer research database)
    https://www.idigbio.org/ (Integrated digitized biocollections) https://www.gbif.org/ (biodiversity data) https://bceenetwork.org/cure-summaries/ https://docs.google.com/document/d/1gC-sj3p8aUKgEDxVPJfq793Mm4n5niZm/edit (overview of databases for genes and genomics for cancer)

    opened by zaneveld 0
Releases(release-2022.3.1)
  • release-2022.3.1(Mar 2, 2022)

    What's Changed

    The 2022.3.1 Release of Full Spectrum Bioinformatics greatly expands the scope and maturity of the text, including contributions from 3 undergraduate co-authors. This text has now been used to support multiple classes, and has 35 sections that are linked from the table of content and ready for classroom use.

    Here are some of the major changes:

    The text has several new sections: -- An overview of python syntax now overviews how to recognize python syntax before we dive into studying the details -- A first chapter on sequence alignment now covers Needleman-Wunsch alignment, both as worked by hand using a simple example, and an implementation in numpy. -- The text now discusses linear models, with accompanying illustrations as well as figures -- An Error Bingo exercise now encourages students to intentionally trigger and learn from errors
    -- An extensive section has been added discussing common errors in python, why they most commonly occur, and how to fix them.

    -- 3 undergraduate contributors have added Bioinformatics Vignettes showing how to apply the principles in the text to biological problems: - Nia Prabhu (nucleotide composition) - Aziz Bajouri (set analysis) - Ayomikun Akinrinade (machine learning)

    -- A section has been added on revising writing about statistical results -- An initial draft section on visualizing correlation has been added showing how a scatterplot can be revised to add linear regression results, 95% confidence intervals, and to better meet recommendations for data visualization. -- The Data Sources page has been greatly updated, and now includes logos for linked resources

    New Draft Sections: -- A draft section on student activism and fighting for an inclusive workplace has been added. -- A draft section on network analysis has several in-progress code commits (not yet linked from main table of contents)

    Other changes: -- Full Spectrum Bioinformatics has now adopted a code of conduct -- Many minor fixes -- Exercises have been added to many sections that previously lacked them -- The exercise on calculating CG content in the human genome has been updated -- Several chapters have been updated to include Feedback links that were previously missing -- Unused Jupyter Book files have been removed

    Full Changelog: https://github.com/zaneveld/full_spectrum_bioinformatics/compare/release-2020.12.1...release-2022.3.1

    Source code(tar.gz)
    Source code(zip)
    full_spectrum_bioinformatics_2022.3.0.zip(182.17 MB)
  • release-2020.12.1(Dec 8, 2020)

    This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

    Some current features include: -- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics. -- Links to Google Colab to allow students to run notebooks in a browser without installing software -- An outline table of contents shows planned sections, with sections that are in beta status available as live links. -- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

    Foreword The Command Line Using the Command Line Exercise: Little Brother is Missing Exploring Python Exploring Python A Tour of Python Data Types Project Design Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses Biological Sequences An introduction to Biological Sequences Representing and Manipulating Biological Sequences as Python Strings Analyzing Biological Sequences with For Loops and If Statements Reading and writing FASTA files using Python 'Omics An Introduction to 'Omics Working with Tabular 'Omic data in Python using Pandas Phylogenetic Trees Representing Phylogenetic Trees with Python Classes Generating Trees Using Birth-Death Models Simulation Simulating the Population Genetics of Natural Selection and Genetic Drift Statistics Rank Transformations Monte Carlo simulation of Effect Size, Sample Size, and Significance Dealing with Multiple Comparisons Exercise: Revising your writing about statistical results Polishing and Publishing Presenting Research Careers that draw on Bioinformatics Applying for Grants

    NOTE: this is very similar to release-2020.12.0, other than minor edits to the readme but I need to re-release to trigger Zenodo to generate a DOI.

    Source code(tar.gz)
    Source code(zip)
  • release-2020.12.0(Dec 7, 2020)

    This is an initial development release of the Full Spectrum Bioinformatics online textbook. This is not a full release of the entire planned textbook, but rather an incremental development release of some content that is sufficiently developed that it has been used in classes.

    Some current features include: -- A series of open-access Jupyter Notebooks discussing topics in Bioinformatics. -- Links to Google Colab to allow students to run notebooks in a browser without installing software -- An outline table of contents shows planned sections, with sections that are in beta status available as live links. -- This release includes 21 new sections, covering topics ranging from sequence analysis to how to revise one's writing about statistical results:

    Foreword The Command Line Using the Command Line Exercise: Little Brother is Missing Exploring Python Exploring Python A Tour of Python Data Types Project Design Using Literature Surveys to Ask Good Questions and Propose Testable Hypotheses Biological Sequences An introduction to Biological Sequences Representing and Manipulating Biological Sequences as Python Strings Analyzing Biological Sequences with For Loops and If Statements Reading and writing FASTA files using Python 'Omics An Introduction to 'Omics Working with Tabular 'Omic data in Python using Pandas Phylogenetic Trees Representing Phylogenetic Trees with Python Classes Generating Trees Using Birth-Death Models Simulation Simulating the Population Genetics of Natural Selection and Genetic Drift Statistics Rank Transformations Monte Carlo simulation of Effect Size, Sample Size, and Significance Dealing with Multiple Comparisons Exercise: Revising your writing about statistical results Polishing and Publishing Presenting Research Careers that draw on Bioinformatics Applying for Grants

    Source code(tar.gz)
    Source code(zip)
    full_spectrum_bioinformatics.zip(84.89 MB)
Owner
Jesse Zaneveld
Jesse Zaneveld
spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines

spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines spaCy-wrap is minimal library intended for wrapping fine-tuned transformers from t

Kenneth Enevoldsen 32 Dec 29, 2022
YACLC - Yet Another Chinese Learner Corpus

汉语学习者文本多维标注数据集YACLC V1.0 中文 | English 汉语学习者文本多维标注数据集(Yet Another Chinese Learner

BLCU-ICALL 47 Dec 15, 2022
Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification"

PTR Code and datasets for our paper "PTR: Prompt Tuning with Rules for Text Classification" If you use the code, please cite the following paper: @art

THUNLP 118 Dec 30, 2022
Code Generation using a large neural network called GPT-J

CodeGenX is a Code Generation system powered by Artificial Intelligence! It is delivered to you in the form of a Visual Studio Code Extension and is Free and Open-source!

DeepGenX 389 Dec 31, 2022
Open source annotation tool for machine learning practitioners.

doccano doccano is an open source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequ

7.1k Jan 01, 2023
DziriBERT: a Pre-trained Language Model for the Algerian Dialect

DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect.

117 Jan 07, 2023
code for modular summarization work published in ACL2021 by Krishna et al

This repository contains the code for running modular summarization pipelines as described in the publication Krishna K, Khosla K, Bigham J, Lipton ZC

Kundan Krishna 6 Jun 04, 2021
CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus

CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus CVSS is a massively multilingual-to-English speech-to-speech translation corpus, co

Google Research Datasets 118 Jan 06, 2023
Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

ICTNLP 29 Oct 16, 2022
The model is designed to train a single and large neural network in order to predict correct translation by reading the given sentence.

Neural Machine Translation communication system The model is basically direct to convert one source language to another targeted language using encode

Nishant Banjade 7 Sep 22, 2022
Extract city and country mentions from Text like GeoText without regex, but FlashText, a Aho-Corasick implementation.

flashgeotext ⚡ 🌍 Extract and count countries and cities (+their synonyms) from text, like GeoText on steroids using FlashText, a Aho-Corasick impleme

Ben 57 Dec 16, 2022
A Semi-Intelligent ChatBot filled with statistical and economical data for the Premier League.

MONEYBALL - ChatBot Module: 4006CEM, Class: B, Group: 5 Contributors: Jonas Djondo Roshan Kc Cole Samson Daniel Rodrigues Ihteshaam Naseer Kind remind

Jonas Djondo 1 Nov 18, 2021
Nested Named Entity Recognition

Nested Named Entity Recognition Training Dataset: CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark url: https://tianchi.aliyun.

8 Dec 25, 2022
Fake Shakespearean Text Generator

Fake Shakespearean Text Generator This project contains an impelementation of stateful Char-RNN model to generate fake shakespearean texts. Files and

Recep YILDIRIM 1 Feb 15, 2022
An open-source NLP research library, built on PyTorch.

An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks. Quic

AI2 11.4k Jan 01, 2023
Fastseq 基于ONNXRUNTIME的文本生成加速框架

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Jun Gao 9 Nov 09, 2021
NLP-SentimentAnalysis - Coursera Course ( Duration : 5 weeks ) offered by DeepLearning.AI

Coursera Natural Language Processing Specialization This repository contains material related to Coursera Natural Language Processing Specialization.

Nishant Sharma 1 Jun 05, 2022
Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe NHV in the future.

Fast (GAN Based Neural) Vocoder Chinese README Todo Submit demo Support NHV Discription Include MelGAN, HifiGAN and Multiband-HifiGAN, maybe include N

Zhengxi Liu (刘正曦) 134 Dec 16, 2022
Reformer, the efficient Transformer, in Pytorch

Reformer, the Efficient Transformer, in Pytorch This is a Pytorch implementation of Reformer https://openreview.net/pdf?id=rkgNKkHtvB It includes LSH

Phil Wang 1.8k Dec 30, 2022
Blue Brain text mining toolbox for semantic search and structured information extraction

Blue Brain Search Source Code DOI Data & Models DOI Documentation Latest Release Python Versions License Build Status Static Typing Code Style Securit

The Blue Brain Project 29 Dec 01, 2022