BioThings API framework - Making high-performance APIs for biological annotation data

Overview


BioThings SDK

Quick Summary

BioThings SDK provides a Python-based toolkit to build high-performance data APIs (or web services) from a single data source or multiple data sources. It has a particular focus on building data APIs for biomedical entities, a.k.a. "BioThings" (such as genes, genetic variants, drugs, chemicals, diseases, etc.).

Documentation about BioThings SDK can be found at https://docs.biothings.io

Introduction

What's BioThings?

We use "BioThings" to refer to objects of any biomedical entity-type represented in the biological knowledge space, such as genes, genetic variants, drugs, chemicals, diseases, etc.

BioThings SDK

SDK stands for "Software Development Kit". BioThings SDK provides a Python-based toolkit to build high-performance data APIs (or web services) from a single data source or multiple data sources. It has a particular focus on building data APIs for biomedical entities, a.k.a. "BioThings", though it's not necessarily limited to the biomedical scope. For any given "BioThings" type, BioThings SDK helps developers aggregate annotations from multiple data sources and expose them as a clean, high-performance web API.

The BioThings SDK can be roughly divided into two main components: the data hub (or just "hub") component and the web component. The hub component allows developers to automate the process of monitoring, parsing and uploading data sources to an Elasticsearch backend. From there, the web component, built on the high-concurrency Tornado web server, allows you to easily set up a live, high-performance API. The API endpoints expose simple-to-use yet powerful query features based on Elasticsearch's full-text query capabilities and query language.

BioThings API

We also use "BioThings API" (or BioThings APIs) to refer to an API (or a collection of APIs) built with BioThings SDK. For example, both our popular MyGene.Info and MyVariant.Info APIs are built and maintained using this BioThings SDK.

BioThings Studio

BioThings Studio is a built-in, pre-configured environment used to build and administer a BioThings API. At its core is the Hub, a backend service responsible for keeping data up-to-date, producing data releases and updating API frontends.

Installing BioThings SDK

You can install the latest stable BioThings SDK release with pip from PyPI:

pip install biothings

You can install the latest development version of BioThings SDK directly from our GitHub repository:

pip install git+https://github.com/biothings/biothings.api.git#egg=biothings

Alternatively, you can download the source code, or clone the BioThings SDK repository and run:

python setup.py install

Get started to build a BioThings API

We recommend following this tutorial to develop your first BioThings API in our pre-configured BioThings Studio development environment.

Documentation

The latest documentation is available at https://docs.biothings.io.

How to contribute

Please check out this Contribution Guidelines and Code of Conduct document.

Comments
  • Unable to start either [demo_myvariant.docker, old_myvariant.docker]


Following the instructions from http://docs.biothings.io/en/latest/doc/standalone.html#quick-links, both images exhibit the same behavior:

    • hub not starting
      • curl http://localhost:19200/_cat/indices (returns nothing)
    • inability to ssh from host to container
    • inability to start hub cli

    Host information:

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID:	Ubuntu
    Description:	Ubuntu 18.04 LTS
    Release:	18.04
    Codename:	bionic
    
    $ docker --version
    Docker version 17.12.1-ce, build 7390fc6
    
    $ docker info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 84
    Server Version: 17.12.1-ce
    Storage Driver: overlay2
     Backing Filesystem: extfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: 9b55aab90508bd389d7654c4baf173a981477d55
    runc version: 9f9c96235cc97674e935002fc3d78361b696a69e
    init version: v0.13.0 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
    Security Options:
     apparmor
     seccomp
      Profile: default
    Kernel Version: 4.15.0-22-generic
    Operating System: Ubuntu 18.04 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 24
    Total Memory: 118GiB
    Name: bmeg-build
    ID: JRG3:XIRP:VOMU:Z5IM:CBNN:M2TT:QP6J:TE4G:B6V4:C5KI:RZ6T:S7ZX
    Docker Root Dir: /var/lib/docker
    Debug Mode (client): false
    Debug Mode (server): false
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     127.0.0.0/8
    Live Restore Enabled: false
    
    
    
    # docker run --name old_myvariant -p 19080:80 -p 19200:9200 -p 19022:7022 -p 19090:7080 -d old_myvariant
    2171a1bd50736f4074c9c3102282ae4b92a8002335347217d65a5e8681b49c3f
    [email protected]:/mnt/walsbr#  curl -v http://localhost:19080/metadata
    *   Trying 127.0.0.1...
    * TCP_NODELAY set
    * Connected to localhost (127.0.0.1) port 19080 (#0)
    > GET /metadata HTTP/1.1
    > Host: localhost:19080
    > User-Agent: curl/7.58.0
    > Accept: */*
    >
    < HTTP/1.1 500 Internal Server Error
    < Date: Fri, 13 Jul 2018 20:11:23 GMT
    < Content-Type: text/html; charset=UTF-8
    < Content-Length: 93
    < Connection: keep-alive
    < Server: TornadoServer/4.5.1
    <
    * Connection #0 to host localhost left intact
    <html><title>500: Internal Server Error</title><body>500: Internal Server Error</body></html>[email protected]:/mnt/walsbr#
    
    
    [email protected]:/mnt/walsbr#  curl http://localhost:19200/_cat/indices
    
    [email protected]:/mnt/walsbr#  ssh [email protected] -p 19022
    ssh_exchange_identification: read: Connection reset by peer
    
    
    [email protected]:/mnt/walsbr# docker exec -it old_myvariant bash
    
    
    Traceback (most recent call last):
      File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
        "__main__", mod_spec)
      File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/biothings/myvariant.info/src/biothings/bin/autohub.py", line 16, in <module>
        biothings.config_for_app(config)
      File "/home/biothings/myvariant.info/src/biothings/__init__.py", line 55, in config_for_app
        check_config(config_mod)
      File "/home/biothings/myvariant.info/src/biothings/__init__.py", line 32, in check_config
        raise ConfigurationError("%s: %s" % (attr,str(getattr(config_mod,attr))))
    biothings.ConfigurationError: DATA_PLUGIN_FOLDER: Define path to folder which will contain all 3rd party parsers, dumpers, etc...
    (pyenv) [email protected]:~/myvariant.info/src$
    
    
    bug 
    opened by bwalsh 16
  • Fetch >1000 documents with a POST query


    Originally from @colleenXu:

    To find associations between things, we are mostly doing POST queries to biothings apis.

For POST queries, we can retrieve <=1000 records per input (think of a batch query of input IDs like the one below). A batch query can itself include up to 1000 inputs.

    POST to https://mydisease.info/v1/query?fields=disgenet.xrefs,_id&size=1000 with the body: { "q": "7157,7180,7190", "scopes": "disgenet.genes_related_to_disease.gene_id" }
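For illustration, the batch query above can be built with a small helper like the following (a hypothetical function, not part of any BioThings client; the endpoint, field names and scopes are taken directly from the example above):

```python
import json

# Hypothetical helper mirroring the example POST query above. "size" caps the
# number of records returned per input ID (1000 is the current upper limit
# discussed in this issue).
def build_batch_query(gene_ids, scopes, fields, size=1000):
    url = "https://mydisease.info/v1/query?fields=%s&size=%d" % (fields, size)
    body = {"q": ",".join(str(g) for g in gene_ids), "scopes": scopes}
    return url, json.dumps(body)

url, payload = build_batch_query(
    [7157, 7180, 7190],
    "disgenet.genes_related_to_disease.gene_id",
    "disgenet.xrefs,_id",
)
# payload can then be POSTed with any HTTP client, e.g.
# requests.post(url, data=payload, headers={"Content-Type": "application/json"})
```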

My understanding is that there's only one way to change this situation:

1. CANNOT DO: fetch_all, since that only works for GET queries (and using only GET queries isn't a viable solution, because losing batch queries can slow down multi-hop BTE queries quite a bit).
2. CAN DO: the only way to get >1000 records per input is to adjust the BioThings API settings - which would likely involve lowering the batch-query limit (e.g. 10000 records per input and 100 IDs per batch). This could perhaps be done on a per-API basis (like specific pending APIs?).

Noting that this has been a discussion topic for a while. For now, we've been okay keeping things at <=1000 records per input, knowing that we are not getting the complete response, because it is difficult to handle a node attached to lots of other entities...

However, this is known to be more of an issue for APIs that keep many separate records for the same basic association X-related_to-Y. This happens with semmeddb (at least one record per publication-association) and some multiomics APIs. These are all on the pending API hub.

    enhancement 
    opened by erikyao 12
  • Automated basic biothings web functionality test for data applications


Now that we have automated tests that run after each app (mygene, myvariant, ...) build under development, explore the possibility of running an automated basic BioThings functionality test, to ensure that, in addition to the customizations working, the basic features are also not affected.

    opened by namespacestd0 9
  • FTPDumper does not clean up old downloaded files when ARCHIVE is set to False


When ARCHIVE is set to False, downloaded files are saved to the same folder (Data_Archive_root/latest), but FTPDumper does not appear to clean up the old downloads.

    Here is an example from mychem hub: PubChemDumper.
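A minimal sketch of the kind of post-dump cleanup this would require (hypothetical helper names; the real fix would belong inside FTPDumper's download logic):

```python
import os

# Hypothetical cleanup step: when ARCHIVE is False, every dump lands in the
# shared ".../latest" folder, so stale files from previous dumps must be
# removed explicitly. "keep_files" is the set of filenames the current dump
# just downloaded.
def remove_stale_downloads(data_folder, keep_files):
    removed = []
    for name in os.listdir(data_folder):
        path = os.path.join(data_folder, name)
        if os.path.isfile(path) and name not in keep_files:
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```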

    bug 
    opened by newgene 8
  • remove remaining boto dependency (using boto3 instead)


We still have a few remaining places using boto to access AWS S3 buckets:

    https://github.com/biothings/biothings.api/blob/621887f04aae13c3a775aea9aa7daacb92ae7ef0/biothings/utils/aws.py#L6

    and

    https://github.com/biothings/biothings.api/blob/621887f04aae13c3a775aea9aa7daacb92ae7ef0/biothings/hub/dataexport/ids.py#L4

Most of the other AWS-related code has been migrated to boto3. Let's remove the boto dependency completely.

    enhancement 
    opened by newgene 8
  • display optional "description" in API metadata

    If a user wants to know what a given API is for (e.g., "repodb"), the best options now seem to be to search the API name and/or look at example records. We should also allow an optional "description" to be provided in the manifest.json metadata, which would optionally provide some human-readable description directly on the API page.

    EDIT: Sorry, just realized I should probably have created this issue in https://github.com/biothings/pending.api/issues.. Feel free to recreate it over there if helpful.

    enhancement 
    opened by andrewsu 7
  • Replace boto calls to use boto3


    This partially addresses Issue #133

All usages except one have been replaced and tested.

The explanation for the one remaining usage will be documented in comments under Issue #133.

    opened by zcqian 7
  • `SnapshotTaskEnv` cannot create `ESIndexer` instances


When I create a new snapshot, the corresponding SnapshotTaskEnv instance cannot be initialized, due to a failure in the creation of the ESIndexer instance.

    Error messages are like:

    Aug 16 17:37:09 su06 python[57443]: HTTPServerRequest(protocol='http', host='localhost:19080', method='PUT', uri='/snapshot', version='HTTP/1.1', remote_ip='172.29.80.35')
    Aug 16 17:37:09 su06 python[57443]: Traceback (most recent call last):
    Aug 16 17:37:09 su06 python[57443]:   File "/opt/home/pending/venv/lib/python3.6/site-packages/tornado/web.py", line 1704, in _execute
    Aug 16 17:37:09 su06 python[57443]:     result = await result
    Aug 16 17:37:09 su06 python[57443]:   File "<string>", line 69, in put
    Aug 16 17:37:09 su06 python[57443]:   File "/opt/home/pending/venv/src/biothings/biothings/hub/dataindex/snapshooter.py", line 521, in snapshot
    Aug 16 17:37:09 su06 python[57443]:     return env_for_build.snapshot(index, snapshot=snapshot, steps=steps)
    Aug 16 17:37:09 su06 python[57443]:   File "/opt/home/pending/venv/src/biothings/biothings/hub/dataindex/snapshooter.py", line 314, in snapshot
    Aug 16 17:37:09 su06 python[57443]:     task_env = SnapshotTaskEnv(self, index, snapshot)
    Aug 16 17:37:09 su06 python[57443]:   File "/opt/home/pending/venv/src/biothings/biothings/hub/dataindex/snapshooter.py", line 244, in __init__
    Aug 16 17:37:09 su06 python[57443]:     doc_type=env.build_doc['index'][index]['doc_type'],
    Aug 16 17:37:09 su06 python[57443]: KeyError: 'doc_type'
    Aug 16 17:37:09 su06 python[57443]: ERROR:tornado.access:500 PUT /snapshot (172.29.80.35) 37.72ms
    

    The root causes are:

    1. src_build does not hold the doc_type values anymore
    2. The ESIndexer class definition is out-of-date with ES7

    A sample env.build_doc['index'][index] entry is:

    "index" : {
        "idisk_20210812_sicwkhq0" : {
            "host" : "su03:9200",
            "environment" : "su03",
            "created_at" : ISODate("2021-08-16T23:17:07.550Z"),
            "count" : 919
        }
    },
    
    opened by erikyao 6
  • `GitDumper` should also checkout the `main` branches


    When adding a new plugin to pending.biothings.io, the URL to the GitHub repo will be passed to http://localhost:19080/dataplugin/register_url,


    which in the end calls the AssistantManager.register_url() method (assistant.py#L699).

The AssistantManager instance appears to add a message (including the URL to register) to its corresponding MongoDB collection, and finally a GitDumper instance receives the URL and checks the repo out. By default, GitDumper only checks out the master branch, but GitHub has since changed its default branch name from master to main. As a result, our GitDumper cannot check out the latest GitHub repo-based plugins.

    The root cause in the code seems to be dumper.py#L1072:

    DEFAULT_BRANCH = "master"
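One possible direction for a fix (a sketch, not the actual patch) is to detect the remote's default branch instead of hard-coding it, e.g. by parsing the output of `git ls-remote --symref <url> HEAD`:

```python
def detect_default_branch(symref_output):
    # Parse the output of `git ls-remote --symref <url> HEAD`, whose first
    # line looks like: "ref: refs/heads/main\tHEAD". Fall back to the
    # historical default when the symref line is missing.
    for line in symref_output.splitlines():
        if line.startswith("ref:"):
            return line.split()[1].rsplit("/", 1)[-1]
    return "master"
```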
    
    bug enhancement 
    opened by erikyao 6
  • Implement full release installation without downtime


Currently, from the biothings hub and studio, we can install incremental releases directly with no downtime (applying diffs to the production index). The relevant code is here:

https://github.com/biothings/biothings.api/blob/1b96f0aded05873d642134c0c38b15fa982e3b6d/biothings/hub/standalone/__init__.py#L68

    In the case of deploying a full release, we currently have two options:

1. Delete the old index and then install the new index (restoring snapshots). This causes downtime, though it's brief for small data indices.

2. Perform a manual index restoration to a different index name, then switch the alias when it's done. This has no downtime and should be preferred.

We should implement the manual steps from option 2 as a feature in biothings hub/studio.
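The zero-downtime part of option 2 hinges on an atomic alias swap. A sketch of the request body for Elasticsearch's update-aliases API (the index and alias names below are made up for illustration):

```python
def alias_swap_actions(alias, old_index, new_index):
    # Build the request body for POST /_aliases. Doing the remove and the add
    # in a single call makes the swap atomic: the alias never points at
    # nothing, so queries see no downtime.
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = alias_swap_actions("mygene_current", "mygene_20220901", "mygene_20221001")
# e.g. es.indices.update_aliases(body=body) with an elasticsearch-py client
```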

    enhancement 
    opened by newgene 6
  • Can manifest-based data plugins and regular data sources stay in the same folder?


Right now, manifest-based data plugins live in a separate folder like "plugins", while regular dumper/uploader-based data sources live in the hub/dataload/sources folder.

Can we allow them in the same folder? It may already just work; let's verify, and try to make it work if not.

    • plugins folder example: https://github.com/biothings/mydisease.info/tree/master/src/plugins

    • regular data sources example: https://github.com/newgene/biothings_docker/tree/main/tests/hubapi/demohub/biothing_studio/hub/dataload/sources

    enhancement 
    opened by newgene 5
  • Create an Elasticsearch reindex helper function


This should be a standalone helper function (e.g. under utils/es_reindex.py), used manually from a Python/IPython console when needed. It helps reindex an existing index by transferring the settings, mappings and docs.

Elasticsearch's reindex API should be used; however, the mappings and settings from the old index should be used to create a new, empty target index first. Then the reindex API can be called to transfer all docs to the new index. Optionally, the alias should be switched over to the new index too. This is useful when we need to migrate existing indices created with an older ES version to the current ES version.

    def reindex(src_index, target_index=None, settings=None, mappings=None, alias=None, delete_src=False):
    
    
• target_index: use <src_index_name>_reindexed as the default if None
• settings: if provided as a dict, update the settings with the provided dictionary; otherwise, keep the same settings as src_index
• mappings: if provided as a dict, update the mappings with the provided dictionary; otherwise, keep the same mappings as src_index
• alias: if True, switch the alias from src_index to target_index (if src_index has no alias, apply <src_index_name> as the alias); if a string value, apply it as the alias instead
• delete_src: if True, delete src_index after everything is done

And after the reindex, please also do a refresh & flush, then double-check that the doc counts are equal.

    enhancement 
    opened by newgene 0
  • Fix Aggregation Web formatter


Histogram aggregations do not contain the fields aggregations.<term>.doc_count_error_upper_bound and aggregations.<term>.sum_other_doc_count.

Our ES formatter assumes that these two values exist: https://github.com/biothings/biothings.api/blob/master/biothings/web/query/formatter.py#L426-L427

When making a custom aggregation in the ESQueryBuilder, I have to override these two values.

    res[facet]['other'] = res[facet].pop('sum_other_doc_count', 0)
    res[facet]['missing'] = res[facet].pop('doc_count_error_upper_bound', 0)
    

We should have this as a quick fix, so that when other users make custom aggregations they won't have to override the transform_aggs method.

    opened by jal347 0
  • DockerContainerDumper class


This can be a new type of Dumper class: it triggers a docker container (typically running on a different server) to run and generate an output file, then stops the container. The dumper class then retrieves the processed file(s) and sends them to the Uploader as normal.

Typically, the processed file can be an NDJSON file (one JSON document per line), so the uploader class can be quite simple and generic.

The typical use case is a complex workflow with heavy dependencies, which can then be isolated in a docker container.
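The uploader side can indeed stay generic; a minimal sketch of a parser for the container's NDJSON output (hypothetical helper, one JSON document per line):

```python
import json

def iter_ndjson(path):
    # Yield one document per non-empty line of an NDJSON file, the format
    # the container is expected to produce for the generic uploader.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```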

    enhancement 
    opened by newgene 0
  • create biothings.hub API document


Our current documentation site at https://docs.biothings.io/ does not contain the API documentation for the biothings.hub module. This was due to some errors in the past; let's re-evaluate whether we can generate it automatically now.

    enhancement 
    opened by newgene 0
  • Evaluate and upgrade Elasticsearch v8.x client


For both the elasticsearch-py and elasticsearch-dsl packages, ES8 support is now complete. We should test and upgrade.

All of our hubs are now running ES8, but we should target support for both ES7 and ES8 if possible.

    enhancement 
    opened by newgene 0
Releases(v0.11.1)
  • v0.11.1(Oct 4, 2022)

    This is a bug-fix release with these CHANGES (see also CHANGES.txt):

    v0.11.1 (2022/10/03)

    • Hub improvements:
      • use pickle protocol 4 as the pickle.dump default
    • Hub bug fixes:
      • Fixed a JSON serialization error during incremental release https://github.com/newgene/biothings.api/pull/65
      • Resolved a hub error when installing a full release https://github.com/biothings/biothings.api/issues/257
      • Fixed a quick_index error when a data source has multiple uploaders https://github.com/newgene/biothings.api/pull/66
    Source code(tar.gz)
    Source code(zip)
  • v0.11.0(Sep 14, 2022)

Owner
BioThings
High Performance Data APIs in Biology