gget

gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

Please cite the following paper:
Luebbert, L. & Pachter, L. (2022). Efficient querying of genomic databases for single-cell RNA-seq with gget. bioRxiv 2022.05.17.492392; doi: https://doi.org/10.1101/2022.05.17.492392

gget currently consists of the following nine modules:

gget ref
Fetch File Transfer Protocol (FTP) links and metadata for reference genomes and annotations from Ensembl by species.
gget search
Fetch genes and transcripts from Ensembl using free-form search terms.
gget info
Fetch extensive gene and transcript metadata from Ensembl, UniProt, and NCBI using Ensembl IDs.
gget seq
Fetch nucleotide or amino acid sequences of genes or transcripts from Ensembl or UniProt, respectively.
gget blast
BLAST a nucleotide or amino acid sequence to any BLAST database.
gget blat
Find the genomic location of a nucleotide or amino acid sequence using BLAT.
gget muscle
Align multiple nucleotide or amino acid sequences to each other using Muscle5.
gget enrichr
Perform an enrichment analysis on a list of genes using Enrichr.
gget archs4
Find the most correlated genes to a gene of interest or find the gene's tissue expression atlas using ARCHS4.

Installation

pip install gget

For use in Jupyter Lab / Google Colab:

import gget

Quick start guide

# Fetch all Homo sapiens reference and annotation FTPs from the latest Ensembl release
$ gget ref -s homo_sapiens

# Search human genes with "ace2" AND "angiotensin" in their name/description
$ gget search -sw ace2,angiotensin -s homo_sapiens -ao and 

# Look up gene ENSG00000130234 (ACE2) with expanded info (returns all transcript isoforms for genes)
$ gget info -id ENSG00000130234 -e

# Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234
$ gget seq -id ENSG00000130234 --seqtype transcript

# Quickly find the genomic location of (the start of) that amino acid sequence
$ gget blat -seq MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# Blast (the start of) that amino acid sequence
$ gget blast -seq MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS

# Align nucleotide or amino acid sequences stored in a FASTA file
$ gget muscle -fa path/to/file.fa

# Use Enrichr to find the ontology of a list of genes
$ gget enrichr -g ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P -db ontology

# Get the human tissue expression atlas of gene ACE2
$ gget archs4 -g ACE2 -w tissue

Jupyter Lab / Google Colab:

gget.ref("homo_sapiens")
gget.search(["ace2", "angiotensin"], "homo_sapiens", andor="and")
gget.info("ENSG00000130234", expand=True)
gget.seq("ENSG00000130234", seqtype="transcript")
gget.blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
gget.muscle("path/to/file.fa")
gget.enrichr(["ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"], database="ontology", plot=True)
gget.archs4("ACE2", which="tissue")

Manual

Jupyter Lab / Google Colab arguments are equivalent to long-option arguments (--arg).
The manual for any gget tool can be displayed in the terminal using the -h --help flag.

gget ref

Fetch FTPs and their respective metadata (or use flag ftp to only return the links) for reference genomes and annotations from Ensembl by species.
Return format: dictionary/json.

Required arguments
-s --species
Species for which the FTPs will be fetched in the format genus_species, e.g. homo_sapiens.
Note: Not required when calling flag [--list_species].
Supported shortcuts: 'human', 'mouse'

Optional arguments
-w --which
Defines which results to return. Default: 'all' -> Returns all available results.
Possible entries are one or a combination of the following:
'gtf' - Returns the annotation (GTF).
'cdna' - Returns the transcriptome (cDNA).
'dna' - Returns the genome (DNA).
'cds' - Returns the coding sequences corresponding to Ensembl genes. (Does not contain UTR or intronic sequence.)
'ncrna' - Returns transcript sequences corresponding to non-coding RNA genes (ncRNA).
'pep' - Returns the protein translations of Ensembl genes.

-r --release
Defines the Ensembl release number from which the files are fetched, e.g. 104. Default: latest Ensembl release.

-o --out
Path to the json file the results will be saved in, e.g. path/to/directory/results.json. Default: Standard out.
Jupyter Lab / Google Colab: save=True will save the output in the current working directory.

Flags
-l --list_species
Lists all available species. (Jupyter Lab / Google Colab: combine with species=None.)

-ftp --ftp
Returns only the requested FTP links.

-d --download
Downloads the requested FTPs to the current directory (requires curl to be installed).

Examples

Use gget ref in combination with kallisto | bustools to build a reference index:

kb ref -i INDEX -g T2G -f1 FASTA $(gget ref --ftp -w dna,gtf -s homo_sapiens)

→ kb ref builds a reference index using the latest DNA and GTF files of species Homo sapiens passed to it by gget ref.

Get all available genomes:

gget ref --list_species -r 103

# Jupyter Lab / Google Colab:
gget.ref(species=None, list_species=True, release=103)

→ Returns a list with all available genomes (checks if GTF and FASTAs are available) from Ensembl release 103.
(If no release is specified, gget ref will always return information from the latest Ensembl release.)

Get the genome reference for a specific species:

gget ref -s homo_sapiens -w gtf dna

# Jupyter Lab / Google Colab:
gget.ref("homo_sapiens", which=["gtf", "dna"])

→ Returns a json with the latest human GTF and FASTA FTPs, and their respective metadata, in the format:

{
    "homo_sapiens": {
        "annotation_gtf": {
            "ftp": "http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz",
            "ensembl_release": 106,
            "release_date": "28-Feb-2022",
            "release_time": "23:27",
            "bytes": "51379459"
        },
        "genome_dna": {
            "ftp": "http://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz",
            "ensembl_release": 106,
            "release_date": "21-Feb-2022",
            "release_time": "09:35",
            "bytes": "881211416"
        }
    }
}
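
As a minimal sketch of working with this output, the JSON can be loaded with Python's standard json module to collect the links per species. The sample below is an abbreviated copy of the structure shown above (metadata trimmed), and the helper name collect_ftps is hypothetical, not part of gget.

```python
import json

# Abbreviated copy of the gget ref output shown above (metadata trimmed).
REF_JSON = """
{
    "homo_sapiens": {
        "annotation_gtf": {
            "ftp": "http://ftp.ensembl.org/pub/release-106/gtf/homo_sapiens/Homo_sapiens.GRCh38.106.gtf.gz",
            "ensembl_release": 106
        },
        "genome_dna": {
            "ftp": "http://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz",
            "ensembl_release": 106
        }
    }
}
"""


def collect_ftps(ref_results):
    """Map each entry name (e.g. 'annotation_gtf') to its FTP link, per species."""
    return {
        species: {name: entry["ftp"] for name, entry in entries.items()}
        for species, entries in ref_results.items()
    }


links = collect_ftps(json.loads(REF_JSON))
for name, url in links["homo_sapiens"].items():
    print(name, url)
```

The same function should work on a results file written with --out, assuming the file keeps the layout shown above.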


gget search

Fetch genes and transcripts from Ensembl using free-form search terms.
Return format: data frame.

Required arguments
-sw --searchwords
One or more free-form search words, e.g. gaba, nmda. (Note: Search is not case-sensitive.)

-s --species
Species or database to be searched.
A species can be passed in the format 'genus_species', e.g. 'homo_sapiens'.
To pass a specific database, pass the name of the CORE database, e.g. 'mus_musculus_dba2j_core_105_1'.
All available databases can be found here.
Supported shortcuts: 'human', 'mouse'.

Optional arguments
-st --seqtype
'gene' (default) or 'transcript'
Returns genes or transcripts, respectively.

-ao --andor
'or' (default) or 'and'
'or': Returns all genes that INCLUDE AT LEAST ONE of the searchwords in their name/description.
'and': Returns only genes that INCLUDE ALL of the searchwords in their name/description.

-l --limit
Limits the number of search results, e.g. 10. Default: None.

-o --out
Path to the csv the results will be saved in, e.g. path/to/directory/results.csv. Default: Standard out.
Jupyter Lab / Google Colab: save=True will save the output in the current working directory.

Flags
wrap_text
Jupyter Lab / Google Colab only. wrap_text=True displays data frame with wrapped text for easy reading (default: False).

Example

gget search -sw gaba gamma-aminobutyric -s homo_sapiens

# Jupyter Lab / Google Colab:
gget.search(["gaba", "gamma-aminobutyric"], "homo_sapiens")

→ Returns all genes that contain at least one of the search words in their name or Ensembl/external reference description:

ensembl_id	gene_name	ensembl_description	ext_ref_description	biotype	url
ENSG00000034713	GABARAPL2	GABA type A receptor associated protein like 2 [Source:HGNC Symbol;Acc:HGNC:13291]	GABA type A receptor associated protein like 2	protein_coding	https://uswest.ensembl.org/homo_sapiens/Gene/Summary?g=ENSG00000034713
. . .	. . .	. . .	. . .	. . .	. . .
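
The 'or' / 'and' behaviour of --andor can be illustrated with a small, self-contained sketch. It only mimics the matching semantics described above (case-insensitive, applied to a name/description string); it is not how gget queries Ensembl, and the function name matches is hypothetical.

```python
def matches(searchwords, text, andor="or"):
    """Return True if `text` contains any ('or') or all ('and') of the search words.

    Matching is case-insensitive, mirroring the note on --searchwords.
    """
    if andor not in ("or", "and"):
        raise ValueError("andor must be 'or' or 'and'")
    text = text.lower()
    hits = [word.lower() in text for word in searchwords]
    return all(hits) if andor == "and" else any(hits)


# Gene name and description taken from the example row above.
description = "GABARAPL2 GABA type A receptor associated protein like 2"
words = ["gaba", "gamma-aminobutyric"]

print(matches(words, description, andor="or"))   # True: 'gaba' is present
print(matches(words, description, andor="and"))  # False: 'gamma-aminobutyric' is absent
```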


gget info

Fetch extensive gene and transcript metadata from Ensembl, UniProt, and NCBI using Ensembl IDs.
Return format: data frame.

Required arguments
-id --ens_ids
One or more Ensembl IDs.

Optional arguments
-o --out
Path to the csv the results will be saved in, e.g. path/to/directory/results.csv. Default: Standard out.
Jupyter Lab / Google Colab: save=True will save the output in the current working directory.

Flags
-e --expand
Expands returned information (only for gene and transcript IDs).
For genes, adds information on all known transcripts.
For transcripts, adds information on all known translations and exons.

wrap_text
Jupyter Lab / Google Colab only. wrap_text=True displays data frame with wrapped text for easy reading (default: False).

Example

gget info -id ENSG00000034713 ENSG00000104853 ENSG00000170296 -e

# Jupyter Lab / Google Colab:
gget.info(["ENSG00000034713", "ENSG00000104853", "ENSG00000170296"], expand=True)

→ Returns extensive information about each requested Ensembl ID in data frame format:

	uniprot_id	ncbi_gene_id	primary_gene_name	synonyms	protein_names	ensembl_description	uniprot_description	ncbi_description	biotype	canonical_transcript	...
ENSG00000034713	P60520	11345	GABARAPL2	[ATG8, ATG8C, FLC3A, GABARAPL2, GATE-16, GATE16, GEF-2, GEF2]	Gamma-aminobutyric acid receptor-associated protein like 2 (GABA(A) receptor-associated protein-like 2)...	GABA type A receptor associated protein like 2 [Source:HGNC Symbol;Acc:HGNC:13291]	FUNCTION: Ubiquitin-like modifier involved in intra- Golgi traffic (By similarity). Modulates intra-Golgi transport through coupling between NSF activity and ...	Enables ubiquitin protein ligase binding activity. Involved in negative regulation of proteasomal protein catabolic process and protein...	protein_coding	ENST00000037243.7	...
. . .	. . .	. . .	. . .	. . .	. . .	. . .	. . .	. . .	. . .	. . .	...


gget seq

Fetch nucleotide or amino acid sequence of a gene (and all its isoforms) or a transcript by Ensembl ID.
Return format: FASTA.

Required arguments
-id --ens_ids
One or more Ensembl IDs.

Optional arguments
-st --seqtype
'gene' (default) or 'transcript'.
Defines whether nucleotide or amino acid sequences are returned.
'gene' returns nucleotide sequences, which are fetched from Ensembl.
'transcript' returns amino acid sequences, which are fetched from UniProt.

-o --out
Path to the file the results will be saved in, e.g. path/to/directory/results.fa. Default: Standard out.
Jupyter Lab / Google Colab: save=True will save the output in the current working directory.

Flags
-i --isoforms
Returns the sequences of all known transcripts.
(Only for gene IDs in combination with seqtype=transcript.)

Examples

gget seq -id ENSG00000034713 ENSG00000104853 ENSG00000170296

# Jupyter Lab / Google Colab:
gget.seq(["ENSG00000034713", "ENSG00000104853", "ENSG00000170296"])

→ Returns the nucleotide sequences of ENSG00000034713, ENSG00000104853, and ENSG00000170296 in FASTA format.

gget seq -id ENSG00000034713 -st transcript --isoforms

# Jupyter Lab / Google Colab:
gget.seq("ENSG00000034713", seqtype="transcript", isoforms=True)

→ Returns the amino acid sequences of all known transcripts of ENSG00000034713 in FASTA format.
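
Since the return format is FASTA, downstream code often needs the records as header/sequence pairs. The following is a minimal parser sketch using only the standard library. The records in the example are placeholders made up for illustration and are not real gget output; parse_fasta is a hypothetical helper name.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading '>'.
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them later.
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# Placeholder records: headers and sequences are made up for illustration.
example = """>record_1 hypothetical
ATGGCG
TTAA
>record_2 hypothetical
MSSSSWLL
"""

sequences = parse_fasta(example)
print(sequences)
```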


gget blast

BLAST a nucleotide or amino acid sequence to any BLAST database.
Return format: data frame.

Required arguments
-seq --sequence
Nucleotide or amino acid sequence, or path to FASTA or .txt file.

Optional arguments
-p --program
'blastn', 'blastp', 'blastx', 'tblastn', or 'tblastx'.
Default: 'blastn' for nucleotide sequences; 'blastp' for amino acid sequences.

-db --database
'nt', 'nr', 'refseq_rna', 'refseq_protein', 'swissprot', 'pdbaa', or 'pdbnt'.
Default: 'nt' for nucleotide sequences; 'nr' for amino acid sequences.
More info on BLAST databases

-l --limit
Limits number of hits to return. Default: 50.

-e --expect
Defines the expect value cutoff. Default: 10.0.

Flags
-lcf --low_comp_filt
Turns on low complexity filter.

-mbo --megablast_off
Turns off MegaBLAST algorithm. Default: MegaBLAST on (blastn only).

-q --quiet
Prevents progress information from being displayed.

wrap_text
Jupyter Lab / Google Colab only. wrap_text=True displays data frame with wrapped text for easy reading (default: False).

Example

gget blast -seq MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR

# Jupyter Lab / Google Colab:
gget.blast("MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR")

→ Returns the BLAST result of the sequence of interest in data frame format. gget blast automatically detects this sequence as an amino acid sequence and therefore sets the BLAST program to blastp with database nr.

Description	Scientific Name	Common Name	Taxid	Max Score	Total Score	Query Cover	...
PREDICTED: gamma-aminobutyric acid receptor-as...	Colobus angolensis palliatus	NaN	336983	180	180	100%	...
. . .	. . .	. . .	. . .	. . .	. . .	. . .	...

BLAST from .fa or .txt file:

gget blast -seq fasta.fa

# Jupyter Lab / Google Colab:
gget.blast("fasta.fa")

→ Returns the BLAST results of the first sequence contained in the fasta.fa file.
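
The automatic choice of program and database can be pictured with a simple heuristic. The sketch below is an assumption about how such detection might work (a sequence consisting only of the letters A, C, G, T, U and N is treated as nucleotide); it is not taken from gget's source, and both function names are hypothetical. The defaults it returns are the ones listed under --program and --database.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence):
    """Guess 'nucleotide' or 'amino acid' from the letters in a sequence.

    Heuristic assumption: a sequence made up only of A, C, G, T, U and N
    is treated as nucleotide; anything else is treated as amino acid.
    """
    letters = set(sequence.upper())
    if not letters:
        raise ValueError("empty sequence")
    return "nucleotide" if letters <= NUCLEOTIDE_LETTERS else "amino acid"


def default_blast_settings(sequence):
    """Return the (program, database) defaults listed above for a sequence."""
    if guess_sequence_type(sequence) == "nucleotide":
        return "blastn", "nt"
    return "blastp", "nr"


query = "MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR"
print(default_blast_settings(query))         # ('blastp', 'nr')
print(default_blast_settings("ATGGCGTTAA"))  # ('blastn', 'nt')
```

A heuristic like this is ambiguous for very short peptides that happen to use only those letters, which is one reason to set --program explicitly in such cases.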


gget blat

Find the genomic location of a nucleotide or amino acid sequence using BLAT.
Return format: data frame.

Required arguments
-seq --sequence
Nucleotide or amino acid sequence, or path to FASTA or .txt file.

Optional arguments
-st --seqtype
'DNA', 'protein', 'translated%20RNA', or 'translated%20DNA'.
Default: 'DNA' for nucleotide sequences; 'protein' for amino acid sequences.

-a --assembly
'human' (hg38) (default), 'mouse' (mm39), 'zebrafinch' (taeGut2),
or any of the species assemblies available here (use short assembly name).

Example

gget blat -seq MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR -a taeGut2

# Jupyter Lab / Google Colab:
gget.blat("MKWMFKEDHSLEHRCVESAKIRAKYPDRVPVIVEKVSGSQIVDIDKRKYLVPSDITVAQFMWIIRKRIQLPSEKAIFLFVDKTVPQSR", assembly="taeGut2")

→ Returns BLAT results for assembly taeGut2 (zebra finch) in data frame format. In the above example, gget blat automatically detects this sequence as an amino acid sequence and therefore sets the BLAT seqtype to protein.

genome	query_size	aligned_start	aligned_end	matches	mismatches	%_aligned	...
taeGut2	88	12	88	77	0	87.5	...
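
As a worked check of the row above: the aligned span runs from position 12 to 88 of an 88-residue query, which is 77 residues, or 87.5% of the query. The formula below is an assumption that happens to reproduce the value shown; the source does not state how %_aligned is computed, and percent_aligned is a hypothetical helper name.

```python
def percent_aligned(query_size, aligned_start, aligned_end):
    """Percentage of the query covered by the aligned span (1-based, inclusive)."""
    aligned_length = aligned_end - aligned_start + 1
    return 100 * aligned_length / query_size


# Values from the taeGut2 row above.
print(percent_aligned(query_size=88, aligned_start=12, aligned_end=88))  # 87.5
```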


gget muscle

Align multiple nucleotide or amino acid sequences to each other using Muscle5.
Return format: ClustalW formatted standard out or aligned FASTA.

Required arguments
-fa --fasta
Path to FASTA or .txt file containing the nucleotide or amino acid sequences to be aligned.

Optional arguments
-o --out
Path to the aligned FASTA file the results will be saved in, e.g. path/to/directory/results.afa. Default: Standard out.
Jupyter Lab / Google Colab: save=True will save the output in the current working directory.

Flags
-s5 --super5
Aligns input using the Super5 algorithm instead of the Parallel Perturbed Probcons (PPP) algorithm to decrease time and memory.
Use for large inputs (a few hundred sequences).

wrap_text
Jupyter Lab / Google Colab only. wrap_text=True displays data frame with wrapped text for easy reading (default: False).

Example

gget muscle -fa fasta.fa

# Jupyter Lab / Google Colab:
gget.muscle("fasta.fa")

→ Returns an overview of the aligned sequences with ClustalW coloring. (To return an aligned FASTA (.afa) file, use --out argument (or save=True in Jupyter Lab/Google Colab).) In the above example, the 'fasta.fa' includes several sequences to be aligned (e.g. isoforms returned from gget seq).


gget enrichr

Perform an enrichment analysis on a list of genes using Enrichr.
Return format: data frame.

Required arguments
-g --genes
Short names (gene symbols) of genes to perform enrichment analysis on, e.g. 'PHF14 RBM3 MSL1 PHF21A'.

-db --database
Database to use as reference for the enrichment analysis.
Supports any database listed here under 'Gene-set Library' or one of the following shortcuts:
'pathway' (KEGG_2021_Human)
'transcription' (ChEA_2016)
'ontology' (GO_Biological_Process_2021)
'diseases_drugs' (GWAS_Catalog_2019)
'celltypes' (PanglaoDB_Augmented_2021)
'kinase_interactions' (KEA_2015)
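
The shortcut handling can be sketched as a plain lookup table: a shortcut is replaced by the library name listed above, and any other value is passed through unchanged. This is an illustrative sketch only; resolve_database is a hypothetical name and the code is not taken from gget.

```python
# Shortcut names and the Enrichr libraries they stand for, as listed above.
DATABASE_SHORTCUTS = {
    "pathway": "KEGG_2021_Human",
    "transcription": "ChEA_2016",
    "ontology": "GO_Biological_Process_2021",
    "diseases_drugs": "GWAS_Catalog_2019",
    "celltypes": "PanglaoDB_Augmented_2021",
    "kinase_interactions": "KEA_2015",
}


def resolve_database(database):
    """Return the library name for a shortcut; pass any other name through unchanged."""
    return DATABASE_SHORTCUTS.get(database, database)


print(resolve_database("ontology"))         # GO_Biological_Process_2021
print(resolve_database("KEGG_2021_Human"))  # KEGG_2021_Human
```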

Flags
plot
Jupyter Lab / Google Colab only. plot=True provides a graphical overview of the first 15 results (default: False).

Example

gget enrichr -g ACE2 AGT AGTR1 -db ontology

# Jupyter Lab / Google Colab:
gget.enrichr(["ACE2", "AGT", "AGTR1"], database="ontology", plot=True)

→ Returns pathways/functions involving genes ACE2, AGT, and AGTR1 from the GO Biological Process 2021 database in data frame format. In Jupyter Lab / Google Colab, plot=True additionally returns a graphical overview of the results.


gget archs4

Find the most correlated genes to a gene of interest or find the gene's tissue expression atlas using ARCHS4.
Return format: data frame.

Required arguments
-g --gene
Short name (gene symbol) of gene of interest, e.g. 'STAT4'.

Optional arguments
-w --which
'correlation' (default) or 'tissue'.
'correlation' returns a gene correlation table that contains the 100 most correlated genes to the gene of interest. The Pearson correlation is calculated over all samples and tissues in ARCHS4.
'tissue' returns a tissue expression atlas calculated from human or mouse samples (as defined by 'species') in ARCHS4.

-s --species
'human' (default) or 'mouse'.
Defines whether to use human or mouse samples from ARCHS4.
(Only for tissue expression atlas.)

Examples

gget archs4 -g ACE2

# Jupyter Lab / Google Colab:
gget.archs4("ACE2")

→ Returns the 100 most correlated genes to ACE2 in a data frame:

gene_symbol	pearson_correlation
SLC5A1	0.579634
CYP2C18	0.576577
. . .	. . .

gget archs4 -g ACE2 -w tissue

# Jupyter Lab / Google Colab:
gget.archs4("ACE2", which="tissue")

→ Returns the tissue expression of ACE2 in a data frame (by default, human data is used):

id	min	q1	median	q3	max
System.Urogenital/Reproductive System.Kidney.RENAL CORTEX	0.113644	8.274060	9.695840	10.51670	11.21970
System.Digestive System.Intestine.INTESTINAL EPITHELIAL CELL	0.113644	5.905560	9.570450	13.26470	13.83590
. . .	. . .	. . .	. . .	. . .	. . .
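
A table in this shape is easy to post-process. The sketch below reads the two rows shown above as tab-separated text with the standard csv module, derives the interquartile range (q3 - q1), and sorts by median expression. Treating the last dot-separated part of the id as the tissue name is an assumption based on the two ids shown.

```python
import csv
import io

# The two rows shown above, tab-separated.
TABLE = (
    "id\tmin\tq1\tmedian\tq3\tmax\n"
    "System.Urogenital/Reproductive System.Kidney.RENAL CORTEX\t0.113644\t8.274060\t9.695840\t10.51670\t11.21970\n"
    "System.Digestive System.Intestine.INTESTINAL EPITHELIAL CELL\t0.113644\t5.905560\t9.570450\t13.26470\t13.83590\n"
)

rows = []
for row in csv.DictReader(io.StringIO(TABLE), delimiter="\t"):
    # Assumption: the last dot-separated field of the id is the tissue or cell type.
    tissue = row["id"].split(".")[-1]
    iqr = float(row["q3"]) - float(row["q1"])
    rows.append((tissue, float(row["median"]), iqr))

# Highest median expression first.
rows.sort(key=lambda r: r[1], reverse=True)
for tissue, median, iqr in rows:
    print(f"{tissue}: median={median:.2f}, IQR={iqr:.2f}")
```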


Comments

Installed on host via pip install --upgrade gget:

$ system_profiler SPSoftwareDataType SPHardwareDataType
Software:

    System Software Overview:

      System Version: macOS 12.5 (21G72)
      Kernel Version: Darwin 21.6.0
      Boot Volume: Macintosh HD
      Boot Mode: Normal
      Computer Name: Earl Grey
      User Name: Alex Reynolds (areynolds)
      Secure Virtual Memory: Enabled
      System Integrity Protection: Enabled
      Time since boot: 1 day 8:44

Hardware:

    Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,4
      Processor Name: 8-Core Intel Core i9
      Processor Speed: 2.4 GHz
      Number of Processors: 1
      Total Number of Cores: 8
      L2 Cache (per Core): 256 KB
      L3 Cache: 16 MB
      Hyper-Threading Technology: Enabled
      Memory: 32 GB
      System Firmware Version: 1731.140.2.0.0 (iBridge: 19.16.16064.0.0,0)
      OS Loader Version: 540.120.3~19
      Serial Number (system): C02CT0C0PT01
      Hardware UUID: C6082A3D-359C-5F2C-AC84-5068C7891897
      Provisioning UDID: C6082A3D-359C-5F2C-AC84-5068C7891897
      Activation Lock Status: Disabled

Problematic command:

$ python --version
Python 3.8.13
$ gget search -s homo_sapiens 'usf1'
Tue Aug  9 20:03:08 2022 INFO Fetching results from database: homo_sapiens_core_107_38
Tue Aug  9 20:03:11 2022 ERROR The Ensembl server returned the following error: Character set 'utf8' unsupported
Traceback (most recent call last):
  File "/Users/areynolds/miniconda3/bin/gget", line 8, in <module>
    sys.exit(main())
  File "/Users/areynolds/miniconda3/lib/python3.8/site-packages/gget/main.py", line 1223, in main
    gget_results = search(
  File "/Users/areynolds/miniconda3/lib/python3.8/site-packages/gget/gget_search.py", line 172, in search
    df_temp = pd.read_sql(query, con=db_connection)
UnboundLocalError: local variable 'db_connection' referenced before assignment

Using version 0.3.7:

$ gget --version
gget version: 0.3.7
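
The log and traceback above are consistent with a common Python pattern: a name is assigned inside a try block, the exception is only logged, and the name is then used afterwards. The sketch below reproduces that pattern in isolation and shows one way to avoid it. It is a guess at the failure mode, not gget's actual source; connect, search_buggy and search_fixed are hypothetical names.

```python
def connect(should_fail):
    """Stand-in for opening a database connection (hypothetical helper)."""
    if should_fail:
        raise RuntimeError("Character set 'utf8' unsupported")
    return "connection"


def search_buggy(should_fail):
    try:
        db_connection = connect(should_fail)
    except RuntimeError as error:
        # The error is only reported, so execution continues past this block.
        print(f"ERROR The server returned the following error: {error}")
    # If connect() raised, db_connection was never assigned.
    return db_connection


def search_fixed(should_fail):
    try:
        db_connection = connect(should_fail)
    except RuntimeError as error:
        print(f"ERROR The server returned the following error: {error}")
        return None  # stop here instead of using an unassigned name
    return db_connection


try:
    search_buggy(should_fail=True)
except UnboundLocalError as error:
    print(type(error).__name__)  # UnboundLocalError

print(search_fixed(should_fail=True))  # None
```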

gget alphafold "zsh: illegal hardware instruction" on M1

Hello,

I'm trying to run gget alphafold on my M1 mac, but am encountering the following error:

zsh: illegal hardware instruction

I noticed other threads (143) that comment on the difficulty of running tensorflow with m1 hardware and was wondering if this might be the issue?

I checked which version of tensorflow was installed with pip and found several tensorflow-related packages, but not tensorflow itself. I'm guessing this is why other workarounds don't work (i.e. installing tensorflow alpha, or what is suggested here: https://www.youtube.com/watch?v=WFIZn6titnc):

tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow-cpu 2.9.1
tensorflow-estimator 2.9.0
tensorflow-io-gcs-filesystem 0.26.0

Is there an easy way to resolve this?

-Alex

opened by alwhiteh 9
Keyerror: "0000:query"

I used the example sequence in the alphafold module and it works fine. However, when I give it a custom sequence, it gives the error Keyerror: "0000:query". Could you please advise on this?

opened by sharzil1994 5
Error running alphafold

Hi, I am running gget version: 0.3.7. When I run alphafold prediction I get this error:

gget alphafold AASEQUENCE
/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/haiku/_src/data_structures.py:37: FutureWarning: jax.tree_structure is deprecated, and will be removed in a future release. Use jax.tree_util.tree_structure instead.
  PyTreeDef = type(jax.tree_structure(None))
Fri Aug 12 20:12:30 2022 INFO Validating input sequence(s). Using the single-chain model.
Fri Aug 12 20:12:30 2022 INFO Finding closest source for reference database.
Jackhmmer search: 5%|██▉ | 7/147 [elapsed: 11:32 remaining: 3:50:48]
Traceback (most recent call last):
  File "/home/ccadmin/anaconda3/envs/gget/bin/gget", line 8, in <module>
    sys.exit(main())
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/gget/main.py", line 1439, in main
    alphafold(
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/gget/gget_alphafold.py", line 467, in alphafold
    raw_msa_results = get_msa(
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/gget/gget_alphafold.py", line 147, in get_msa
    raw_msa_results[db_name].extend(jackhmmer_runner.query(fasta_path))
  File "/home/ccadmin/anaconda3/envs/gget/lib/python3.9/site-packages/alphafold/data/tools/jackhmmer.py", line 205, in query
    os.remove(db_local_chunk(i))
FileNotFoundError: [Errno 2] No such file or directory: '/home/ccadmin/tmp/jackhmmer/fcb45c67-8b27-4156-bbd8-9d11512babf2/uniref90_2021_03.fasta.8'

Any idea how to fix this?

opened by xinyangbing 5
Fails to depict and answer the polymeric forms

I used gget to predict the structure of chloride dismutase, and it successfully gave me the PDB file of the structure in monomeric form. When I cross-checked it against the PDB database, the structure was shown to be a hexameric protein. Is it necessary to fill this gap?

opened by Harpreet525 4

Local variable 'db_connection' referenced before assignment

opened by alexpreynolds 4

Jupyter Notebook Kernel Dies When Using gget alphafold

I am able to use every gget module except for the alphaFold module. Whenever I implement a command line with AlphaFold the Jupyter Notebook kernel dies almost immediately. Is this something that occurs for others? Any recommendations are appreciated.

Generate new prediction from amino acid sequence

import gget
gget.setup("alphafold")
gget.alphafold("MAAHKGAEH")

opened by tmileur 3
Add Uniprot localisation data

Many thanks for this brilliant tool. I was wondering if it would be possible to add the "subcellular localisation" segment of the uniprot ID to the tools output?

This would be immensely helpful in terms of filtering for sub cellular location.

Many thanks and apologies if it does this already, but I couldn't identify this data in the output
enhancement

opened by Nusob888 3

AlphaFold model parameters download error

Hi! I am hitting a SSL cert problem when running alphafold setup:

Tue Aug 16 10:18:49 2022 INFO Downloading AlphaFold model parameters (requires 4.1 GB of storage). This might take a few minutes.
curl: (60) SSL certificate problem: unable to get local issuer certificate                                                                                
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Where are the parameters being downloaded from? I believe this will help me check if I have the right certs and are in the right place. Any additional advice to solve this error would be greatly appreciated! Thank you!

opened by EvoEpi 3

potential issue with UniProt connection

Hello,

Thanks for the great package. I think there may be intermittent issues with the UniProt connection, I received this error today:

But as you can see, there is a UniProt entry for this gene: https://www.uniprot.org/uniprot/Q8NBP7

Oddly, grabbing amino acid sequences worked fine for me yesterday. I appreciate any tips!

opened by keoughkath 3
pdb module

I love the new alphafold feature! Could there also be a gget pdb command for fetching structures from PDB? Combined with gget blast -db pdbaa this could be very powerful for comparing predictions and templates.
enhancement

opened by sbliven 2
openmm=7.5.1 is no longer available from conda-forge.

I cannot get the Alphafold module to work, as openmm v.7.5.1 is no longer available from conda-forge. Later versions of openmm do not have the version method, causing gget to crash using later versions.

opened by ahwchemistry 2

gget alphafold: Add option to define jackhmmer save directory

    gget will currently create a "tmp" folder in your home directory ("~/tmp/jackhmmer/") for the Jackhmmer search. I think adding an option to change this path is a great idea for a future version. The temporary files will take up to ~2 GB (in case it is possible to free this space until I have implemented your request).

Originally posted by @lauraluebbert in https://github.com/pachterlab/gget/issues/43#issuecomment-1253796040

enhancement

opened by lauraluebbert 0

Error detecting openmm

Hi, as the title says, I tried to run this and installed all the dependencies. But, still, somehow it doesn't detect openmm. Can this be resolved? A screenshot is attached. Thanks.

opened by LalitNM 14
Add feature to fetch UCSC IDs

The idea would be to create a feature similar to gget search for Ensembl but also for UCSC IDs.

I remember the last time I had to do something similar, in the end I had to do a request to the path below where "{ucsc_id}" would be the ID itself: "https://genome-euro.ucsc.edu/cgi-bin/hgGene?hgg_gene={ucsc_id}&db=hg19"

Links that should help: https://genome.ucsc.edu/goldenPath/help/api.html https://www.biotools.fr/human/ucsc_id_converter
enhancement

opened by Joaodemeirelles 0
Option to BLAST one protein sequence against another

Thank you for the very cool and important package! This will save me hours and hours of computational work

I was wondering if you can add an option to BLAST two protein sequences against each other and get their e-value etc. I have a list of proteins that I want to compare to each other. If you'd prefer to point me to how I can make this feature and do a pull request, I'm more than happy to do so too!
enhancement

opened by hoangthienan95 1

Releases

v0.27.0 (Dec 10, 2022)
v0.3.13 (Nov 11, 2022)
v0.3.12 (Nov 10, 2022)
v0.3.11 (Sep 7, 2022)
v0.3.10 (Sep 2, 2022)
v0.3.9 (Aug 25, 2022)
v0.3.8 (Aug 12, 2022)
v0.3.7 (Aug 9, 2022)
v0.3.5 (Aug 6, 2022)
v0.2.7 (Jul 29, 2022): gget search bug fix
v0.2.6 (Jul 8, 2022)
v0.2.5 (Jun 30, 2022)
v0.2.3 (Jun 27, 2022): gget seq argument 'transcribe' renamed to 'translate' for clarity. Backward compatibility preserved.
v0.2.2 (Jun 24, 2022)
v0.2.1 (Jun 9, 2022)
v0.2.0 (Jun 8, 2022)
v0.1.2 (Jun 7, 2022)
v0.1.1 (May 28, 2022)
v0.1.0 (May 25, 2022): Flag [--json] returns results in json format for all gget modules with default data frame output.
v0.0.24 (May 17, 2022)
v0.0.22 (May 10, 2022)


Owner

Pachter Lab
