Python reader for Linked Data in HDF5 files

Related tags

Data Analysish5ld
Overview

h5ld: HDF5 Linked Data

Linked Data are becoming more popular for user-created metadata in HDF5 files. This Python package provides readers for the HDF5-based formats with such metadata . Entire linked data content is read in one operation and made available as an rdflib graph object.

Currently supported:

Installation

pip install git+https://github.com/HDFGroup/h5ld@{LABEL}

where {LABEL} is either master or a tag label.

Requirements:

  • Python >= 3.7
  • h5py >= 3.3.0
  • rdflib >= 5.0.0

License

This software is open source. See this file for details.

Quick Start

This package can be used either as a command-line tool or programmatically. On the command-line, the package dumps the link data of an input HDF5 file into several popular RDF formats supported by the rdflib package. For example:

python -m h5ld -f json-ld -o output.json INPUT.h5

will dump the input file's RDF data to a file output.json in the JSON-LD format. Omitting an output file prints out the same content so it can be ingested by another command-line tool. Full description is available from:

python -m h5ld --help

There is also a programmatic interface for integration into Python applications. Each h5ld reader will provide the following methods and attributes:

  • File format name.

    print(f"Input file format is: {reader.name}")
  • Short (usually an acronym) of the file format.

    print(f"File format acronym: {reader.short_name}")
  • Check if the reader is the right choice for the input file.

    with h5py.File("input.h5", mode="r") as f:
        if reader.verify_format(f):
            # Do something...
          else:
              print("Sorry but not the right h5ld reader.")
  • Check if there is linked data content in the input HDF5 file. Optionally, print an appropriate description of the data.

    with h5py.File("input.h5", mode="r") as f:
        reader.check_ld(f, report=True)
  • Read linked data and export it to a destination in the requested RDF format.

    with h5py.File("input.h5", mode="r") as f:
        reader(f).dump_ld("output.json", format="json-ld")
  • Read linked data and return either an rdflib.Graph or rdflib.ConjunctiveGraph object.

    with h5py.File("input.h5", mode="r") as f:
        graph = reader(f).get_ld()
  • A Python dictionary with the reader's namespace prefixes and their IRIs.

    with h5py.File("input.h5", mode="r") as f:
        rdr = reader(f)
        namespaces = rdr.namespaces
Owner
The HDF Group
Tools and technologies to support the Hierarchical Data Format (HDF)
The HDF Group
Analysis scripts for QG equations

qg-edgeofchaos Analysis scripts for QG equations FIle/Folder Structure eigensolvers.py - Spectral and finite-difference solvers for Rossby wave eigenf

Norman Cao 2 Sep 27, 2022
A fast, flexible, and performant feature selection package for python.

linselect A fast, flexible, and performant feature selection package for python. Package in a nutshell It's built on stepwise linear regression When p

88 Dec 06, 2022
SparseLasso: Sparse Solutions for the Lasso

SparseLasso: Sparse Solutions for the Lasso Introduction SparseLasso provides a Scikit-Learn based estimation of the Lasso with cross-validation tunin

Gabriel Okasa 1 Nov 08, 2021
Get mutations in cluster by querying from LAPIS API

Cluster Mutation Script Get mutations appearing within user-defined clusters. Usage Clusters are defined in the clusters dict in main.py: clusters = {

neherlab 1 Oct 22, 2021
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather

Tuplex 791 Jan 04, 2023
Vectorizers for a range of different data types

Vectorizers for a range of different data types

Tutte Institute for Mathematics and Computing 69 Dec 29, 2022
Python data processing, analysis, visualization, and data operations

Python This is a Python data processing, analysis, visualization and data operations of the source code warehouse, book ISBN: 9787115527592 Descriptio

FangWei 1 Jan 16, 2022
Project under the certification "Data Analysis with Python" on FreeCodeCamp

Sea Level Predictor Assignment You will anaylize a dataset of the global average sea level change since 1880. You will use the data to predict the sea

Bhavya Gopal 3 Jan 31, 2022
A collection of robust and fast processing tools for parsing and analyzing web archive data.

ChatNoir Resiliparse A collection of robust and fast processing tools for parsing and analyzing web archive data. Resiliparse is part of the ChatNoir

ChatNoir 24 Nov 29, 2022
This is a python script to navigate and extract the FSD50K dataset

FSD50K navigator This is a script I use to navigate the sound dataset from FSK50K.

sweemeng 2 Nov 23, 2021
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Amundsen 3.7k Jan 03, 2023
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

About The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficien

ROOT 2k Dec 29, 2022
TE-dependent analysis (tedana) is a Python library for denoising multi-echo functional magnetic resonance imaging (fMRI) data

tedana: TE Dependent ANAlysis TE-dependent analysis (tedana) is a Python library for denoising multi-echo functional magnetic resonance imaging (fMRI)

136 Dec 22, 2022
Python tools for querying and manipulating BIDS datasets.

PyBIDS is a Python library to centralize interactions with datasets conforming BIDS (Brain Imaging Data Structure) format.

Brain Imaging Data Structure 180 Dec 18, 2022
Pandas and Dask test helper methods with beautiful error messages.

beavis Pandas and Dask test helper methods with beautiful error messages. test helpers These test helper methods are meant to be used in test suites.

Matthew Powers 18 Nov 28, 2022
Program that predicts the NBA mvp based on data from previous years.

NBA MVP Predictor A machine learning model using RandomForest Regression that predicts NBA MVP's using player data. Explore the docs » View Demo · Rep

Muhammad Rabee 1 Jan 21, 2022
A real-time financial data streaming pipeline and visualization platform using Apache Kafka, Cassandra, and Bokeh.

Realtime Financial Market Data Visualization and Analysis Introduction This repo shows my project about real-time stock data pipeline. All the code is

6 Sep 07, 2022
Basis Set Format Converter

Basis Set Format Converter Repository for the online tool that allows you to enter a basis set in the form of text input for a variety of Quantum Chem

Manas Sharma 3 Jun 27, 2022
Data Analytics: Modeling and Studying data relating to climate change and adoption of electric vehicles

Correlation-Study-Climate-Change-EV-Adoption Data Analytics: Modeling and Studying data relating to climate change and adoption of electric vehicles I

Jonathan Feng 1 Jan 03, 2022
A CLI tool to reduce the friction between data scientists by reducing git conflicts removing notebook metadata and gracefully resolving git conflicts.

databooks is a package for reducing the friction data scientists while using Jupyter notebooks, by reducing the number of git conflicts between different notebooks and assisting in the resolution of

dataroots 86 Dec 25, 2022