Python reader for Linked Data in HDF5 files

Last update: May 17, 2022

Overview

`h5ld`: HDF5 Linked Data

Linked Data are becoming more popular for user-created metadata in HDF5 files. This Python package provides readers for HDF5-based formats that carry such metadata. The entire linked data content is read in one operation and made available as an rdflib graph object.

Currently supported:

Allotrope Data Format (ADF)

Installation

```
pip install git+https://github.com/HDFGroup/h5ld@{LABEL}
```

where `{LABEL}` is either `master` or a tag label.

Requirements:

Python >= 3.7
h5py >= 3.3.0
rdflib >= 5.0.0
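
The requirements above can be checked before installing or importing h5ld. The sketch below is not part of h5ld. It assumes Python 3.8+ for `importlib.metadata`; on 3.7 it only reports that the check is unavailable. It also uses a deliberately simplified version comparison that looks only at leading numeric components and does not implement full PEP 440.

```python
import sys

try:
    from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
except ImportError:  # Python 3.7 has no importlib.metadata
    PackageNotFoundError = Exception
    version = None

# Minimum versions listed in the h5ld README.
MINIMUM_VERSIONS = {"h5py": "3.3.0", "rdflib": "5.0.0"}


def parse_release(text):
    """Return the leading numeric components, e.g. '6.1.1.post0' -> (6, 1, 1)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare release tuples after zero-padding, so '3.3' counts as '3.3.0'."""
    have, need = parse_release(installed), parse_release(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements():
    """Return a list of human-readable problems (empty if all requirements are met)."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python >= 3.7 is required")
    for name, minimum in MINIMUM_VERSIONS.items():
        if version is None:
            problems.append(f"cannot check {name}: importlib.metadata unavailable")
            continue
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

An empty list from `check_requirements()` means everything is in place. If you need exact PEP 440 semantics (pre-releases, local versions), `packaging.version.Version` is the more robust choice.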

License

This software is open source. See this file for details.

Quick Start

This package can be used either as a command-line tool or programmatically. On the command line, it dumps the linked data of an input HDF5 file in one of several popular RDF formats supported by the rdflib package. For example:

```
python -m h5ld -f json-ld -o output.json INPUT.h5
```

will dump the input file's RDF data to the file output.json in the JSON-LD format. If the output file is omitted, the same content is printed to standard output, so it can be piped into another command-line tool. A full description of the options is available from:

```
python -m h5ld --help
```
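
You can drive the command-line tool from another Python program with a small `subprocess` wrapper, for example to capture the RDF text that h5ld prints when no output file is given. This is a sketch and not part of h5ld. The helper names are hypothetical, it uses only the `-f` and `-o` options shown above, and it assumes h5ld is installed for the same interpreter (`sys.executable`).

```python
import subprocess
import sys


def h5ld_command(input_path, fmt="json-ld", output=None):
    """Build the argument list for `python -m h5ld` using the -f/-o options."""
    cmd = [sys.executable, "-m", "h5ld", "-f", fmt]
    if output is not None:
        cmd += ["-o", output]  # write to a file instead of standard output
    cmd.append(input_path)
    return cmd


def h5ld_to_text(input_path, fmt="json-ld"):
    """Run h5ld without -o and return the RDF text it prints to standard output."""
    result = subprocess.run(
        h5ld_command(input_path, fmt),
        check=True,              # raise CalledProcessError if h5ld exits with an error
        stdout=subprocess.PIPE,  # capture stdout; stderr still reaches the terminal
        text=True,               # decode output to str (Python 3.7+)
    )
    return result.stdout
```

For example, `jsonld_text = h5ld_to_text("INPUT.h5")` returns the same JSON-LD that the first command above writes to output.json.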

There is also a programmatic interface for integration into Python applications. Each h5ld reader provides the following methods and attributes. In the snippets below, `reader` stands for an h5ld reader class, and `h5py` is assumed to be imported.

`name`: File format name.

```
print(f"Input file format is: {reader.name}")
```

`short_name`: Short name (usually an acronym) of the file format.

```
print(f"File format acronym: {reader.short_name}")
```

`verify_format()`: Check whether the reader is the right choice for the input file.

```
with h5py.File("input.h5", mode="r") as f:
    if reader.verify_format(f):
        pass  # Do something...
    else:
        print("Sorry but not the right h5ld reader.")
```
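
When more than one h5ld reader is available (only ADF is currently supported), `verify_format()` can be used to pick the matching one. The sketch below is illustrative. `select_reader` and the candidate list are hypothetical, and the sketch relies only on `verify_format()` and `short_name` as described in this section.

```python
def select_reader(h5file, candidates):
    """Return the first reader class whose verify_format() accepts the open file.

    `candidates` is a hypothetical list of h5ld reader classes; None is
    returned when no reader recognises the file.
    """
    for reader_cls in candidates:
        if reader_cls.verify_format(h5file):
            return reader_cls
    return None


# Hypothetical usage with an open h5py.File object:
#
#     with h5py.File("input.h5", mode="r") as f:
#         reader = select_reader(f, candidates)
#         if reader is None:
#             print("No h5ld reader recognises this file.")
#         else:
#             print(f"Using the {reader.short_name} reader")
#             graph = reader(f).get_ld()
```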

`check_ld()`: Check whether the input HDF5 file contains linked data content. Optionally (`report=True`), print an appropriate description of the data.
```
with h5py.File("input.h5", mode="r") as f:
    reader.check_ld(f, report=True)
```

`dump_ld()`: Read linked data and export it to a destination in the requested RDF format.

```
with h5py.File("input.h5", mode="r") as f:
    reader(f).dump_ld("output.json", format="json-ld")
```
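
To write several serializations at once, you can repeat the same `dump_ld()` call with different formats. This is a sketch under assumptions: `dump_all` and the extension table are hypothetical. The format names are rdflib serializer names, and it is assumed rather than confirmed that `dump_ld()` accepts them, since this README only shows `json-ld`.

```python
# Hypothetical table: rdflib serializer name -> file extension.
RDF_EXTENSIONS = {"json-ld": "json", "turtle": "ttl", "xml": "rdf", "nt": "nt"}


def dump_all(ld_reader, stem, formats=("json-ld", "turtle")):
    """Call dump_ld() once per format and return the paths that were written."""
    written = []
    for fmt in formats:
        path = f"{stem}.{RDF_EXTENSIONS[fmt]}"
        ld_reader.dump_ld(path, format=fmt)
        written.append(path)
    return written


# Hypothetical usage:
#
#     with h5py.File("input.h5", mode="r") as f:
#         dump_all(reader(f), "output", formats=("json-ld", "turtle", "nt"))
```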

`get_ld()`: Read linked data and return it as either an `rdflib.Graph` or an `rdflib.ConjunctiveGraph` object.

```
with h5py.File("input.h5", mode="r") as f:
    graph = reader(f).get_ld()
```
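
Both `rdflib.Graph` and `rdflib.ConjunctiveGraph` iterate as (subject, predicate, object) triples. Getting a quick overview of what `get_ld()` returned therefore needs no reader-specific code. The helper below is a hypothetical sketch. For a `ConjunctiveGraph`, iteration should yield the triples of all of its graphs.

```python
from collections import Counter


def summarize_graph(graph, top=5):
    """Return (number of triples, most common predicates) for an RDF graph.

    Any iterable of (subject, predicate, object) tuples works, which covers
    rdflib.Graph and rdflib.ConjunctiveGraph.
    """
    predicate_counts = Counter()
    total = 0
    for _subject, predicate, _object in graph:
        predicate_counts[predicate] += 1
        total += 1
    return total, predicate_counts.most_common(top)


# Hypothetical usage:
#
#     with h5py.File("input.h5", mode="r") as f:
#         total, top_predicates = summarize_graph(reader(f).get_ld())
#     print(f"{total} triples")
#     for predicate, count in top_predicates:
#         print(count, predicate)
```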

`namespaces`: A Python dictionary mapping the reader's namespace prefixes to their IRIs.

```
with h5py.File("input.h5", mode="r") as f:
    rdr = reader(f)
    namespaces = rdr.namespaces
```
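
The prefix map is useful when you build a new rdflib graph, for example one holding a subset of the triples, that should serialize with the same compact prefixes. The sketch below assumes `namespaces` maps prefix strings to IRI strings, as described above, and uses rdflib's `Graph.bind()`. `bind_namespaces` is a hypothetical helper.

```python
def bind_namespaces(graph, namespaces, override=True):
    """Bind every prefix -> IRI pair from an h5ld reader onto an rdflib graph.

    Prefixes are bound in sorted order so the result is deterministic;
    override=True rebinds an IRI that is already bound to a different prefix.
    """
    for prefix, iri in sorted(namespaces.items()):
        graph.bind(prefix, iri, override=override)
    return graph


# Hypothetical usage (add triples to `subset` afterwards):
#
#     import rdflib
#     with h5py.File("input.h5", mode="r") as f:
#         subset = bind_namespaces(rdflib.Graph(), reader(f).namespaces)
```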

Owner

The HDF Group

Performance analysis of predictive (alpha) stock factors

This repo contains a simple but effective tool made using python which can be used for quality control in statistical approach.

Full automated data pipeline using docker images

Implementation in Python of the reliability measures such as Omega.

Python utility to extract differences between two pandas dataframes.

Building house price data pipelines with Apache Beam and Spark on GCP

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

Py-price-monitoring - A Python price monitor

Data and code accompanying the paper Politics and Virality in the Time of Twitter

A Python and R autograding solution

Big Data & Cloud Computing for Oceanography

This cosmetics generator allows you to generate the new Fortnite cosmetics, Search pak and search cosmetics!

An extension to pandas dataframes describe function.

Data pipelines built with polars

This is a python script to navigate and extract the FSD50K dataset

Full ELT process on GCP environment.

My solution to the book A Collection of Data Science Take-Home Challenges

Pip install minimal-pandas-api-for-polars

Analysis of a dataset of 10000 passwords to find common trends and mistakes people generally make while setting up a password.

The Dash Enterprise App Gallery "Oil & Gas Wells" example

Python reader for Linked Data in HDF5 files

Related tags

Overview

h5ld: HDF5 Linked Data

Installation

License

Quick Start

Owner

The HDF Group

Performance analysis of predictive (alpha) stock factors

This repo contains a simple but effective tool made using python which can be used for quality control in statistical approach.

Full automated data pipeline using docker images

Implementation in Python of the reliability measures such as Omega.

Python utility to extract differences between two pandas dataframes.

Building house price data pipelines with Apache Beam and Spark on GCP

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

Py-price-monitoring - A Python price monitor

Data and code accompanying the paper Politics and Virality in the Time of Twitter

A Python and R autograding solution

Big Data & Cloud Computing for Oceanography

This cosmetics generator allows you to generate the new Fortnite cosmetics, Search pak and search cosmetics!

An extension to pandas dataframes describe function.

Data pipelines built with polars

This is a python script to navigate and extract the FSD50K dataset

Full ELT process on GCP environment.

My solution to the book A Collection of Data Science Take-Home Challenges

Pip install minimal-pandas-api-for-polars

Analysis of a dataset of 10000 passwords to find common trends and mistakes people generally make while setting up a password.

The Dash Enterprise App Gallery "Oil & Gas Wells" example

`h5ld`: HDF5 Linked Data