I³ Tracker for Essential Open Innovation Datasets

Overview

I³ Tracker for Essential Open Innovation Datasets

This repository is set up to track, version, and contribute updates to the I³ Essential Open Innovation Dataset Index, which consists of lists of datasets and tools relevant to Innovation Data. This index may be collaboratively edited, either by making edits to markdown files contained in this repository, or editing metadata in the Google Sheet.

The repository checks the Google Sheet for changes every 5min (and will update the site if there are any), and will also re-build the site automatically when somebody makes an edit via git. The site is generated from markdown files in this repository using the static site generator Jekyll.

Add/edit a Dataset using Git

Each record in the index has a corresponding markdown file (auto-generated) in the folder datasets/. These files contain the basic metadata associated with the record in the frontmatter, and also allow more long-form information, such as details of queries, images, and other written information, to be added. Both of these things are editable.

When a markdown file is added to the datasets/ folder, a GitHub action publishes the metadata in the frontmatter to the Google Sheet, and to the archive csv, to keep the records up to date. This script calls various metadata scrapers to automatically pull information like permalinks, citation information, and versioning. Once the file has been successfully committed, a second action will run to refresh the state of the website to reflect the edits.

Contribution Steps

  1. fork the repository, create a markdown in the folder 'datasets' based on the template file
  2. add as much metadata as you like, and create a pull request in this repository
  3. all being well, this should automatically merge. if not, you can check the GitHub actions log, or open an issue. (make sure it's in the correct folder, and has a .md file extension before doing so)

If the dataset is hosted on a platform with parseable citation metadata (Dataverse, Zenodo, ICPSR, and major university repositories are examples of these), then the tool will automatically pull most of the data associated with the dataset -- fields that will auto-fill are indicated by a comment. If the dataset is hosted on e.g. a personal site, then you might want to include some more information -- but ultimately, only a title and URL is really necessary. However you fill out your dataset, a uuid and timestamp will be generated for it automatically; these aren't fields you need to include (hence not included in the template).

The reason we've done this is to save you from copy-pasting a lot of information from existing repositories, and to make it easier for you to curate more useful and harder-to-scrape metadata -- such as the timeframes of datasets, links to code and documentation, and datasets that might be built on top of it but don't use an easy-to-parse citation. So definitely prioritise these fields!

If you're unsure of how to make a pull request, github has some good guides to doing this. You can also just make an edit to the Google Sheet, which will have an equivalent effect.

If there's a piece of metadata you think we should collect but don't, please add it to the frontmatter of the markdown files you contribute. (Nothing will break!) Then open an issue mentioning the new field, so that we can discuss adding it to the repository officially too.

To contribute a new dataset via pull request, please use the template file datasets/_template.md as a reference:

---
title: #required
url: #required
doi: #scrapeable
citation: #scrapeable
description: #scrapeable
timeframe:
documentation:
error_metrics:
code:
versioning: #scrapeable
terms_of_use: #scrapeable
tags:
references:
---


body text. info about `queries`, links and images goes here :)

Collections

The site also indexes collections, which are pages containing thematic information about datasets, tools and resources. These are housed in the folder collections/. The collection intro.md is an example -- this particular collection is also rendered on the front page of the site.

In the same manner as datasets, collection files can be added or edited using pull requests, where the repository is forked, and additions or edits to the collections can be made. The collections are not currently tracked via Google Sheets, and so may only be edited via git.

To create a new collection, the collection template may be copied to use as a reference:

---
title:
author:
tags:
---

Collections are a way to list resources around a theme, relevant to a research agenda or set of papers, or as an introduction to various aspects of the field. They are formatted in markdown:

To list a dataset that's in the index, use a relative link, e.g.

```markdown
[local dataset name](/datasets/dataset_shortname)

Dataset shortnames can be found either by looking at the urls directly, or through the 'shortnames' column of the Google Sheet.

Index

A versioned .csv file containing the index may be accessed in the folder index_archive. If you'd like to browse and query either sheet, you can do so using Github's Flat Data tool here. The Github Action that pulls the sheet is based on Dolthub's Gsheets-to-csv action.

Screenshot 2021-07-13 at 13 35 49

Free and open source qualitative research tool

Taguette A spin on the phrase "tag it!", Taguette is a free and open source qualitative research tool that allows users to: Import PDFs, Word Docs (.d

Remi Rampin 48 Jan 02, 2023
A StarkNet project template based on a Pythonic environment

StarkNet Project Template This is an opinionated StarkNet project template. It is based around the Python's ecosystem and best practices. tox to manag

Francesco Ceccon 5 Apr 21, 2022
A Python 3 client for the beanstalkd work queue

Greenstalk Greenstalk is a small and unopinionated Python client library for communicating with the beanstalkd work queue. The API provided mostly map

Justin Mayhew 67 Dec 08, 2022
Type Persian without confusing words for yourself and others, in Adobe Connect

About In the Adobe Connect chat section, to type in Persian or Arabic, the written words will be confused and will be written and sent illegibly (This

Matin Najafi 23 Nov 26, 2021
万能通用对象池,可以池化任意自定义类型的对象。

pip install universal_object_pool 此包能够将一切任意类型的python对象池化,是万能池,适用范围远大于单一用途的mysql连接池 http连接池等。 框架使用对象池包,自带实现了4个对象池。可以直接开箱用这四个对象池,也可以作为例子学习对象池用法。

12 Dec 15, 2022
VacationCycleLogicBackEnd - Vacation Cycle Logic BackEnd With Python

Vacation Cycle Logic BackEnd Getting Started Existing virtualenv If your project

Mohamed Gamal 0 Jan 03, 2022
SkyPort console user terminal written in python

SkyPort terminal implemented as a console script written in Python Description Sky Port is an universal bus between user software and compute resource

Sky Workflows 1 Oct 23, 2022
A repository for all ZenML projects that are specific production use-cases.

ZenFiles Original Image source: https://www.goodfon.com/wallpaper/x-files-sekretnye-materialy.html And naturally, all credits to the awesome X-Files s

ZenML 66 Jan 06, 2023
Solves Maths24 problems for you!

maths24-solver Solves Maths24 problems for you! Enjoy this open scource project! You can edit modify and share! My wishes is for you to use this proje

6 Nov 07, 2021
AndroidEnv is a Python library that exposes an Android device as a Reinforcement Learning (RL) environment.

AndroidEnv is a Python library that exposes an Android device as a Reinforcement Learning (RL) environment.

DeepMind 814 Dec 26, 2022
LeetComp - Background tasks powering the static content at LeetComp

LeetComp Analysing compensations mentioned on the Leetcode forums (https://kuuts

Kumar Utsav 125 Dec 21, 2022
Hera is a Python framework for constructing and submitting Argo Workflows.

Hera is an Argo Workflows Python SDK. Hera aims to make workflow construction and submission easy and accessible to everyone! Hera abstracts away workflow setup details while still maintaining a cons

argoproj-labs 241 Jan 02, 2023
The Python Fuzzer that the world deserves 🐍

pip3 install frelatage Current release : 0.0.2 The Python Fuzzer that the world deserves Installation | How it works | Features | Use Frelatage | Conf

Rog3r 219 Dec 21, 2022
Terrible python code from the "bubble that breaks maths" video.

Terrible python code from the "bubble that breaks maths" video.

Stand-up Maths 12 Oct 25, 2022
Launcher program to select which version of the Q-Sys software to launch.

QSC-QSYS Launcher Launcher program to select which version of the Q-Sys software to launch. Instructions To use the application simply save the "Q-Sys

Zach Lisko 2 Sep 28, 2022
Standard mutable string (character array) implementation for Python.

chararray A standard mutable character array implementation for Python.

Tushar Sadhwani 3 Dec 18, 2021
Scripts to integrate DFIR-IRIS, MISP and TimeSketch

Scripts to integrate DFIR-IRIS, MISP and TimeSketch

Koen Van Impe 20 Dec 16, 2022
Jarvis Python BOT acts like Google-assistance

Jarvis-Python-BOT Jarvis Python BOT acts like Google-assistance Setup Add Mail ID (Gmail) in the file at line no 82.

Ishan Jogalekar 1 Jan 08, 2022
Odoo modules related to website/webshop

Website Apps related to Odoo it's website/webshop features: webshop_public_prices: allow configuring to hide or show product prices and add to cart bu

Yenthe Van Ginneken 9 Nov 04, 2022
Taxonomy addition for complete trees

TACT: Taxonomic Addition for Complete Trees TACT is a Python app for stochastic polytomy resolution. It uses birth-death-sampling estimators across an

Jonathan Chang 3 Jun 07, 2022