I³ Tracker for Essential Open Innovation Datasets

Overview

This repository tracks, versions, and accepts contributed updates to the I³ Essential Open Innovation Dataset Index, a set of lists of datasets and tools relevant to innovation data. The index can be edited collaboratively, either by editing the markdown files in this repository or by editing metadata in the Google Sheet.

The repository checks the Google Sheet for changes every 5 minutes and updates the site if there are any. It also rebuilds the site automatically when somebody makes an edit via git. The site is generated from the markdown files in this repository using the static site generator Jekyll.

Add/edit a Dataset using Git

Each record in the index has a corresponding auto-generated markdown file in the folder datasets/. The frontmatter of each file holds the basic metadata associated with the record, and the body can hold longer-form information such as details of queries, images, and other written notes. Both the frontmatter and the body are editable.

When a markdown file is added to the datasets/ folder, a GitHub action publishes the metadata in its frontmatter to the Google Sheet and to the archive csv, to keep the records up to date. This action calls various metadata scrapers to automatically pull information such as permalinks, citation information, and versioning. Once the file has been successfully committed, a second action runs to refresh the website so that it reflects the edits.
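As a rough illustration of the first step (this is not the repository's actual action), the sketch below reads the frontmatter from a markdown string and formats it as one CSV row. The function names `parse_frontmatter` and `to_csv_row` are hypothetical, and the parser is deliberately simplified: it handles only flat `key: value` lines rather than full YAML.

```python
import csv
import io


def parse_frontmatter(text):
    """Split a markdown string into (metadata, body).

    Assumption: frontmatter is delimited by '---' lines and contains only
    flat 'key: value' pairs. Values that consist solely of a template
    comment (e.g. '#required') are treated as empty.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    metadata = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            body = "\n".join(lines[i + 1:]).strip()
            return metadata, body
        key, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            if value.startswith("#"):
                value = ""
            metadata[key.strip()] = value
    # No closing delimiter found: treat the whole text as body.
    return {}, text


def to_csv_row(metadata, fieldnames):
    """Format the selected metadata fields as a single CSV line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writerow({name: metadata.get(name, "") for name in fieldnames})
    return buffer.getvalue()
```

A real pipeline would more likely use a full YAML parser; the hand-rolled parser here only keeps the example free of third-party dependencies.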

Contribution Steps

  1. Fork the repository and create a markdown file in the folder datasets/ based on the template file.
  2. Add as much metadata as you like, then open a pull request against this repository.
  3. All being well, the pull request should merge automatically. If it doesn't, first make sure the file is in the correct folder and has a .md extension, then check the GitHub Actions log or open an issue.

If the dataset is hosted on a platform with parseable citation metadata (for example Dataverse, Zenodo, ICPSR, or a major university repository), the tool will automatically pull most of the information associated with the dataset; fields that auto-fill are marked with a #scrapeable comment in the template. If the dataset is hosted elsewhere, e.g. on a personal site, you might want to include more information yourself, but ultimately only a title and URL are required. However you fill out your dataset, a uuid and timestamp will be generated for it automatically; you don't need to include these fields, which is why they are absent from the template.
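To make the required and auto-generated fields concrete, here is a minimal sketch of how a record could be checked and completed. It is an illustration under stated assumptions, not the repository's implementation: the function name `finalize_record`, the key names `uuid` and `timestamp`, the choice of a version 4 UUID, and the ISO 8601 UTC timestamp format are all assumed.

```python
import uuid
from datetime import datetime, timezone

# Per the template, only these two fields are required.
REQUIRED_FIELDS = ("title", "url")


def finalize_record(metadata):
    """Return a copy of metadata with a uuid and timestamp filled in.

    Raises ValueError if a required field is missing or empty.
    Existing 'uuid' or 'timestamp' values are left untouched.
    """
    missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]
    if missing:
        raise ValueError("missing required field(s): " + ", ".join(missing))
    record = dict(metadata)
    # Assumed formats: random (version 4) UUID and UTC ISO 8601 timestamp.
    record.setdefault("uuid", str(uuid.uuid4()))
    record.setdefault("timestamp", datetime.now(timezone.utc).isoformat())
    return record
```

Calling `finalize_record({"title": "My dataset", "url": "https://example.org"})` returns the same two fields plus a generated `uuid` and `timestamp`, while a record without a `url` raises `ValueError`.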

The reason we've done this is to save you from copy-pasting a lot of information from existing repositories, and to make it easier for you to curate more useful and harder-to-scrape metadata -- such as the timeframes of datasets, links to code and documentation, and datasets that might be built on top of it but don't use an easy-to-parse citation. So definitely prioritise these fields!

If you're unsure how to make a pull request, GitHub has some good guides to doing this. You can also just edit the Google Sheet, which has an equivalent effect.

If there's a piece of metadata you think we should collect but don't, please add it to the frontmatter of the markdown files you contribute. (Nothing will break!) Then open an issue mentioning the new field, so that we can discuss adding it to the repository officially too.

To contribute a new dataset via pull request, please use the template file datasets/_template.md as a reference:

---
title: #required
url: #required
doi: #scrapeable
citation: #scrapeable
description: #scrapeable
timeframe:
documentation:
error_metrics:
code:
versioning: #scrapeable
terms_of_use: #scrapeable
tags:
references:
---


body text. info about `queries`, links and images goes here :)

Collections

The site also indexes collections, which are pages containing thematic information about datasets, tools and resources. These are housed in the folder collections/. The collection intro.md is an example -- this particular collection is also rendered on the front page of the site.

Like datasets, collection files can be added or edited via pull request: fork the repository, make your additions or edits to the collections, and open a pull request. Collections are not currently tracked via the Google Sheet, so they can only be edited via git.

To create a new collection, the collection template may be copied to use as a reference:

---
title:
author:
tags:
---

Collections are a way to list resources around a theme, relevant to a research agenda or set of papers, or as an introduction to various aspects of the field. They are formatted in markdown.

To list a dataset that's in the index, use a relative link, e.g.

```markdown
[local dataset name](/datasets/dataset_shortname)
```

Dataset shortnames can be found either by looking at the dataset page URLs directly, or in the 'shortnames' column of the Google Sheet.

Index

A versioned .csv file containing the index may be accessed in the folder index_archive. If you'd like to browse and query either sheet, you can do so using GitHub's Flat Data tool. The GitHub Action that pulls the sheet is based on DoltHub's Gsheets-to-csv action.
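The archived csv can also be queried locally. The sketch below filters rows by tag using only the standard library. The column names (`title`, `url`, `tags`) and the comma-separated tag format are assumptions for illustration; check the header of the actual file in index_archive before relying on them. The helper name `filter_by_tag` and the sample rows are hypothetical.

```python
import csv
import io


def filter_by_tag(csv_text, tag, tag_column="tags"):
    """Return the rows whose tag column contains the given tag.

    Assumption: tags are stored as a comma-separated string in one column.
    Matching is case-insensitive.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    wanted = tag.strip().lower()
    matches = []
    for row in reader:
        tags = [t.strip().lower() for t in (row.get(tag_column) or "").split(",")]
        if wanted in tags:
            matches.append(row)
    return matches


# Hypothetical sample data standing in for a file from index_archive.
SAMPLE = (
    "title,url,tags\n"
    'Dataset A,https://example.org/a,"patents, citations"\n'
    "Dataset B,https://example.org/b,grants\n"
)

patent_rows = filter_by_tag(SAMPLE, "patents")
```

To run this against the real archive, read the file's contents into a string and pass it in place of `SAMPLE`.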
