An open collection of annotated voices in Japanese language

Last update: Dec 14, 2022


Overview

Koniwa (声庭): An open collection of annotated voices in Japanese language

Summary

Koniwa (声庭) is an open collection of voices and annotations that anyone may freely use, modify, and redistribute (commercial use is also permitted).

Annotation work has only just begun. Contributions are very welcome.

File links

sound: audio data (Google Drive)
source: reference data (Google Drive): materials such as original texts that serve as references during annotation
data: bibliographic information and annotation data

Series

This collection currently uses the following open audio data. We are deeply grateful to everyone involved in making it publicly available.

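amagasaki: CC BY 4.0
- April 2011 to November 2015
- Radio programs of Amagasaki City, Hyogo Prefecture (FMあまがさき)
  - Mayor Inamura's 「ひと咲きまち咲きあまがさき」
  - Mayor Inamura's 「い～なこの街あまがさき」 (retitled in November 2014)
free_culture_2012: CC BY 3.0
- August 2012
- J-WAVE radio program J-WAVE 360° Forum 〜Seek and Find〜
higashiyodogawa: CC BY 4.0
- November 2017 to July 2021
- Audio edition of 「広報ひがしよどがわ」, the public relations newsletter of Higashiyodogawa Ward, Osaka City
librivox: Public domain
- Works recorded by LibriVox.org
- Some items, such as songs, are excluded
minato: CC BY 4.0
- May 2019 to December 2020
- Audio edition of 「広報みなと」, the public relations newsletter of Minato Ward, Osaka City
nishiyodogawa: CC BY 4.0
- August 2018 to July 2021
- Audio edition of the newsletter 「きらり☆にしよど」 of Nishiyodogawa Ward, Osaka City
roudoku_toshokan: CC BY 2.1 JP (original texts are in the public domain)
- Recitations by 池田英生, distributed through 朗読図書館 (Roudoku Toshokan)
tnc: CC BY 3.0 (original texts are in the public domain)
- Recitations by announcers of TV Nishinippon (テレビ西日本)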
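When redistributing audio from the series above, CC BY material requires credit to the original publisher, while public-domain material does not. The sketch below records each series' audio licence as listed above and picks out the series that need attribution. The dictionary and function names are hypothetical and not part of the Koniwa repository, and the check is no substitute for reading each licence.

```python
# Audio licences per series, transcribed from the list above.
# (Hypothetical data structure; the repository may store this differently.)
SERIES_LICENCES = {
    "amagasaki": "CC BY 4.0",
    "free_culture_2012": "CC BY 3.0",
    "higashiyodogawa": "CC BY 4.0",
    "librivox": "Public domain",
    "minato": "CC BY 4.0",
    "nishiyodogawa": "CC BY 4.0",
    "roudoku_toshokan": "CC BY 2.1 JP",
    "tnc": "CC BY 3.0",
}


def requires_attribution(licence: str) -> bool:
    """CC BY licences require attribution; public-domain material does not."""
    return licence.upper().startswith("CC BY")


def series_needing_attribution(licences: dict[str, str]) -> list[str]:
    """Return the names of series whose audio must be credited, sorted."""
    return sorted(name for name, lic in licences.items() if requires_attribution(lic))


if __name__ == "__main__":
    for name in series_needing_attribution(SERIES_LICENCES):
        print(f"{name}: {SERIES_LICENCES[name]}")
```

With the series listed above, every entry except librivox is reported.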

Licence

Licence for original texts and audio

This collection includes only audio licensed under one of the following:

Public domain
- PDM
- CC0
Creative Commons
- CC BY
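The admission rule above can be expressed as a small check. This is a minimal sketch, not part of the Koniwa repository. The function name `is_admissible` and the accepted licence-string spellings are assumptions made for illustration. Any version or jurisdiction port of CC BY (such as the CC BY 2.1 JP used by roudoku_toshokan) is treated as CC BY, while variants with extra terms such as CC BY-SA or CC BY-NC are rejected.

```python
import re

# Hypothetical helper: not part of the Koniwa codebase.
# Public-domain designations accepted by the policy.
PUBLIC_DOMAIN = {"PDM", "CC0", "PUBLIC DOMAIN"}

# CC BY of any version, optionally followed by a two-letter jurisdiction port
# (e.g. "JP"). Variants such as "CC BY-SA" or "CC BY-NC" do not match.
CC_BY = re.compile(r"^CC BY(?: \d+(?:\.\d+)?)?(?: [A-Z]{2})?$")


def is_admissible(licence: str) -> bool:
    """Return True if audio under `licence` may be included in the collection."""
    # Upper-case and collapse repeated whitespace.
    normalized = " ".join(licence.upper().split())
    # Strip a trailing version number (e.g. "CC0 1.0" -> "CC0") for the set lookup.
    base = re.sub(r" \d+(?:\.\d+)?$", "", normalized)
    return base in PUBLIC_DOMAIN or bool(CC_BY.match(normalized))


if __name__ == "__main__":
    for lic in ["CC BY 4.0", "CC BY 2.1 JP", "CC0 1.0", "PDM", "CC BY-SA 4.0"]:
        print(f"{lic}: {is_admissible(lic)}")
```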

Licence for annotations and documents

All of the following are licensed under CC0 1.0:

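For annotations that qualify as derivative works, the newly authored (derivative) portions
Original works in this repository other than programs, such as annotation comments and annotation manuals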

Licence for programs

Programs are licensed under the Apache License 2.0.

Maintainer

shirayu


Owner

Koniwa project
