Crowd-sourced training data for Rasa NLU models

Last update: Dec 26, 2022

Overview

NLU Training Data

Crowd-sourced training data for the development and testing of Rasa NLU models.

If you're interested in grabbing some data, feel free to check out our live data-fetching UI.

About this repository

This is an experiment with the goal of providing basic training data for developing chatbots. The repository is therefore open for contributions!

We need your help to create an open source dataset to empower chatbot makers and conversational AI enthusiasts alike, and we very much appreciate your support in expanding the collection of data available to the community.

How do I donate my training data?

Each folder should contain a list of multiple intents. Before creating a new folder, consider whether the training data you're contributing could fit within an existing one.

To contribute via pull request, follow these steps:

Create an issue describing the training data you would like to contribute.
Create a new folder with a title and an NLU.yml file, or contribute to an existing folder.
In the NLU.yml file, format your training data using YAML, remove all entities (see script), title each section with its intent type (e.g. intent:inform_rain), and add a short description.
Update the README.md file to include a list of the intent types added.
Create a pull request describing your changes.
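
As a rough illustration of the formatting step above, the sketch below renders a dictionary of intents and example utterances as YAML text. It assumes the standard Rasa YAML training data layout (a `version` key and an `nlu` list of `intent`/`examples` entries); the function name `render_nlu_yaml`, the default version string, and the sample utterances are hypothetical, and the exact layout expected by this repository may differ.

```python
from typing import Dict, List


def render_nlu_yaml(intents: Dict[str, List[str]], version: str = "3.1") -> str:
    """Render intents and their example utterances as Rasa-style NLU YAML text.

    Assumption: the target file follows the standard Rasa YAML training data
    layout. The version string is a placeholder, not a repository requirement.
    """
    lines = [f'version: "{version}"', "nlu:"]
    for intent, examples in intents.items():
        lines.append(f"- intent: {intent}")
        # Examples are written as a YAML block scalar containing a dash list.
        lines.append("  examples: |")
        for example in examples:
            lines.append(f"    - {example}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample data, with entity annotations already removed.
    sample = {"inform_rain": ["it is raining", "looks like rain today"]}
    print(render_nlu_yaml(sample), end="")
```

The rendered text could then be saved as the NLU.yml file of the folder you are contributing to.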

Your pull request will be reviewed by a maintainer, who will get back to you about any necessary changes or questions. You will also be asked to sign a Contributor License Agreement.

FAQs

How should I label my intents?

Please always put the domain at the end of each intent. For example: ask_transport
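
A minimal sketch of how this naming convention could be checked before opening a pull request. It assumes that the domain is whatever follows the last underscore in the label; the helper `intent_domain` is hypothetical and not part of this repository or of Rasa.

```python
from typing import Optional


def intent_domain(intent: str) -> Optional[str]:
    """Return the trailing domain of an intent label, or None if there is none.

    Assumption: the domain is the text after the last underscore,
    e.g. "transport" in "ask_transport".
    """
    action, separator, domain = intent.rpartition("_")
    # Require an underscore with non-empty text on both sides of it.
    if separator and action and domain:
        return domain
    return None


if __name__ == "__main__":
    for label in ("ask_transport", "affirm"):
        print(label, "->", intent_domain(label))
```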

What do I do about multi-intent utterances?

If you would like to contribute multi-intent utterances, please add a + to indicate an additional intent, for example: affirm+ask_transport
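
To make the + convention concrete, here is a small sketch that splits a multi-intent label into its individual intents. The helper name `split_multi_intent` is hypothetical; it only illustrates the labelling rule described above and is not how Rasa itself parses labels.

```python
from typing import List


def split_multi_intent(label: str) -> List[str]:
    """Split a label such as "affirm+ask_transport" into its component intents."""
    # Empty parts (e.g. from a stray leading or trailing "+") are dropped.
    return [part for part in label.split("+") if part]


if __name__ == "__main__":
    print(split_multi_intent("affirm+ask_transport"))
```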

What about training data that’s not in English?

Currently, we are unable to evaluate the quality of all language contributions, so during the initial phase we can only accept English training data to the repository. However, we understand that the Rasa community is a global one, and in the long term we would like to find a solution for this in collaboration with the community.

Why do I need to remove entities from my training data?

We would like to make the training data as easy as possible to adapt to new training models, and annotating entities is highly dependent on your bot’s purpose. Therefore, we will first focus on collecting training data that only includes intents.

To help you remove the annotated entities from your training data, you can run this script.
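
The script referenced above is not reproduced here. As a rough idea of what entity removal involves, the sketch below strips inline annotations with a regular expression. It assumes the two common Rasa annotation styles, `[text](entity)` and `[text]{"entity": "..."}`, and the function name `strip_entities` is hypothetical.

```python
import re

# Matches "[text](...)" or "[text]{...}" and captures only the bracketed text.
# Assumption: the annotation bodies contain no nested parentheses or braces.
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\](?:\([^)]*\)|\{[^}]*\})")


def strip_entities(text: str) -> str:
    """Replace each annotated entity with its plain surface text."""
    return ENTITY_PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_entities("I want to fly to [Berlin](city) tomorrow"))
```

Running each example utterance through a function like this before committing leaves intent-only training data.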

About Rasa

What does Rasa do? 🤔 Check out our Website
I'm new to Rasa 😄 Get Started with Rasa
I'd like to read the detailed docs 🤓 Read The Docs
I'm ready to install Rasa 🚀 Installation
I want to learn how to use Rasa 🚀 Tutorial
I have a question ❓ Rasa Community Forum
I would like to contribute 🤗 How to Contribute

Owner

Rasa
