Crowd sourced training data for Rasa NLU models

Overview

Open in Streamlit

NLU Training Data

Crowd-sourced training data for the development and testing of Rasa NLU models.

If you're interested in grabbing some data feel free to check out our live data fetching ui.


About this repository

This is an experiment with the goal of providing basic training data for developing chatbots, therefore, this repository is open for contributions!

We need your help to create an open source dataset to empower chatbot makers and conversational AI enthusiasts alike, and we very much appreciate your support in expanding the collection of data available to the community.

How do I donate my training data?

Each folder should contain a list of multiple intents, consider if the set of training data you're contributing could fit within an existing folder before creating a new one.

To contribute via pull request, follow these steps:

  1. Create an issue describing the training data you would like to contribute.

  2. Create a new file with a folder title and a NLU.yml file, or contribute to an existing folder.

  3. In the NLU.yml file, format your training data using YAML, remove all entities (see script), title each section with the intent types and add a short description e.g.intent:inform_rain <!--The user says that it is currently raining somewhere.-->

  4. Update the README.md file, include a list of the intent types added.

  5. Create a pull request describing your changes.

Your pull request will be reviewed by a maintainer, who will get back to you about any necessary changes or questions. You will also be asked to sign a Contributor License Agreement.

FAQs

How should I label my intents?

Please always put the domain at the end of each intent. For example: ask_transport

What do I do about multi-intent utterences?

If you would like to contribute multi-intent utterences, please add a + to indicate an additional intent, for example: affirm+ask_transport

What about training data that’s not in English?

Currently, we are unable to evaluate the quality of all language contributions, and therefore, during the initial phase we can only accept English training data to the repository. However, we understand that the Rasa community is a global one, and in the long-term we would like to find a solution for this in collaboration with the community.

Why do I need to remove entities from my training data?

We would like to make the training data as easy as possible to adopt to new training models and annotating entities highly dependent on your bot’s purpose. Therefore, we will first focus on collecting training data that only includes intents.

To help you remove the annotated entities from your training data, you can run this script.


About Rasa

Owner
Rasa
Open source machine learning tools for developers to build, improve, and deploy text-and voice-based chatbots and assistants
Rasa
Built for cleaning purposes in military institutions

Ferramenta do AL Construído para fins de limpeza em instituições militares. Instalação Requer python = 3.2 pip install -r requirements.txt Usagem Exe

0 Aug 13, 2022
customer care chatbot made with Rasa Open Source.

Customer Care Bot Customer care bot for ecomm company which can solve faq and chitchat with users, can contact directly to team. 🛠 Features Basic E-c

Dishant Gandhi 23 Oct 27, 2022
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language mod

20.5k Jan 08, 2023
C.J. Hutto 3.8k Dec 30, 2022
An automated program that helps customers of Pizza Palour place their pizza orders

PIzza_Order_Assistant Introduction An automated program that helps customers of Pizza Palour place their pizza orders. The program uses voice commands

Tindi Sommers 1 Dec 26, 2021
Prompt tuning toolkit for GPT-2 and GPT-Neo

mkultra mkultra is a prompt tuning toolkit for GPT-2 and GPT-Neo. Prompt tuning injects a string of 20-100 special tokens into the context in order to

61 Jan 01, 2023
Big Bird: Transformers for Longer Sequences

BigBird, is a sparse-attention based transformer which extends Transformer based models, such as BERT to much longer sequences. Moreover, BigBird comes along with a theoretical understanding of the c

Google Research 457 Dec 23, 2022
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

FNet: Mixing Tokens with Fourier Transforms Pytorch implementation of Fnet : Mixing Tokens with Fourier Transforms. Citation: @misc{leethorp2021fnet,

Rishikesh (ऋषिकेश) 217 Dec 05, 2022
Modular and extensible speech recognition library leveraging pytorch-lightning and hydra.

Lightning ASR Modular and extensible speech recognition library leveraging pytorch-lightning and hydra What is Lightning ASR • Installation • Get Star

Soohwan Kim 40 Sep 19, 2022
Ecommerce product title recognition package

revizor This package solves task of splitting product title string into components, like type, brand, model and article (or SKU or product code or you

Bureaucratic Labs 16 Mar 03, 2022
A highly sophisticated sequence-to-sequence model for code generation

CoderX A proof-of-concept AI system by Graham Neubig (June 30, 2021). About CoderX CoderX is a retrieval-based code generation AI system reminiscent o

Graham Neubig 39 Aug 03, 2021
Generating Korean Slogans with phonetic and structural repetition

LexPOS_ko Generating Korean Slogans with phonetic and structural repetition Generating Slogans with Linguistic Features LexPOS is a sequence-to-sequen

Yeoun Yi 3 May 23, 2022
NVDA, the free and open source Screen Reader for Microsoft Windows

NVDA NVDA (NonVisual Desktop Access) is a free, open source screen reader for Microsoft Windows. It is developed by NV Access in collaboration with a

NV Access 1.6k Jan 07, 2023
Header-only C++ HNSW implementation with python bindings

Hnswlib - fast approximate nearest neighbor search Header-only C++ HNSW implementation with python bindings. NEWS: version 0.6 Thanks to (@dyashuni) h

2.3k Jan 05, 2023
A paper list of pre-trained language models (PLMs).

Large-scale pre-trained language models (PLMs) such as BERT and GPT have achieved great success and become a milestone in NLP.

RUCAIBox 124 Jan 02, 2023
A natural language modeling framework based on PyTorch

Overview PyText is a deep-learning based NLP modeling framework built on PyTorch. PyText addresses the often-conflicting requirements of enabling rapi

Meta Research 6.4k Jan 08, 2023
Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

Implemented shortest-circuit disambiguation, maximum probability disambiguation, HMM-based lexical annotation and BiLSTM+CRF-based named entity recognition

0 Feb 13, 2022
glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Glow-Speak glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end. Installation git clone https://g

Rhasspy 8 Dec 25, 2022
Shellcode antivirus evasion framework

Schrodinger's Cat Schrodinger'sCat is a Shellcode antivirus evasion framework Technical principle Please visit my blog https://idiotc4t.com/ How to us

idiotc4t 27 Jul 09, 2022
ReCoin - Restoring our environment and businesses in parallel

Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R

sabrina button 1 Mar 14, 2022