Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle

Last update: Dec 28, 2022

Overview

Knover

Knover is a toolkit for knowledge-grounded dialogue generation based on PaddlePaddle. It lets researchers and developers efficiently train and run inference with large-scale dialogue generation models.

What's New:

December 2021: We open-sourced the PLATO-XL dialogue generation model, with up to 11 billion parameters.
October 2021: We open-sourced AG-DST, an amendable generation approach for dialogue state tracking.
February 2021: We open-sourced our implementation (Team 19) for DSTC9-Track1.
July 2020: We open-sourced PLATO-2, a large-scale generative model with latent space for open-domain dialogue systems.

Requirements and Installation

- Python >= 3.7
- paddlepaddle-gpu >= 2.0.0
  - Install PaddlePaddle by following its installation instructions.
  - The PaddlePaddle build to install depends on your CUDA version (recommended: 10.1) and cuDNN version (recommended: 7.6). See the PaddlePaddle documentation on GPU support for more information.
- sentencepiece
- termcolor
- NCCL, if you want to run distributed training
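
Before installing Knover, it can help to check that the environment meets the requirements listed above. The following is a minimal sketch, not part of Knover itself: the helper names `python_ok` and `missing_modules` are hypothetical, and the check only confirms that the packages can be found on the import path. It does not verify the PaddlePaddle, CUDA, cuDNN, or NCCL versions.

```python
import importlib.util
import sys

# Minimum Python version stated in the requirements list.
MIN_PYTHON = (3, 7)

# Import names of the listed packages (paddlepaddle-gpu is imported as "paddle").
REQUIRED_MODULES = ("paddle", "sentencepiece", "termcolor")


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("Missing modules:", missing_modules() or "none")
```

If the script reports missing modules, install them with pip before continuing with the steps below.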
Install Knover locally:

git clone https://github.com/PaddlePaddle/Knover.git
cd Knover
pip3 install -e .

Alternatively, you can set PYTHONPATH only:

export PYTHONPATH=/abs/path/to/Knover:$PYTHONPATH

Basic usage

See the usage document.

Disclaimer

This project aims to facilitate further research progress in dialogue generation. Baidu is not responsible for content that third parties generate with the pre-trained system.

Contact information

For help or issues using Knover, please submit a GitHub issue.
