Python-zhuyin - An open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo)

Overview

Python-Zhuyin (pyzhuyin) 注音和拼音轉換

Introduction 介紹

pyzhuyin is an open source Python library that provides a unified interface for converting between Chinese pinyin and Zhuyin (bopomofo).

pyzhuyin 是一個開放原始碼的 Python 套件,提供了將拼音轉換成注音的統一介面。

Installation 安裝

pip install pyzhuyin

Usage 使用

from pyzhuyin import pinyin_to_zhuyin, zhuyin_to_pinyin


assert(pinyin_to_zhuyin("lu3") == "ㄌㄨˇ")
assert(pinyin_to_zhuyin("dan4") == "ㄉㄢˋ")
assert(map(pinyin_to_zhuyin, ["lu3", "dan4"]) == ["ㄌㄨˇ", "ㄉㄢˋ"])

assert(zhuyin_to_pinyin("ㄌㄩˊ") == "lü2")
assert(zhuyin_to_pinyin("˙ㄗ") == "zi5")
assert(map(lambda z: zhuyin_to_pinyin(z, u_to_v=True), ["ㄌㄩˊ", "˙ㄗ"]) == ["lv2", "zi5"])

Testing 測試

Run the following command at the root of the project to test the library:

在根目錄執行以下指令以測試套件:

python3 -m unittest

Notes 備註

  • Only support numeric tone for pinyin
    • e.g. "lu3" instead of "lǔ"
  • Neutral tone is represented as 5
    • e.g. "˙ㄗ" -> "zi5"
  • For pinyin_to_zhuyin:
    • if corresponding zhuyin not found, raise ValueError
    • internally convert all v to ü
  • For zhuyin_to_pinyin:
    • if corresponding pinyin not found, raise ValueError
  • 兒化音 is not supported because it is not representable in the zhuyin system as a "combo" word
    • e.g. "公園兒" -> "gong1 yuanr2" -> "ㄍㄨㄥ ㄩㄢㄦˊ" (not allowed)

Data Sources 資料來源

Author 作者

  • Raymond Ku
Wrapper to display a script output or a text file content on the desktop in sway or other wlroots-based compositors

nwg-wrapper This program is a part of the nwg-shell project. This program is a GTK3-based wrapper to display a script output, or a text file content o

Piotr Miller 94 Dec 27, 2022
BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

BROS (BERT Relying On Spatiality) is a pre-trained language model focusing on text and layout for better key information extraction from documents. Given the OCR results of the document image, which

Clova AI Research 94 Dec 30, 2022
MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Microsoft 1.1k Dec 17, 2022
PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

VAENAR-TTS - PyTorch Implementation PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Keon Lee 67 Nov 14, 2022
SimCTG - A Contrastive Framework for Neural Text Generation

A Contrastive Framework for Neural Text Generation Authors: Yixuan Su, Tian Lan,

Yixuan Su 345 Jan 03, 2023
Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Ashley Kim 2 Jan 09, 2022
The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Kay Savetz 60 Dec 25, 2022
Blackstone is a spaCy model and library for processing long-form, unstructured legal text

Blackstone Blackstone is a spaCy model and library for processing long-form, unstructured legal text. Blackstone is an experimental research project f

ICLR&D 579 Jan 08, 2023
This code is the implementation of Text Emotion Recognition (TER) with linguistic features

APSIPA-TER This code is the implementation of Text Emotion Recognition (TER) with linguistic features. The network model is BERT with a pretrained mod

kenro515 1 Feb 08, 2022
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

fluz 11 Nov 16, 2022
Large-scale pretraining for dialogue

A State-of-the-Art Large-scale Pretrained Response Generation Model (DialoGPT) This repository contains the source code and trained model for a large-

Microsoft 1.8k Jan 07, 2023
Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet.

Sonnet finder Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. Usage This is a Python scrip

Marcel Bollmann 11 Sep 25, 2022
This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers.

private-transformers This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers. What is this? Why

Xuechen Li 73 Dec 28, 2022
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022
A method for cleaning and classifying text using transformers.

NLP Translation and Classification The repository contains a method for classifying and cleaning text using NLP transformers. Overview The input data

Ray Chamidullin 0 Nov 15, 2022
Universal End2End Training Platform, including pre-training, classification tasks, machine translation, and etc.

背景 安装教程 快速上手 (一)预训练模型 (二)机器翻译 (三)文本分类 TenTrans 进阶 1. 多语言机器翻译 2. 跨语言预训练 背景 TrenTrans是一个统一的端到端的多语言多任务预训练平台,支持多种预训练方式,以及序列生成和自然语言理解任务。 安装教程 git clone git

Tencent Minority-Mandarin Translation Team 42 Dec 20, 2022
Final Project Bootcamp Zero

The Quest (Pygame) Descripción Este es el repositorio de código The-Quest para el proyecto final Bootcamp Zero de KeepCoding. El juego consiste en la

Seven-z01 1 Mar 02, 2022
Python code for ICLR 2022 spotlight paper EViT: Expediting Vision Transformers via Token Reorganizations

Expediting Vision Transformers via Token Reorganizations This repository contain

Youwei Liang 101 Dec 26, 2022
Finally decent dictionaries based on Wiktionary for your beloved eBook reader.

eBook Reader Dictionaries Finally, decent dictionaries based on Wiktionary for your beloved eBook reader. Dictionaries Catalan 🚧 Ελληνικά (help welco

Mickaël Schoentgen 163 Dec 31, 2022
The training code for the 4th place model at MDX 2021 leaderboard A.

The training code for the 4th place model at MDX 2021 leaderboard A.

Chin-Yun Yu 32 Dec 18, 2022