This repository details the steps for building a part-of-speech tagger using Trigram Hidden Markov Models and the Viterbi algorithm, without using external libraries.

Last update: Dec 09, 2021

Overview

POS-Tagger

This repository details the creation of a part-of-speech tagger that uses Trigram Hidden Markov Models to predict the tag of each word in a sequence.

What is Part-of-Speech Tagging?

In corpus linguistics, part-of-speech tagging (POS tagging, PoS tagging, or POST), also known as grammatical tagging, is the process of marking up each word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context. POS tagging was once performed by hand. It is now done in the context of computational linguistics, using algorithms that associate discrete terms, as well as hidden parts of speech, with a set of descriptive tags.

POS-tagging algorithms fall into two distinct categories: rule-based and stochastic. Applying a rule-based model to predict tags in a sequence is cumbersome, and such a model is limited by a computational linguist's understanding of which sentence constructions are allowable, given how productive language is. I'll instead take a stochastic approach, assigning POS tags to the words in a sequence with Trigram Hidden Markov Models.

What are Trigram Hidden Markov Models (HMMs)?

The hidden Markov model (HMM) is a probabilistic sequence model that assigns a label to each unit in a sequence of observations (e.g., the words of an input sentence). Using probabilities estimated from a tagged training corpus, the model defines a joint probability over an observed word sequence and each possible sequence of POS labels. It then chooses the label sequence with the highest probability of generating the observed words. HMMs are widely used in natural language processing because language consists of sequences at many levels: sentences, phrases, words, and even characters.

A trigram HMM conditions each hidden part-of-speech tag on the two tags that precede it, using a probability distribution over the tag trigrams in the training corpus. It links each tag to the observed word through an emission probability. This surrounding context lets the model distinguish between homonyms, which are words that share the same spelling or pronunciation but differ in meaning and part of speech. For example, "rose" is a noun (NN) in "rose bush" but a past-tense verb (VBD) when it is the past tense of "rise".
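
The two steps described above can be sketched in plain Python with no external libraries, in keeping with the repository's stated goal. The first step estimates a trigram HMM from a tagged corpus. The second picks the most probable tag sequence with the Viterbi algorithm. This is a minimal sketch and not the repository's own code. It assumes maximum-likelihood estimates q(s | u, v) = c(u, v, s) / c(u, v) and e(x | s) = c(s, x) / c(s), a start symbol `*` and a stop symbol `STOP`, and no smoothing or unknown-word handling. The function names `train` and `viterbi` and the two-sentence corpus are hypothetical.

```python
import math
from collections import defaultdict

START, STOP = "*", "STOP"  # padding symbols assumed by this sketch


def train(tagged_sentences):
    """MLE trigram transitions q[(u, v, s)] and emissions e[(word, tag)]."""
    trigram_counts = defaultdict(int)
    history_counts = defaultdict(int)  # c(u, v) counted as a trigram history
    emission_counts = defaultdict(int)
    tag_counts = defaultdict(int)
    for sentence in tagged_sentences:
        tags = [START, START] + [tag for _, tag in sentence] + [STOP]
        for i in range(2, len(tags)):
            trigram_counts[(tags[i - 2], tags[i - 1], tags[i])] += 1
            history_counts[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sentence:
            emission_counts[(word, tag)] += 1
            tag_counts[tag] += 1
    q = {tri: c / history_counts[tri[:2]] for tri, c in trigram_counts.items()}
    e = {(w, t): c / tag_counts[t] for (w, t), c in emission_counts.items()}
    return q, e, set(tag_counts)


def viterbi(words, q, e, tags):
    """Most probable tag sequence under the trigram HMM, or None if no path exists."""
    def allowed(k):  # positions -1 and 0 hold only the START symbol
        return {START} if k <= 0 else tags

    n = len(words)
    pi = {(0, START, START): 0.0}  # log-probabilities of the best partial paths
    backpointer = {}
    for k in range(1, n + 1):
        for u in allowed(k - 1):
            for v in allowed(k):
                emit = e.get((words[k - 1], v), 0.0)
                if emit == 0.0:
                    continue  # unseen (word, tag) pair: no smoothing in this sketch
                best, best_w = -math.inf, None
                for w in allowed(k - 2):
                    prev = pi.get((k - 1, w, u))
                    trans = q.get((w, u, v), 0.0)
                    if prev is None or trans == 0.0:
                        continue
                    score = prev + math.log(trans) + math.log(emit)
                    if score > best:
                        best, best_w = score, w
                if best_w is not None:
                    pi[(k, u, v)] = best
                    backpointer[(k, u, v)] = best_w

    # Add the STOP transition and pick the best final tag pair (u, v).
    best, last = -math.inf, None
    for (k, u, v), score in pi.items():
        trans = q.get((u, v, STOP), 0.0)
        if k == n and trans > 0.0 and score + math.log(trans) > best:
            best, last = score + math.log(trans), (u, v)
    if last is None:
        return None

    # Follow back-pointers from the end of the sentence to the start.
    y = [None] * (n + 1)  # y[1..n] holds the tags; y[0] is unused
    y[n] = last[1]
    if n >= 2:
        y[n - 1] = last[0]
    for k in range(n - 2, 0, -1):
        y[k] = backpointer[(k + 2, y[k + 1], y[k + 2])]
    return y[1:]


# Hypothetical toy corpus in which "rose" appears as both NN and VBD.
corpus = [
    [("the", "DT"), ("rose", "NN"), ("bloomed", "VBD")],
    [("the", "DT"), ("sun", "NN"), ("rose", "VBD")],
]
q, e, tags = train(corpus)
print(viterbi(["the", "rose", "rose"], q, e, tags))  # ['DT', 'NN', 'VBD']
```

In this toy corpus the tag history (DT, NN) is always followed by VBD. As a result, the first "rose" is tagged as a noun and the second as a verb. A practical tagger would also need smoothing for unseen trigrams and a strategy for unknown words, both of which this sketch omits.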

Owner

Raihan Ahmed
