The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP

The data comes from the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus consists of transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) who took an English oral proficiency interview test. Each participant received an SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency) based on this test.

The goal is to build a machine learning model that predicts the SST score of each participant from their transcript.

Steps:

1 - Pre-process the dataset: extract the participant transcript (all tags). Inside the participant transcript, you can remove all other tags and keep only the English words.

2 - Process the dataset: extract features with the Bag of Words (BoW) technique.

3 - Train a classifier to predict the SST score

4 - Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example, you can try GloVe embeddings instead of BoW).
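Steps 1 to 4 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's actual code. It assumes participant turns are wrapped in `<B>...</B>` tags (adjust the pattern to whatever the corpus files really use). The function and class names are hypothetical. A small hand-written multinomial Naive Bayes stands in for "a classifier", so the example needs no third-party packages. The toy transcripts and scores at the bottom are made up and are not corpus data.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: participant turns are wrapped in <B>...</B>; adjust if needed.
PARTICIPANT_TURN = re.compile(r"<B>(.*?)</B>", re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def participant_words(raw):
    """Step 1: keep participant turns, drop nested tags, return lowercase words."""
    turns = PARTICIPANT_TURN.findall(raw)
    text = ANY_TAG.sub(" ", " ".join(turns))
    return WORD.findall(text.lower())


def bag_of_words(words):
    """Step 2: a bag of words is a word -> count mapping (word order is ignored)."""
    return Counter(words)


class NaiveBayes:
    """Step 3: multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, bags, labels):
        self.vocab = set()
        self.word_counts = defaultdict(Counter)  # label -> word counts
        self.class_counts = Counter(labels)      # label -> number of transcripts
        for bag, label in zip(bags, labels):
            self.word_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, bag):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(count / n_docs)  # log prior
            for word, freq in bag.items():
                if word in self.vocab:  # words never seen in training are skipped
                    likelihood = (self.word_counts[label][word] + 1) / (total + vocab_size)
                    score += freq * math.log(likelihood)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate(y_true, y_pred, labels):
    """Step 4: accuracy and confusion matrix (rows = true score, columns = predicted)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, matrix


# Made-up toy transcripts with invented SST scores, only to show the data flow.
train = [
    ("<A>Hello.</A><B>I <F>er</F> like dog.</B>", 3),
    ("<A>Hello.</A><B>Nevertheless, I would consider moving abroad.</B>", 8),
]
bags = [bag_of_words(participant_words(raw)) for raw, _ in train]
scores = [score for _, score in train]
model = NaiveBayes().fit(bags, scores)
predictions = [model.predict(bag) for bag in bags]
accuracy, matrix = evaluate(scores, predictions, labels=[3, 8])
print(accuracy, matrix)
```

With the real corpus, the evaluation should run on held-out transcripts rather than on the training data. If scikit-learn is available, its `CountVectorizer`, a classifier such as `LogisticRegression`, and `confusion_matrix` cover steps 2 to 4 with less code.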
