Speech Rankings

This project mimics CSRankings to generate an ordered list of researchers in speech/spoken language processing, along with their possible research topics, based on recent publications in major venues of the field. The goal is to help students looking for PhD positions find suitable advisors.

How to use

The pre-generated report is available here. To build it yourself:

Run prepare_data.py to build publications.json and authors.json, or simply use the provided data, which covers 2011 to 2021.
Run export.py to generate the report.

How does it work

We scrape author metadata and publication data from DBLP for the following three types of venues:

Speech venues: Interspeech, Speech Communication, SLT, SSW, ASRU, IWSLT
Mixed venues: ICASSP, TASLP
General venues: NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, KDD, AAAI, IJCAI

All publications in Speech venues are included. For Interspeech in particular, the section/field of each paper is collected from the ISCA Archive to show each researcher's possible research topics; keywords from IEEE Xplore serve the same purpose for papers published in IEEE venues. In Mixed venues, keywords (together with titles) are used to filter out non-speech papers according to a set of rules. In General venues, titles are used to identify speech papers. Researchers are sorted by their total number of publications.
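The inclusion rules and the final ordering can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record fields (`title`, `venue_type`, `keywords`, `authors`), the `SPEECH_TERMS` list, and the sample data are all hypothetical, and the real rule set is likely more elaborate.

```python
from collections import Counter

# Hypothetical term list; the project's real filtering rules are not shown here.
SPEECH_TERMS = ("speech", "speaker", "spoken", "voice", "acoustic")


def mentions_speech(text):
    """Return True if any of the hypothetical speech terms occurs in the text."""
    lowered = text.lower()
    return any(term in lowered for term in SPEECH_TERMS)


def is_speech_paper(paper):
    """Apply the venue-dependent inclusion rule to one (hypothetical) paper record."""
    venue_type = paper["venue_type"]
    if venue_type == "speech":
        # Every paper in a speech venue is included.
        return True
    if venue_type == "mixed":
        # Mixed venues: look at the title and the keywords.
        texts = [paper["title"], *paper.get("keywords", [])]
        return any(mentions_speech(t) for t in texts)
    # General venues: only the title is checked.
    return mentions_speech(paper["title"])


def rank_authors(papers):
    """Count included papers per author; sort by count (descending), then name."""
    counts = Counter()
    for paper in papers:
        if is_speech_paper(paper):
            counts.update(set(paper["authors"]))  # set() guards against duplicates
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample records, for illustration only.
SAMPLE_PAPERS = [
    {"title": "End-to-end speech recognition", "venue_type": "speech",
     "authors": ["A", "B"]},
    {"title": "Graph neural networks", "venue_type": "general",
     "authors": ["A"]},
    {"title": "Beamforming for arrays", "venue_type": "mixed",
     "keywords": ["speech enhancement"], "authors": ["B"]},
    {"title": "Neural voice conversion", "venue_type": "general",
     "authors": ["C"]},
]

if __name__ == "__main__":
    for name, count in rank_authors(SAMPLE_PAPERS):
        print(name, count)
```

Plain substring matching is used here only to keep the sketch short; it will produce false positives and misses, which is one reason the collected data should be treated with caution.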

The collected data contain errors, and the project is intended neither to index speech-related papers nor to compare researchers in the field.

A CSRankings-like index for speech researchers

Owner

Mutian He
