Speech Rankings

This project mimics CSRankings: it generates an ordered list of researchers in speech/spoken language processing, along with their possible research topics, based on recent publications at important venues in the field. The goal is to help students looking for PhD programs find suitable advisors.

How to use

A pre-generated report is available here. To build it yourself:

Run prepare_data.py to build publications.json and authors.json, or simply use the provided data, which covers publications from 2011 to 2021.
Run export.py to generate the report.

How does it work

We scrape author metadata and publication data from DBLP for the following three types of venues:

Speech venues: Interspeech, Speech Communication, SLT, SSW, ASRU, IWSLT
Mixed venues: ICASSP, TASLP
General venues: NeurIPS, ICML, ICLR, ACL, EMNLP, NAACL, KDD, AAAI, IJCAI
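
As an illustration only, the three venue types could be represented as a simple lookup table. The venue names come from the list above; the dictionary layout and the `venue_type` helper are assumptions made for this sketch, not the project's actual code.

```python
# Hypothetical grouping of venues by type. Only the venue names are taken
# from this README; the structure and names below are assumed.
VENUE_TYPES = {
    "speech": ["Interspeech", "Speech Communication", "SLT", "SSW", "ASRU", "IWSLT"],
    "mixed": ["ICASSP", "TASLP"],
    "general": ["NeurIPS", "ICML", "ICLR", "ACL", "EMNLP", "NAACL", "KDD", "AAAI", "IJCAI"],
}


def venue_type(venue):
    """Return "speech", "mixed", or "general" for a known venue, else None."""
    for kind, names in VENUE_TYPES.items():
        if venue in names:
            return kind
    return None
```

A table like this keeps the per-type handling in one place: the type returned for a paper's venue decides which inclusion rule applies to it.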

All publications in Speech venues are included. For Interspeech in particular, the section/field of each paper is collected from the ISCA Archive to indicate each researcher's possible research topics; keywords from IEEE Xplore serve the same purpose for papers published at IEEE venues. In Mixed venues, a set of rules based on keywords and titles filters out non-speech papers. In General venues, titles are used to identify speech papers. Researchers are sorted by their total number of publications.
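
The selection and ranking logic described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the term list `SPEECH_TERMS`, the record fields (`venue_type`, `title`, `keywords`, `authors`), and both function names are hypothetical, and the project's real filtering rules are not shown in this README.

```python
from collections import Counter

# Hypothetical term list; the project's actual rules are not documented here.
SPEECH_TERMS = ("speech", "speaker", "spoken", "voice")


def is_speech_paper(venue_type, title, keywords=()):
    """Decide whether a paper counts, depending on the type of its venue."""
    if venue_type == "speech":
        return True  # every paper from a speech venue is included
    text = title.lower()
    if venue_type == "mixed":
        # Mixed venues: match against keywords as well as the title.
        text += " " + " ".join(keywords).lower()
    # General venues: match against the title only.
    return any(term in text for term in SPEECH_TERMS)


def rank_authors(papers):
    """Return (author, count) pairs sorted by number of included papers."""
    counts = Counter()
    for paper in papers:
        if is_speech_paper(paper["venue_type"], paper["title"], paper.get("keywords", ())):
            counts.update(paper["authors"])
    return counts.most_common()


# Made-up records, for illustration only.
papers = [
    {"venue_type": "speech", "title": "A study of prosody", "authors": ["A", "B"]},
    {"venue_type": "mixed", "title": "Robust beamforming",
     "keywords": ["speech enhancement"], "authors": ["A"]},
    {"venue_type": "general", "title": "Image classification at scale", "authors": ["C"]},
    {"venue_type": "general", "title": "Self-supervised speech representations",
     "authors": ["B", "A"]},
]
ranking = rank_authors(papers)
```

Substring matching on a short term list is deliberately crude; it shows the shape of the rule-based filter and is one reason the collected data can contain errors.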

The collected data contain errors, and the project is neither intended to index speech-related papers nor to compare researchers in the field.

A CSRankings-like index for speech researchers


Owner

Mutian He
