Conversational text Analysis using various NLP techniques

Last update: Jan 06, 2023

Overview

PyConverse

Let me try first

Installation

pip install pyconverse

Usage

Please try this notebook that demos the core functionalities: basic usage notebook

Introduction

Conversation analytics plays an increasingly important role in shaping customer experiences in industries such as finance and contact centres, primarily by helping businesses understand their customers and serve their needs better. PyConverse is an attempt to provide tools and methods for understanding conversations from multiple perspectives using various NLP techniques.

Why PyConverse?

I have been doing what can be called conversational text NLP for the past year or so, primarily with contact centre data from domains such as financial services, banking and insurance. I have not come across any interesting open-source tools that help in understanding conversational text, so I decided to create this library. It provides tools and methods to analyse calls and to answer the questions and compute the metrics that people usually want from conversations in contact centre data analysis.

Where can I use PyConverse?

The primary use case is contact centre call analytics, but most of the tools that PyConverse provides can be used elsewhere as well.

There are a lot of insights hidden in every call. PyConverse enables you to extract those insights and compute various KPIs covering operational efficiency, agent effectiveness and customer experience monitoring.

If you are looking to answer questions like these:

What was the overall sentiment exhibited by the speakers in the conversation?
Were there periods of dead air (silence) between the agent and the customer? If so, how much?
Was the agent empathetic towards the customer?
What was the average agent response time or average hold time?
What was being said on calls?

and more, PyConverse might be of some help.
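To make the dead-air and response-time questions concrete, here is a minimal sketch in plain Python. It is not PyConverse's implementation: the `Utterance` record, the speaker labels, the two-second silence threshold and both function names are assumptions made for illustration, and it presumes each utterance comes with start and end timestamps in seconds.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Utterance:
    """Hypothetical transcript record; times are seconds from the start of the call."""
    speaker: str  # "agent" or "customer" (assumed labels)
    start: float
    end: float
    text: str


def silence_gaps(utterances, min_gap=2.0):
    """Return (gap_start, gap_end) pairs where nobody speaks for at least min_gap seconds."""
    ordered = sorted(utterances, key=lambda u: u.start)
    gaps = []
    latest_end = ordered[0].end if ordered else 0.0
    for utt in ordered[1:]:
        if utt.start - latest_end >= min_gap:
            gaps.append((latest_end, utt.start))
        # Track the furthest point reached so overlapping speech is not counted as silence.
        latest_end = max(latest_end, utt.end)
    return gaps


def average_agent_response_time(utterances):
    """Mean delay between the end of a customer turn and the start of the next agent turn."""
    ordered = sorted(utterances, key=lambda u: u.start)
    delays = [
        max(0.0, nxt.start - prev.end)
        for prev, nxt in zip(ordered, ordered[1:])
        if prev.speaker == "customer" and nxt.speaker == "agent"
    ]
    return mean(delays) if delays else None


call = [
    Utterance("agent", 0.0, 3.0, "Thank you for calling, how can I help?"),
    Utterance("customer", 3.5, 8.0, "I was charged twice this month."),
    Utterance("agent", 13.0, 16.0, "I am sorry to hear that, let me check."),
    Utterance("customer", 16.5, 18.0, "Okay, thanks."),
]

gaps = silence_gaps(call)
avg_response = average_agent_response_time(call)
print(gaps, avg_response)
```

In this toy call the only silence longer than the threshold is the pause before the agent's second turn, which is also the only customer-to-agent handover, so both metrics pick up the same five-second gap.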

What can PyConverse do?

At the moment PyConverse can do a few things that broadly fall into these categories:

Emotion identification
Empathetic statement identification
Call Segmentation
Topic identification from call segments
Compute various types of speaker attributes:
1. Linguistic attributes such as word counts, number of words per utterance and negations
2. Periods of silence and interruptions
3. Question identification
4. Backchannel identification
Assess the overall nature of the speaker via linguistic attributes and tell if the speaker is:
1. Talkative, verbally fluent
2. Informal/personal/social
3. Goal-oriented, forward/future-looking or focused on the past
4. Showing inhibitions
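
As a rough illustration of the linguistic speaker attributes listed above, the sketch below counts words, negations and questions per speaker using only the standard library. It does not use PyConverse and does not reflect how the library computes these values: the word lists, the `(speaker, text)` input format and the function names are all assumptions chosen for the example, and the question check is a deliberately naive heuristic.

```python
import re
from collections import defaultdict

# Illustrative word lists, not taken from PyConverse.
NEGATIONS = {"no", "not", "never", "none", "nothing", "cannot"}
QUESTION_STARTERS = {"what", "why", "how", "when", "where", "who", "which",
                     "is", "are", "do", "does", "did", "can", "could", "would", "will"}


def tokenize(text):
    """Lowercase word tokens; contractions such as "don't" stay as one token."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def looks_like_question(text):
    """Naive heuristic: ends with '?' or starts with a typical question word."""
    tokens = tokenize(text)
    return text.strip().endswith("?") or (bool(tokens) and tokens[0] in QUESTION_STARTERS)


def speaker_attributes(turns):
    """Aggregate simple counts per speaker from (speaker, text) pairs."""
    stats = defaultdict(lambda: {"utterances": 0, "words": 0, "negations": 0, "questions": 0})
    for speaker, text in turns:
        tokens = tokenize(text)
        s = stats[speaker]
        s["utterances"] += 1
        s["words"] += len(tokens)
        s["negations"] += sum(t in NEGATIONS or t.endswith("n't") for t in tokens)
        s["questions"] += int(looks_like_question(text))
    for s in stats.values():
        s["words_per_utterance"] = s["words"] / s["utterances"]
    return dict(stats)


turns = [
    ("agent", "How can I help you today?"),
    ("customer", "I don't see my refund."),
    ("agent", "Let me check that for you."),
    ("customer", "It was never processed, not even partially."),
]
attrs = speaker_attributes(turns)
for speaker, values in attrs.items():
    print(speaker, values)
```

Counts like these are the raw material for the higher-level judgements in the list, such as whether a speaker is talkative, although the thresholds for those judgements are not specified here.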

What Next?

Improve documentation.
Add more use case notebooks/examples.
Improve some of the functionalities and make them more streamlined.

Built with:

Transformers, spaCy, PyTorch

Credits:

Note: The backchannel utterance classification method is inspired by Facebook's paper "Unsupervised Topic Segmentation of Meetings with BERT Embeddings" (arXiv:2106.12978 [cs.LG]).

Comments

SemanticTextSegmentation NaN With All Stop Words

When running semantic text segmentation, I found that if an input utterance consists entirely of stop words (e.g. "Bye. Uh huh. Yeah."), SemanticTextSegmentation._get_similarity fails with ValueError: Input contains NaN.

I found that adding a NaN check on both embeddings solves the problem.

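def _get_similarity(self, text1, text2):
    # `nlp`, `model`, `np` and `cosine_similarity` are assumed to be defined at module level.
    # Keep only sentences above a minimum word count (thresholds left as in the existing code).
    sentence_1 = [i.text.strip()
                  for i in nlp(text1).sents if len(i.text.split(' ')) > 1]
    sentence_2 = [i.text.strip()
                  for i in nlp(text2).sents if len(i.text.split(' ')) > 2]
    embeding_1 = model.encode(sentence_1)
    embeding_2 = model.encode(sentence_2)
    # Mean-pool the sentence embeddings into one vector per text.
    embeding_1 = np.mean(embeding_1, axis=0).reshape(1, -1)
    embeding_2 = np.mean(embeding_2, axis=0).reshape(1, -1)

    # Proposed fix: if either pooled embedding contains NaN, treat the texts as fully similar.
    if np.any(np.isnan(embeding_1)) or np.any(np.isnan(embeding_2)):
        return 1

    sim = cosine_similarity(embeding_1, embeding_2)
    return sim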

I would like to have someone else look at it because I don't want to make any assumptions that the stop words should be part of the same segments.
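
One plausible cause of the NaN, offered here as an assumption and not confirmed by the maintainers, is that the word-count filter drops every sentence, so the mean is taken over an empty set of embeddings. The sketch below shows an alternative guard in plain Python that checks for the empty case before pooling and returns None, leaving the segment decision to the caller. The helper names and the toy two-dimensional vectors are hypothetical and stand in for real sentence-encoder output.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors; return None when the list is empty."""
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two vectors, or None if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else None


def safe_similarity(embeddings_1, embeddings_2):
    """Cosine similarity of mean-pooled embeddings, or None if either side is empty."""
    pooled_1, pooled_2 = mean_pool(embeddings_1), mean_pool(embeddings_2)
    if pooled_1 is None or pooled_2 is None:
        return None  # caller decides: merge with the previous segment, skip, etc.
    return cosine(pooled_1, pooled_2)


# Toy 2-d "embeddings" standing in for sentence-encoder output.
kept_sentences = [[1.0, 0.0], [0.0, 1.0]]
nothing_kept = []  # e.g. every sentence was dropped by the length filter

undefined = safe_similarity(kept_sentences, nothing_kept)
same = safe_similarity(kept_sentences, kept_sentences)
print(undefined, same)
```

Returning None instead of a fixed similarity of 1 avoids silently assuming that short filler utterances belong to the same segment, which is the assumption the issue author was unsure about.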

opened by Haowjy 1

Updated lru_cache decorator.

After installing pyconverse on Python 3.7 or below, the import statement itself fails. I went through the utils file and saw that the decorator was written as "@lru_cache" without parentheses, a style that is only supported from Python 3.8 onwards. On older versions (Python 3.7 and below) this raises an error (reported here as a NoneType error), because those versions require the decorator to be written with parentheses, as "@lru_cache()". I have therefore changed it to the form with parentheses. The change does not cause any error on newer versions.
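
For reference, a minimal standalone example of the portable form is shown below. The decorated function is a hypothetical helper invented for the example and is not taken from pyconverse's utils file.

```python
from functools import lru_cache


@lru_cache()  # parentheses: valid on Python 3.7 and earlier as well as 3.8+
def normalise(text):
    """Hypothetical helper: collapse whitespace and lowercase the text."""
    return " ".join(text.lower().split())


first = normalise("  Hello   World ")
second = normalise("  Hello   World ")  # served from the cache
info = normalise.cache_info()
print(first, info)
```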

opened by AkashKhamkar 0

Error in importing Callyzer, SpeakerStats

When I try to load the model, it shows the error below. Is the library still under development?

KeyError: "[E002] Can't find factory for 'tok2vec'. This usually happens when spaCy calls nlp.create_pipe with a component name that's not built in - for example, when constructing the pipeline from a model's meta.json. If you're using a custom component, you can write to Language.factories['tok2vec'] or remove it from the model meta and add it via nlp.add_pipe instead."

opened by kalpa277 0

Releases(v0.2.0)

v0.2.0(Nov 21, 2021)
First Release of PyConverse library.

Conversational Transcript Analysis using various NLP techniques.

Emotion identification
Empathetic statement identification
Call Segmentation
Topic identification from call segments
Compute various types of speaker attributes:
1. Linguistic attributes such as word counts, number of words per utterance and negations
2. Periods of silence and interruptions
3. Question identification
4. Backchannel identification
Assess the overall nature of the speaker via linguistic attributes and tell if the speaker is:
1. Talkative, verbally fluent
2. Informal/personal/social
3. Goal-oriented, forward/future-looking or focused on the past
4. Showing inhibitions


Owner

Rita Anjana

ML engineer
