Quantifiers-and-Negations-in-RE-Documents

This project was part of my work for a seminar at the Technical University of Munich (TUM) during my bachelor studies in 2019. The python project can be used to find quantifiers and negations in documents. It searches for problematic findings. Problematic findings are i.e. sentences that use specific combinations of quantifiers and negations that are ambiguous. This means there are multiple valid interpretations of the sentence. It can extract those and report them.

Motivation:

You want to avoid ambiguous sentences as they can cause problems that are hard to find and possibly hard to fix. This is especially the case for technical specifications and similar use cases. In this project we compare two different approaches to finding ambiguous sentences:

String based search
NLP based search

We want to find out if the computational overhead of using NLP gives better results than standard string based search methods.

Features:

Detect quantifiers and negations in .xml or .txt documents
Search either by a string based search or by NLP based search (using Stanfords CoreNLP library [1])
Extract possibly ambiguous sentences
Compare string search results with NLP search results

Prerequisites:

Java 8 or higher
Python 3.6 or higher as project interpreter
Stanford Corenlp library: https://stanfordnlp.github.io/CoreNLP/download.html
Environment variable "CORENLP_HOME" set to where the CoreNLP library is stored

References:

[1] Christopher D.Manning, MihaiSurdeanu, JohnBauer, JennyFinkel, StevenJ.Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.

Quantifiers and Negations in RE Documents

Related tags

Overview

Quantifiers-and-Negations-in-RE-Documents

Owner

Nicolas Ruscher

Utility for Google Text-To-Speech batch audio files generator. Ideal for prompt files creation with Google voices for application in offline IVRs

Let Xiao Ai speakers control third-party devices

Simple Python script to scrape youtube channles of "Parity Technologies and Web3 Foundation" and translate them to well-known braille language or any language

The official repository of the ISBI 2022 KNIGHT Challenge

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

👄 The most accurate natural language detection library for Python, suitable for long and short text alike

The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

The entmax mapping and its loss, a family of sparse softmax alternatives.

A unified tokenization tool for Images, Chinese and English.

Pre-training BERT masked language models with custom vocabulary

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task

FastFormers - highly efficient transformer models for NLU

Code for the Findings of NAACL 2022(Long Paper): AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

A simple visual front end to the Maya UE4 RBF plugin delivered with MetaHumans

Code for CodeT5: a new code-aware pre-trained encoder-decoder model.

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

SpeechBrain is an open-source and all-in-one speech toolkit based on PyTorch.

Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS)

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"