Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
A Superfast SMS & Call bomber for Linux And Termux !

A Superfast SMS & Call bomber for Linux And Termux !

Anubhav Kashyap 15 Feb 21, 2022
Lite - Lite cracker tool for python

Wellcome to tools Results Install Tools

Jeeck X Nano 23 Dec 17, 2022
Shell hunter for AF

AF-ShellHunter AF-ShellHunter: Auto shell lookup AF-ShellHunter its a script designed to automate the search of WebShell's in AF Team How to pip3 ins

Eduardo 34 May 13, 2022
Windows Stack Based Auto Buffer Overflow Exploiter

Autoflow - Windows Stack Based Auto Buffer Overflow Exploiter Autoflow is a tool that exploits windows stack based buffer overflow automatically.

Himanshu Shukla 19 Dec 22, 2022
Monty Hall Problem simulation written in Python.

Monty Hall Problem Simulation monty_hall_sim is a brute-force method of determining the optimal strategy for the Monty Hall Problem. Usage Set boolean

Xavier D 1 Aug 29, 2022
Agile Threat Modeling Toolkit

Threagile is an open-source toolkit for agile threat modeling:

Threagile 425 Jan 07, 2023
Simulating Log4j Remote Code Execution (RCE) vulnerability in a flask web server using python's logging library with custom formatter that simulates lookup substitution by executing remote exploit code.

py4jshell Simulating Log4j Remote Code Execution (RCE) CVE-2021-44228 vulnerability in a flask web server using python's logging library with custom f

Narasimha Prasanna HN 86 Aug 21, 2022
Scanning for CVE-2021-44228

Filesystem log4j_scanner for windows and Unix. Scanning for CVE-2021-44228, CVE-2021-45046, CVE-2019-17571 Requires a minimum of Python 2.7. Can be ex

Brett England 4 Jan 09, 2022
Script checks provided domains for log4j vulnerability

log4j Script checks provided domains for log4j vulnerability. A token is created with canarytokens.org and passed as header at request for a single do

Matthias Nehls 2 Dec 12, 2021
Unsafe Twig processing of static pages leading to RCE in Grav CMS 1.7.10

CVE-2021-29440 Unsafe Twig processing of static pages leading to RCE in Grav CMS 1.7.10 Grav is a file based Web-platform. Twig processing of static p

Enox 6 Oct 10, 2022
Log4j command generator: Generate commands for CVE-2021-44228

Log4j command generator Generate commands for CVE-2021-44228. Description The vulnerability exists due to the Log4j processor's handling of log messag

1 Jan 03, 2022
Course: Information Security with Python

Curso: Segurança da Informação com Python Curso realizado atravès da Plataforma da Digital Innovation One Prof: Bruno Dias Conteúdo: Introdução aos co

Elizeu Barbosa Abreu 1 Nov 28, 2021
Archive-Crack - A Tools for crack file archive

Install In TERMUX apt update && apt upgrade -y pkg install python git unrar

X - MrG3P5 10 Oct 06, 2022
Log4j-Scanner with Bind-Receipt and custom hostnames

Hrafna - Log4j-Scanner for the masses Features Scanning-system designed to check your own infra for vulnerable log4j-installations start and stop scan

18 Jan 23, 2022
Scans all drives for log4j jar files and gets their version from the manifest

log4shell_scanner Scans all drives for log4j jar files and gets their version from the manifest. Windows and Windows Server only.

Zdeněk Loučka 1 Dec 29, 2021
CVE-2022-22965 - CVE-2010-1622 redux

CVE-2022-22965 - vulnerable app and PoC Trial & error $ docker rm -f rce; docker build -t rce:latest . && docker run -d -p 8080:8080 --name rce rce:la

Duarte Duarte 20 Aug 25, 2022
CamRaptor is a tool that exploits several vulnerabilities in popular DVR cameras to obtain device credentials.

CamRaptor is a tool that exploits several vulnerabilities in popular DVR cameras to obtain device credentials.

EntySec 118 Dec 24, 2022
Tools to make working the Arch Linux Security Tracker easier

This is a collection of Python scripts to make working with the Arch Linux Security Tracker easier.

Jonas Witschel 6 Jul 13, 2022
Details,PoC and patches for CVE-2021-45383 & CVE-2021-45384

CVE-2021-45383 & CVE-2021-45384 There are several network-layer vulnerabilities in the official server of Minecraft: Bedrock Edition (aka Bedrock Serv

20 Apr 07, 2022
MainCoon - an automated recon framework

MainCoon is an automated recon framework meant for gathering information during penetration testing of web applications.

Md. Nur habib 8 Aug 26, 2022