Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
LeLeLe: A tool to simplify the application of Lattice attacks.

LeLeLe is a very simple library (300 lines) to help you more easily implement lattice attacks, the library is inspired by Z3Py (python interfa

Mathias Hall-Andersen 4 Dec 14, 2021
RedlineSpam - Python tool to spam Redline Infostealer panels with legit looking data

RedlineSpam Python tool to spam Redline Infostealer panels with legit looking da

4 Jan 27, 2022
Cam-Hacker: Ip Cameras hack with python

Cam-Hacker Hack Cameras Mode Of Execution: apt-get install python3 apt-get insta

Error 4 You 9 Dec 17, 2022
A brute force tool for password-protected zip file

Bzip A brute force tool for password-protected zip file/folder(s). Note that this tool can only crack .zip files. Please DO not misuse. Installation g

3 Nov 13, 2021
S2-062 (CVE-2021-31805) / S2-061 / S2-059 RCE

CVE-2021-31805 Remote code execution S2-062 (CVE-2021-31805) Due to Apache Struts2's incomplete fix for S2-061 (CVE-2020-17530), some tag attributes c

warin9 31 Nov 22, 2022
'Our Drowsinessdetector detects drivers eyes if they are closed for more than 2 seconds and alerts driver'

Data analysis Document here the project: DriverDrowsinessDetector Description: Project Description Data Source: Type of analysis: Please document the

3 Jul 03, 2022
Apache Solr SSRF(CVE-2021-27905)

Solr-SSRF Apache Solr SSRF #Use [-] Apache Solr SSRF漏洞 (CVE-2021-27905) [-] Options: -h or --help : 方法说明 -u or --url

Henry4E36 70 Nov 09, 2022
A BurpSuite extension to parse 5GC NF OpenAPI 3.0 files to assess 5G core networks

5GC_API_parse Description 5GC API parse is a BurpSuite extension allowing to assess 5G core network functions, by parsing the OpenAPI 3.0 not supporte

PentHertz 57 Dec 16, 2022
Discord-keylogger - Discord keylogger With Python

Discord-keylogger Usage python dlogger.py -t [Time interval in sec] if not speci

Satwik Sinha 1 Jan 30, 2022
MTBLLS Ethical Hacking Tool Announcement of v2.0

MTBLLS Ethical Hacking Tool Announcement of v2.0 MTBLLS is a Free and Open-Source Ethical Hacking Tool developed by GhostTD (SkyWtkh) The tool can onl

Ghost 2 Mar 19, 2022
CloakifyFactory & the Cloakify Toolset - Data Exfiltration & Infiltration In Plain Sight;

CloakifyFactory CloakifyFactory & the Cloakify Toolset - Data Exfiltration & Infiltration In Plain Sight; Evade DLP/MLS Devices; Social Engineering of

3 Oct 18, 2022
PrivateRoom - Make your work private by building a system using arduino which instantly kills a program when someone enters your room/cabin

privateRoom Make your work private by building a system using arduino which instantly kills a program when someone enters your room/cabin STEPS: Uploa

Divyanshu Kumar 3 Nov 08, 2022
This is a Cryptographied Password Manager, a tool for storing Passwords in a Secure way

Cryptographied Password Manager This is a Cryptographied Password Manager, a tool for storing Passwords in a Secure way without using external Service

Francesco 3 Nov 23, 2022
Scout Suite - an open source multi-cloud security-auditing tool,

Description Scout Suite is an open source multi-cloud security-auditing tool, which enables security posture assessment of cloud environments. Using t

NCC Group Plc 5k Jan 05, 2023
RDP Stealer

RDP Stealer RDP Stealer by lamp Require Python How To Use Download This Source Extract The Zip File Change webhook url Convert to exe send to target I

Lamp 14 Nov 26, 2022
PoC for CVE-2021-45897 aka SCRMBT-#180 - RCE via Email-Templates (Authenticated only) in SuiteCRM <= 8.0.1

CVE-2021-45897 PoC for CVE-2021-45897 aka SCRMBT-#180 - RCE via Email-Templates (Authenticated only) in SuiteCRM = 8.0.1 This vulnerability was repor

Manuel Zametter 17 Nov 09, 2022
Evil-stalker - A simple tool written in python, it is so simple that it is based on google dorks

evil-stalker How to run First of all, you must install the necessary libraries.

rock3d 6 Nov 16, 2022
A proof-of-concept exploit for Log4j RCE Unauthenticated (CVE-2021-44228)

CVE-2021-44228 – Log4j RCE Unauthenticated About This is a proof-of-concept exploit for Log4j RCE Unauthenticated (CVE-2021-44228). This vulnerability

Pedro Havay 20 Nov 11, 2022
GitGuardian Shield: protect your secrets with GitGuardian

Detect secret in source code, scan your repo for leaks. Find secrets with GitGuardian and prevent leaked credentials. GitGuardian is an automated secrets detection & remediation service.

GitGuardian 1.2k Dec 27, 2022
Repository for a project of the course EP2520 Building Networked Systems Security

EP2520_ACME_Project Repository for a project of the course EP2520 Building Networked Systems Security in Royal Institute of Technology (KTH), Stockhol

1 Dec 11, 2021