Natural Language Processing - Summer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Program Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After Easter, all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held live. Slides will usually be available the night before the lecture. We try to record all lectures and tutorials for later reference (we are not yet sure how well this will work for the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date | Slot 13:30h | Slot 15:15h | DIS25a (DIS B.Sc.) / NLP (DS M.Sc.)
1.4.2022 | Introduction and Overview (L) | Basic Text Processing (L) | x x
8.4.2022 | Basic NLP Pipeline: NLTK (T) (solution) | Common Toolkit: spaCy (T) (solution) | x x
15.4.2022 | no lecture | |
22.4.2022 | WordNet (L) | Vector Semantics (L) | x x
29.4.2022 | WordNet, GermaNet (T) (solution) | Vector Semantics (T) (solution) | x x
6.5.2022 | Information Extraction (L) | Sentiment Analysis (L) | x x
13.5.2022 | no lecture | |
20.5.2022 | Language Models and Ethics in NLP (L) | Group assignment (P) | x x
27.5.2022 | Group work (P) | Group work (P) | x
3.6.2022 | Data Programming for IE (L) | Group work (P) / Oral Exam Master | x x
10.6.2022 | Guest Lecture: Dimitar Dimitrov (L) | Group work (P) | x
17.6.2022 | Group work (P) | Group work (P) | x
24.6.2022 | Student talks - Project presentation (P) | Student talks - Project presentation (P) | x
31.8.2022 | Submission of term papers | | x

Bachelor: Group Assignments

For the group assignments, groups of four students work on a bias-related topic with a specific focus, using one of three datasets. During the group work phases starting on 20.5.2022, we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept for your specific topic and dataset. Please describe the motivation, the dataset, your methods and NLP pipeline, a working prototype, and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topics:

Gender Bias

Gender bias is a group bias in which, for some aspect of a given document or set of documents, different genders are represented differently than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians, or topical bias, i.e., different distributions of topics in texts geared towards male/female readers).
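The simplest quantitative measure mentioned above, the share of documents per author gender compared with an expected share, can be sketched as follows. This is only an illustration: the function name `representation_gap`, the toy labels, and the 50/50 baseline are assumptions made for this example and are not part of the course material or the datasets.

```python
from collections import Counter


def representation_gap(labels, expected_share):
    """Return observed share minus expected share for each group."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return {
        group: counts.get(group, 0) / total - share
        for group, share in expected_share.items()
    }


# Hypothetical toy data: one author-gender label per document.
author_genders = ["female", "male", "male", "male",
                  "female", "male", "male", "male"]

# Assumed baseline: both groups are expected to be equally represented.
gaps = representation_gap(author_genders, {"female": 0.5, "male": 0.5})
print(gaps)
```

A negative value means the group appears less often than the chosen baseline predicts. Which baseline counts as "expected" is itself a modelling decision that should be justified in the term paper.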

Examples of papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnic (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is important not only because it may lead to even more severe instances of racism, but also because it is an infringement of the constitutional right to equal treatment.

Examples of papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language covers many aspects of language that are subjective, opinionated, or otherwise imply a valuation. This includes toxicity, ranging from forms of hate speech such as racism, through incivility and profane, offensive, and aggressive language, to overly positive praise. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as Wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.
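As a first, very rough signal for subjective wording, one can count how many tokens of a text appear in a list of subjective words. The sketch below assumes a tiny hand-made word list (`SUBJECTIVE_WORDS`) and a hypothetical helper `subjective_ratio`; both are placeholders for illustration, and a real project would rely on an established lexicon or a trained classifier.

```python
import re

# Hypothetical mini-lexicon, for illustration only.
SUBJECTIVE_WORDS = {"terrible", "brilliant", "disgraceful", "amazing"}


def subjective_ratio(text, lexicon=SUBJECTIVE_WORDS):
    """Return (share of tokens found in the lexicon, list of matched tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, []
    hits = [token for token in tokens if token in lexicon]
    return len(hits) / len(tokens), hits


ratio, hits = subjective_ratio(
    "The minister gave a brilliant speech after a terrible week."
)
print(ratio, hits)
```

A word-list lookup ignores context, negation, and irony, so it is best treated as a baseline to compare more careful methods against.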

Examples of papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often the subjects are known from context (for example, abortion, weapon laws, and gay marriage in political texts); otherwise they have to be determined using approaches like entity recognition. A related concept is target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Examples of papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln