Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Fast and customizable vulnerability scanner For JIRA written in Python

Fast and customizable vulnerability scanner For JIRA. 🤔 What is this? Jira-Lens 🔍 is a Python Based vulnerability Scanner for JIRA. Jira is a propri

Mayank Pandey 185 Dec 25, 2022
Workshop Material on VM-based Deobfuscation

Analysis of Virtualization-based Obfuscation This repository contains slides, samples and code of the 4h code deobfuscation workshop at r2con2021. We

Tim Blazytko 133 Dec 18, 2022
A python tool capable of creating HUGE wordlists. Has the ability to add custom words for concatenation in any way you see fit.

A python tool capable of creating HUGE wordlists. Has the ability to add custom words for concatenation in any way you see fit.

Codex 9 Oct 05, 2022
A repository to detect the ARP spoofing in any devices and prevent Man in the Middle(MITM) attack using Python3

arp_spoof_detector A repository to detect the ARP spoofing in any devices and prevent Man in the Middle(MITM) attack using Python3 Usage: git clone ht

Surya Das N 1 Oct 30, 2021
You can manage your password with this program.

You must have Python compilers in order to run this program. First of all, download the compiler in the link.

Mustafa Bahadır Doğrusöz 6 Aug 07, 2021
web指纹识别工具

前言 一直苦于没有用的顺手的web指纹识别工具,学习前辈s7ckTeam的Glass和broken5的WebAliveScan优秀开源程序开发的轻量型web指纹工具。

EASY 966 Dec 26, 2022
This is an injection tool that can inject any xposed modules apk into the debug android app

This is an injection tool that can inject any xposed modules apk into the debug android app, the native code in the xposed module can also be injected.

Windy 32 Nov 05, 2022
Python-based proof-of-concept tool for generating payloads that utilize unsafe Java object deserialization.

Python-based proof-of-concept tool for generating payloads that utilize unsafe Java object deserialization.

Astro 9 Sep 27, 2022
Exploiting CVE-2021-42278 and CVE-2021-42287 to impersonate DA from standard domain user

Exploiting CVE-2021-42278 and CVE-2021-42287 to impersonate DA from standard domain user Known issues it will not work outside kali , i will update it

Hossam 867 Dec 22, 2022
Open Source Tool - Cybersecurity Graph Database in Neo4j

GraphKer Open Source Tool - Cybersecurity Graph Database in Neo4j |G|r|a|p|h|K|e|r| { open source tool for a cybersecurity graph database in neo4j } W

Adamantios - Marios Berzovitis 27 Dec 06, 2022
PortSwigger Burp Plugin for the Log4j (CVE-2021-44228)

yLog4j This is Y-Sec's @PortSwigger Burp Plugin for the Log4j CVE-2021-44228 vulnerability. The focus of yLog4j is to support mass-scanning of the Log

Y-Security 1 Jan 31, 2022
PyPasser is a Python library for bypassing reCaptchaV3 only by sending 2 requests.

PyPasser is a Python library for bypassing reCaptchaV3 only by sending 2 requests. In 1st request, gets token of captcha and in 2nd request,

253 Jan 05, 2023
RedTeam-Security - In this repo you will get the information of Red Team Security related links

OSINT Passive Discovery Amass - https://github.com/OWASP/Amass (Attack Surface M

Abhinav Pathak 5 May 18, 2022
Code to do NF in HDR,HEVC,HPL,MPL

Netflix-DL 6.0 |HDR-HEVC-MPL-HPL NOT Working| ! Buy working netflix cdm from [em

4 Dec 28, 2021
解密哥斯拉webshell管理工具流量

kingkong 解密哥斯拉Godzilla-V2.96 webshell管理工具流量 目前只支持jsp类型的webshell流量解密 Usage 获取攻击者上传到服务器的webshell样本 获取wireshark之类的流量包,一般甲方有科来之类的全流量镜像设备,联系运维人员获取,这里以test.

h4ck for fun 46 Dec 21, 2022
HTTP Protocol Stack Remote Code Execution Vulnerability CVE-2022-21907

CVE-2022-21907 Description POC for CVE-2022-21907: HTTP Protocol Stack Remote Code Execution Vulnerability. create by antx at 2022-01-17. Detail HTTP

赛欧思网络安全研究实验室 365 Nov 30, 2022
A Docker based LDAP RCE exploit demo for CVE-2021-44228 Log4Shell

log4j-poc An LDAP RCE exploit for CVE-2021-44228 Log4Shell Description This demo Tomcat 8 server has a vulnerable app deployed on it and is also vulne

60 Dec 10, 2022
BETA: Layla - recon tool for bug bounty

WELCOME TO LAYLA Layla is a python script that automatically performs recon on a

Matheus Faria 68 Jan 04, 2023
Moodle community-based vulnerability scanner

badmoodle Moodle community-based vulnerability scanner Description badmoodle is an unofficial community-based vulnerability scanner for moodle that sc

Michele Di Bonaventura 11 Dec 22, 2022
Burp Extensions

Burp Extensions This is a collection of extensions to Burp Suite that I have written. getAllParams.py - Version 1.2 This is a python extension that ru

/XNL-h4ck3r 364 Dec 30, 2022