Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Last update: Nov 05, 2021

Overview

This repository provides two web crawlers: one labels domain names using the McAfee API (https://www.trustedsource.org/sources/index.pl), and the other labels IP reputation using the TALOS API (https://talosintelligence.com/).

Requirements

BeautifulSoup

Usage

Descriptions of the demonstration code are as follows.

To label a set of domains, put the domain list in 'data/domain_list.txt' and run 'demo_domain_label.py'. For each domain, the program uses the McAfee API to look up (1) its category (e.g., Malicious Sites- Parked Domain) and (2) its risk level (e.g., High Risk), and saves the results in 'res/domain_labels.txt'. If the program keeps printing '-Retry-', stop it and wait a while before starting it again. On restart, it automatically skips the domains that are already labeled and continues with the remaining ones.
To label the reputation of a set of IP addresses, put the IP list in 'data/IP_list.txt' and run 'demo_IP_label.py'. For each IP address, the program looks up (1) its email reputation and (2) its web reputation (each with 3 levels: Poor, Neutral, and Good), and saves the results in 'res/IP_labels.txt'. If the program keeps printing 'None', stop it and wait a while before starting it again. On restart, it automatically skips the IPs that are already labeled and continues with the remaining ones.
An example domain name list (with 21,820 effective second-level domains) and an example IP list (with 67,751 IP addresses) are given in 'data/examples/example_domain_list.txt' and 'data/examples/example_IP_list.txt', respectively. The corresponding labeled results are saved in 'res/examples/example_domain_labels.txt' and 'res/examples/example_IP_labels.txt', respectively.
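The skip-and-continue behavior described above can be illustrated with a minimal sketch. This is not the repository's actual code: the function names (`load_done`, `label_all`), the `fetch_label` callback, the `max_failures` threshold, and the tab-separated results format are all assumptions made for illustration. The real scripts may store results differently.

```python
import os
from typing import Callable, Iterable, Optional, Set


def load_done(result_path: str) -> Set[str]:
    """Return the items that already have a record in the results file."""
    if not os.path.exists(result_path):
        return set()
    with open(result_path, encoding="utf-8") as f:
        # Assumed format (hypothetical): one "item<TAB>label" record per line.
        return {line.split("\t", 1)[0].strip() for line in f if line.strip()}


def label_all(
    items: Iterable[str],
    result_path: str,
    fetch_label: Callable[[str], Optional[str]],
    max_failures: int = 5,
) -> int:
    """Label items that are not in the results file yet; return how many were added.

    fetch_label is a hypothetical stand-in for the crawler's lookup. It
    returns None when the remote service refuses the request.
    """
    done = load_done(result_path)
    failures = 0
    labeled = 0
    # Append mode keeps the records written by earlier runs.
    with open(result_path, "a", encoding="utf-8") as out:
        for item in items:
            if item in done:
                continue  # labeled in a previous run, skip it
            label = fetch_label(item)
            if label is None:
                failures += 1
                if failures >= max_failures:
                    break  # repeated failures: stop, wait, and rerun later
                continue
            failures = 0
            out.write(f"{item}\t{label}\n")
            out.flush()  # keep the file current in case the run is interrupted
            done.add(item)
            labeled += 1
    return labeled
```

Calling `label_all` a second time with the same list and results path only queries the items that are still missing. This is why stopping the program and restarting it later does not repeat earlier lookups.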

If you have questions regarding this repository, you can contact the author via [[email protected]].
