Unja

Fetch Known URLs
What's Unja?

Unja is a fast & light tool for fetching known URLs from the Wayback Machine, Common Crawl, VirusTotal & AlienVault's OTX. It uses a separate thread for each provider to optimize speed, uses the Wayback resumption key to split a large scan into multiple parts, and applies filters directly on the provider APIs so that only the data you need is downloaded and your system does less work.
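
Under the hood this is essentially a fan-out over provider APIs. Here is a minimal sketch of the per-provider threading idea — the Wayback CDX and OTX URLs below are real public endpoints, but how Unja actually wires them together is an assumption, not its real code:

import requests
from concurrent.futures import ThreadPoolExecutor

# Illustrative provider endpoints; Unja's real internals may differ.
PROVIDERS = {
    "wayback": (
        "http://web.archive.org/cdx/search/cdx"
        "?url=*.{domain}&output=json&collapse=urlkey"
        "&filter=statuscode:200&filter=mimetype:.*html"
    ),
    "otx": "https://otx.alienvault.com/api/v1/indicators/domain/{domain}/url_list",
}

def fetch(name, template, domain):
    # Each provider runs in its own thread; one provider failing
    # should not kill the others, so errors are caught here.
    try:
        resp = requests.get(template.format(domain=domain), timeout=30)
        resp.raise_for_status()
        return resp.json()
    except requests.RequestException:
        return None

domain = "example.com"
with ThreadPoolExecutor(max_workers=len(PROVIDERS)) as pool:
    futures = {n: pool.submit(fetch, n, t, domain) for n, t in PROVIDERS.items()}
    results = {n: f.result() for n, f in futures.items()}

Filtering on the API side (the filter= parameters above) means unwanted records are never transferred, which is where most of the speed comes from.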

Why Unja?

  • Supports Wayback / Common Crawl / VirusTotal / OTX
  • Automatically handles rate limits and timeouts
  • Export results: plain text, or detailed JSON output with status, mime & length
  • Multithreading: a separate thread per provider fetches data simultaneously
  • Filters: applied directly on the provider APIs to avoid downloading unnecessary data

Installing Unja

You can install Unja with pip as follows:

pip3 install unja

or by downloading this repository and running

python3 setup.py install

Updating Unja

You can update Unja with pip as follows:

pip3 install unja -U

Usage

unja -h

This will display help for the tool.

Flag      Description                                                 Example
-d        Domain                                                      unja -d ninjhacks.com
--sub     Include subdomains                                          unja --sub
-p        Providers (wayback commoncrawl otx virustotal)              unja -p wayback
--wbf     Wayback filters (default : statuscode:200 ~mimetype:html)   unja --wbf statuscode:200
--ccf     CommonCrawl filters (default : =status:200 ~mime:.*html)    unja --ccf =status:200
--wbl     Wayback results per request (default : 10000)               unja --wbl 1000
--otxl    OTX results per request (default : 500)                     unja --otxl 500
-r        Amount of retries for HTTP client (default : 3)             unja -r 3
-v        Enable verbose mode to show errors                          unja -v
-j        Enable JSON mode for detailed output in JSON format         unja -j
-s        Silent mode, don't print header                             unja -s
--ucci    Update CommonCrawl index                                    unja --ucci
--vtkey   Change VirusTotal API key in config                         unja --vtkey
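
For example, a typical invocation combining several of these flags (the target domain is a placeholder):

unja -d example.com --sub -p wayback otx -j -s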

Output Methods

text = ( default ) Output URLs only.

json = ( -j ) Output url, status, mime & length in JSON format; this makes it easy to filter results later on those fields.
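
For illustration only — the field names come from the list above, but the exact formatting is an assumption — a single JSON record might look like:

{"url": "https://example.com/?id=1", "status": "200", "mime": "text/html", "length": "1234"}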

Filters

Filters are applied directly on the providers, so only useful, filtered data is returned by the API.

Wayback               Commoncrawl         Description
statuscode:200        =status:200         return only URLs whose status code is 200
!statuscode:200       !=status:200        return only non-200 status codes
mimetype:text/html    mime:text/html      return only URLs whose response type is text/html
!mimetype:text/html   !=mime:text/html    return only non-text/html response types
~mimetype:html        ~mime:.*html        return all URLs whose response type contains "html"
~original:unja        ~url:.*unja         return all URLs that contain "unja"
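
As a rough illustration of what such a filter does on the Common Crawl side, the same expressions can be sent straight to the public index server — the index name CC-MAIN-2023-50 is just an example, and this is a sketch of the provider API, not of Unja's code:

import requests

params = {
    "url": "*.example.com",
    "output": "json",
    # The list value sends two filter= parameters: exactly the
    # "=status:200 ~mime:.*html" expressions from the table above.
    "filter": ["=status:200", "~mime:.*html"],
}
resp = requests.get(
    "https://index.commoncrawl.org/CC-MAIN-2023-50-index",
    params=params,
    timeout=60,
)
for line in resp.text.splitlines():
    print(line)  # one JSON record per line: url, status, mime, length, ...

Because the index server applies the filter itself, only matching records ever cross the network.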

Oneliners

Get only URLs with parameters & status code 200

unja -s -d target.com --sub -p wayback commoncrawl --wbf 'statuscode:200 ~original:=' --ccf '=status:200 ~url:.*=' | anew | tee output

Looking for open redirects

unja -s -d target.com --sub -p wayback commoncrawl --wbf '~statuscode:30 ~original:=http' --ccf '~status:30 ~url:.*=http' | anew | tee output

Clean results (exclude images, css, javascript, woff & 404s)

unja -s -d target.com --sub -p wayback commoncrawl --wbf '!statuscode:404 ~!mimetype:image ~!mimetype:javascript ~!mimetype:css ~!mimetype:woff' --ccf '!=status:404 !~mime:.*image !~mime:.*javascript !~mime:.*css !~mime:.*woff' | anew | tee output
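
If you exported detailed results with -j, you can also post-filter them offline. A minimal sketch, assuming one JSON object per line with the fields listed under Output Methods (the actual schema may differ):

import json

# Keep only HTML responses from a saved "output" file.
with open("output") as fh:
    for line in fh:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines
        if "html" in str(rec.get("mime", "")):
            print(rec["url"])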

Let me know if you have any other good oneliners.
