declutters url lists for crawling/pentesting

Overview

uro

Using a URL list for security testing can be painful as there are a lot of URLs that have uninteresting/duplicate content; uro aims to solve that.

It doesn't make any HTTP requests to the URLs. It removes:

  • human-written content, e.g. blog posts
  • URLs with the same path that differ only in parameter values
  • incremental URLs, e.g. /cat/1/ and /cat/2/
  • image, JS, CSS and other static files
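
The filters above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not uro's actual implementation: the `declutter` function, the `STATIC_EXTENSIONS` set, and the `<n>` placeholder are all hypothetical names chosen for illustration, and the human-written-content filter is left out.

```python
import re
from urllib.parse import urlsplit, parse_qsl

# Hypothetical blacklist; the extensions uro actually filters may differ.
STATIC_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "css", "js", "ico", "woff"}


def declutter(urls):
    """Return the URLs that survive a few uro-style filters, in input order."""
    seen = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        path = parts.path
        # Drop static files based on the extension of the last path segment.
        last_segment = path.rsplit("/", 1)[-1]
        if "." in last_segment:
            extension = last_segment.rsplit(".", 1)[-1].lower()
            if extension in STATIC_EXTENSIONS:
                continue
        # Replace purely numeric segments with a placeholder so that
        # /cat/1/ and /cat/2/ produce the same key.
        shape = re.sub(r"/\d+(?=/|$)", "/<n>", path)
        # Compare parameter names only, so differing values count as duplicates.
        names = tuple(sorted(
            name for name, _ in parse_qsl(parts.query, keep_blank_values=True)
        ))
        key = (parts.netloc, shape, names)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

The first URL seen for each (host, path shape, parameter names) key is kept and later ones are dropped, so input order matters.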

Usage

First, install uro with pip:

pip3 install uro

Now, there's just one way to use it, no args, no bullshit.

cat urls.txt | uro

Comments
  • ImportError: cannot import name 'SIGPIPE' from 'signal'

    D:\uro>uro
    Traceback (most recent call last):
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.2', 'console_scripts', 'uro')())
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\Scripts\uro-script.py", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\metadata.py", line 77, in load
        module = import_module(match.group('module'))
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py", line 127, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "", line 1014, in _gcd_import
      File "", line 991, in _find_and_load
      File "", line 975, in _find_and_load_unlocked
      File "", line 655, in _load_unlocked
      File "", line 618, in _load_backward_compatible
      File "", line 259, in load_module
      File "C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\site-packages\uro-0.0.2-py3.8.egg\uro\uro.py", line 4, in <module>
    ImportError: cannot import name 'SIGPIPE' from 'signal' (C:\Users\umara\AppData\Local\Programs\Python\Python38\lib\signal.py)

    opened by umar98 3
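
For context on the report above: `signal.SIGPIPE` is only defined on Unix-like platforms, so importing it by name fails on Windows. A minimal sketch of a defensive lookup follows; `restore_default_sigpipe` is a hypothetical helper name, and this is not necessarily how uro handles it.

```python
import signal


def restore_default_sigpipe():
    """Reset SIGPIPE to its default action where the platform defines it."""
    # getattr avoids an ImportError/AttributeError on platforms without SIGPIPE.
    sigpipe = getattr(signal, "SIGPIPE", None)
    if sigpipe is None:
        return False  # e.g. Windows: nothing to configure
    signal.signal(sigpipe, signal.SIG_DFL)
    return True
```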
  • Error install uro

    I get this error... WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behavior with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

    I've followed the steps above but haven't found a solution :(

    Can anyone help me?

    invalid 
    opened by mjulda 2
  • When using uro on subdomains it leaves :// in front?

    When uro is run on a list of bare subdomains, each line of output has :// in front.

    Example:

    cat subs.txt | uro

    Contents of subs.txt:

    site.com
    sub.site.com
    sub123.site.com

    Any entry without http:// or https:// in front comes out with :// in front of it.

    opened by gprime31 2
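
One plausible explanation for the report above, stated as an assumption rather than a diagnosis of uro's code: Python's `urlsplit` treats a bare `site.com` as a path, leaving both scheme and host empty, so output rebuilt from those parts would start with `://`. The sketch below adds a default scheme before parsing; `split_host_and_path` and its `default_scheme` parameter are hypothetical.

```python
from urllib.parse import urlsplit


def split_host_and_path(line, default_scheme="http"):
    """Parse a URL or bare hostname into (scheme, host, path)."""
    line = line.strip()
    # Without "://", urlsplit would put the hostname into the path component.
    if "://" not in line:
        line = f"{default_scheme}://{line}"
    parts = urlsplit(line)
    return parts.scheme, parts.netloc, parts.path
```

With this, `sub.site.com` parses to the host `sub.site.com` with an empty path instead of an empty host.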
  • ERROR

    I just can't get this to work. I have cloned the repo and run the install command, but when I try "cat file.txt | uro" it doesn't work. Do I have to run any additional commands? Is there an installation video? :)

    invalid 
    opened by spector012 2
  • Please solve this

    └─# cat params.csv | uro | wc -l
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.9/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.9/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.9/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.9/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.9/sre_parse.py", line 962, in parse
        raise source.error("unbalanced parenthesis")
    re.error: unbalanced parenthesis at position 68
    6547

    opened by r3dpars3c 2
  • It doesn't delete paths

    The two paths below differ only in their numeric IDs, 43935976 and 43935989.

    ~# cat urls.txt
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    

    uro should drop one of them, but it doesn't.

    ~# cat urls.txt | uro
    https://news.mail.ru/politics/43935976/?social=tw
    https://news.mail.ru/politics/43935989/?social=tw
    
    bug 
    opened by Phoenix1112 1
  • error handling

    So I added uro to my workflow and after a while I got this error:

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 139, in main
        if matches_patterns(path):
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 369
    

    It happens to me with different inputs, so it seems to be a common problem.

    invalid 
    opened by marcelo321 1
  • Uro error

    λ cat newfile222.txt | uro
    Traceback (most recent call last):
      File "C:\Users\Yaseen\AppData\Local\Programs\Python\Python39\Scripts\uro-script.py", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.1', 'console_scripts', 'uro')())
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 139, in main
        if matches_patterns(path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\site-packages\uro\uro.py", line 107, in matches_patterns
        if re.search(pattern, path):
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "c:\users\yaseen\appdata\local\programs\python\python39\lib\sre_parse.py", line 836, in _parse
        raise source.error("missing ), unterminated subpattern",
    re.error: missing ), unterminated subpattern at position 379
    cat: write error: No space left on device

    Can you help? It reports a space issue, but I have a lot of free space.

    bug invalid 
    opened by hellofresh01 1
  • Improvement Request

    Hi Somdev,

    1. I'd like to suggest adding the following extensions to the blacklist. I gathered all of them manually and think it would be nice to omit them:
    'svg','img','gif','mp4','flv','ogv','webm','webp','mov','mp3','m4a','m4p','ppt','pptx','pdf','scss','tif','tiff','ttf','otf','woff','woff2','eot','htc','swf','rtf','image'
    
    2. I'd also like to ask for the js extension to be whitelisted, as there are lots of interesting features/endpoints to be found in JS files and I don't think they should be considered "useless".

    Thanks!

    Kind Regards, HolyBugx

    enhancement 
    opened by HolyBugx 1
  • More extension to declutter

    It might be useful to add these extensions to the ones that get decluttered; at least, that's what I usually do:

    .doc
    .docx
    .mp3
    .mp4
    .exe
    .tif
    .ttf
    .woff
    .woff2
    .ico
    .zip
    
    duplicate 
    opened by leorac 0
  • Bad character range P-C at position 31

    cat urls.txt | uro

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.8/dist-packages/uro/uro.py", line 155, in main
        if re.search(pattern, path):
      File "/usr/lib/python3.8/re.py", line 201, in search
        return _compile(pattern, flags).search(string)
      File "/usr/lib/python3.8/re.py", line 304, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/usr/lib/python3.8/sre_compile.py", line 764, in compile
        p = sre_parse.parse(p, flags)
      File "/usr/lib/python3.8/sre_parse.py", line 948, in parse
        p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
      File "/usr/lib/python3.8/sre_parse.py", line 443, in _parse_sub
        itemsappend(_parse(source, state, verbose, nested + 1,
      File "/usr/lib/python3.8/sre_parse.py", line 598, in _parse
        raise source.error(msg, len(this) + 1 + len(that))
    re.error: bad character range P-C at position 31
    
    bug 
    opened by remonsec 0
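
The `re.error` tracebacks in the issues above all end inside `re.search(pattern, path)`, which suggests (as an assumption, since the pattern source isn't shown) that text taken from the input is being compiled as a regular expression. Escaping such text with `re.escape` before searching avoids errors like "unbalanced parenthesis" and "bad character range". A minimal sketch, where `safe_search` is a hypothetical helper:

```python
import re


def safe_search(fragment, text):
    """Look for `fragment` literally inside `text`.

    re.escape neutralises characters such as ( ) [ ] and -, so arbitrary
    URL text cannot produce an invalid pattern.
    """
    return re.search(re.escape(fragment), text) is not None


def is_valid_pattern(pattern):
    """Report whether `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True
```

If the match is purely literal, the plain `fragment in text` test does the same job without involving `re` at all.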
  • uro error

    cat urls.txt | uro > test

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.10/dist-packages/uro/uro.py", line 123, in main
        for line in sys.stdin:
      File "/usr/lib/python3.10/codecs.py", line 322, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

    @s0md3v

    bug 
    opened by Iamsajidkhan 0
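
The `UnicodeDecodeError` above is raised while iterating over `sys.stdin`, which decodes input strictly by default. One tolerant approach, sketched here under the assumption that losing a few undecodable bytes is acceptable, is to wrap the binary stream with `errors="replace"`. `read_lines` is a hypothetical helper, not part of uro.

```python
import io


def read_lines(binary_stream, encoding="utf-8"):
    """Yield decoded lines, replacing undecodable bytes with U+FFFD."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, errors="replace")
    for line in text:
        yield line.rstrip("\r\n")
```

In a script this would be called as `read_lines(sys.stdin.buffer)`. A byte 0xff at position 0 can also indicate a file saved as UTF-16, in which case converting the file to UTF-8 first is the cleaner fix.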
  • Error

    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 33, in <module>
        sys.exit(load_entry_point('uro==0.0.4', 'console_scripts', 'uro')())
      File "/usr/local/bin/uro", line 25, in importlib_load_entry_point
        return next(matches).load()
    StopIteration

    opened by umarahmad125 0
  • broken pipe

    I have been encountering this issue:

      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 151, in main
        print(host + path + dict_to_params(param))
    BrokenPipeError: [Errno 32] Broken pipe
    Traceback (most recent call last):
      File "/usr/local/bin/uro", line 10, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.7/dist-packages/uro/uro.py", line 161, in main
        print(host + path)
    BrokenPipeError: [Errno 32] Broken pipe
    

    Any idea why this happens?

    opened by marcelo321 0
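
A `BrokenPipeError` like the one above typically appears when the program reading the output (for example `head`) exits before the writer is done. A minimal sketch of handling it follows; `emit` is a hypothetical helper, and this is not necessarily the fix that uro adopted.

```python
import sys


def emit(lines, out=None):
    """Write lines to `out`; return False if the reader closed the pipe early."""
    out = sys.stdout if out is None else out
    try:
        for line in lines:
            out.write(line + "\n")
        out.flush()
    except BrokenPipeError:
        # The consumer has gone away; stop quietly instead of printing a traceback.
        return False
    return True
```

When `emit` returns False in a real script, the Python documentation suggests pointing stdout at `os.devnull` before exiting, so that the interpreter's final flush does not raise a second error.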
  • enhanced filtration

    I'd like to filter URLs such as "/A/embed?url=" and "/B/embed?url=", which return similar data. Likewise, I'd like to filter "/A.php" and "/A.php/", which also return similar data.

    enhancement 
    opened by LztCode 1
Releases(0.0.4)
  • 0.0.4(Mar 19, 2022)

  • 0.0.3(Feb 27, 2022)

    • removed redundant imports and code
    • added more extensions to blacklist
    • less memory and time consumption
    • fixed 'broken pipe' error when piping the output to utilities like head
    • fixed an error where similar urls were not getting filtered when they had any parameters
  • 0.0.2(Sep 1, 2021)

Owner
Somdev Sangwan
I make things, I break things and I make things that break things.