Python framework to scrape Pastebin pastes and analyze them

Overview

pastepwn - Paste-Scraping Python Framework

Pastebin is a very helpful tool for storing, or rather sharing, ASCII-encoded data online. In the world of OSINT, researchers all around the world use Pastebin to retrieve e.g. leaked account data in order to find indicators of security breaches.

Pastepwn is a framework to scrape pastes and scan them for certain indicators. There are several analyzers and actions to be used out-of-the-box, but it is also easily extensible - you can create your own analyzers and actions on the fly.

Please note: This framework is not to be used for illegal actions. It can be used for querying public Pastebin pastes for e.g. your username or email address in order to increase your own security.

⚠️ Important note

In April 2020 Pastebin disabled access to their scraping API for a short period of time. At first the scraping API could not be accessed in any way; later on Pastebin re-enabled access to the API setup page. Since then, however, it has not been possible to scrape "text" pastes, only pastes with some kind of syntax set. That reduces the number of available pastes to a minimum, which reduces the usefulness of this tool.

Setting up pastepwn

To use the pastepwn framework you need to follow these simple steps:

  1. Make sure to have a Pastebin premium account!
  2. Install pastepwn via pip (pip3 install pastepwn)¹
  3. Create a file (e.g. main.py) in your project root to hold your code²
  4. Fill that file with content - add analyzers and actions. Check the example implementation.

¹ Note that pastepwn only works with python3.6 or above
² (If you want to store all pastes, make sure to set up a mongodb, mysql or sqlite instance)

Behind a proxy

There are 2 ways to use this tool behind a proxy:

  • Define the following environment variables: HTTP_PROXY, HTTPS_PROXY, NO_PROXY.
  • When initializing the PastePwn object, use the proxies argument. proxies is a dict as defined in requests' documentation.
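
Both options can be sketched roughly as follows. The proxy URL is a placeholder and the helper names are made up for this example; the PastePwn(proxies=...) call simply follows the description above, and its import is deferred so the snippet loads even without pastepwn installed.

```python
import os


def build_proxies(http_proxy, https_proxy=None):
    """Build a proxies dict in the format used by the requests library."""
    return {"http": http_proxy, "https": https_proxy or http_proxy}


def export_proxy_env(proxies, no_proxy="localhost,127.0.0.1"):
    """Option 1: expose the proxy settings through environment variables."""
    os.environ["HTTP_PROXY"] = proxies["http"]
    os.environ["HTTPS_PROXY"] = proxies["https"]
    os.environ["NO_PROXY"] = no_proxy


def create_pastepwn(proxies):
    """Option 2: pass the dict via the proxies argument described above."""
    from pastepwn import PastePwn  # deferred import: needs pastepwn installed
    return PastePwn(proxies=proxies)


# Placeholder proxy address, replace with your own.
proxies = build_proxies("http://proxy.example.com:3128")
```

Only one of the two options is needed; the environment variables affect every library in the process that honours them, while the proxies argument is scoped to pastepwn.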

Troubleshooting

If you are having trouble, check out the wiki pages first. If your question or issue is not resolved there, feel free to create an issue or contact me on Telegram.

Roadmap and ToDos

Check the bug tracker on GitHub to get an up-to-date status about features and ToDos.

  • REST API for querying paste data (will be another project)
  • Add a helpful wiki with instructions and examples
Comments
  • pastepwn.core.actionhandler - ERROR - While performing the action 'EmailAction' the following exception occurred: ''Paste' object has no attribute 'template''

    Hi,

    I'm getting this error while running main.py:

    pastepwn.core.actionhandler - ERROR - While performing the action 'EmailAction' the following exception occurred: ''Paste' object has no attribute 'template''

    I tried to switch from the Telegram action to the email action in main.py:

    # -*- coding: utf-8 -*-
    
    import logging.handlers
    import os
    
    from pastepwn import PastePwn
    from pastepwn.actions import EmailAction
    from pastepwn.analyzers import MailAnalyzer, WordAnalyzer
    from pastepwn.database import MysqlDB
    
    # Generic action to send Telegram messages on new matched pastes
    telegram_action = EmailAction
    mail_analyzer = MailAnalyzer(telegram_action)
    pastepwn.add_analyzer(mail_analyzer)
    

    Please help me out, as I'm loving the logging feature.

    bug 
    opened by zespirit 17
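
One plausible cause of the error in this report, sketched with stand-in classes rather than pastepwn's real ones: the snippet assigns the EmailAction class itself instead of an instance. When a method is then called on the class, the paste is bound as self, and the lookup of self.template fails on the Paste object. The class and function names below are hypothetical.

```python
class Paste:
    """Stand-in for a paste; only the fields used here are modelled."""

    def __init__(self, key, body):
        self.key = key
        self.body = body


class EmailActionSketch:
    """Stand-in action that formats a message from a template."""

    def __init__(self, template="Paste {key} matched"):
        self.template = template

    def perform(self, paste, analyzer_name=None, matches=None):
        return self.template.format(key=paste.key)


def run_action(action, paste):
    """Mimics a handler that calls perform() on whatever it was given."""
    return action.perform(paste, "MailAnalyzer", [])


paste = Paste("nDPF9r5b", "example body")

# Passing an instance works as intended.
message = run_action(EmailActionSketch(), paste)

# Passing the class binds the paste as `self`, so `self.template` is missing.
try:
    run_action(EmailActionSketch, paste)
except AttributeError as exc:
    error_text = str(exc)
```

Under that assumption, the fix on the user side would be to instantiate the action with its required arguments before handing it to the analyzer.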
  • Create docker-compose.yml file

    Currently there is no (working) docker image available. The goal is to have a docker image + a docker-compose file which automatically starts a mongodb & pastepwn or a mysql & pastepwn or sqlite & pastepwn.

    For that we need to read environment variables.

    help wanted good first issue hacktoberfest Difficulty: Easy 
    opened by d-Rickyy-b 11
  • Add Discord action (webhook or bot token)

    This includes commits from #85 which hasn't been merged yet. Fixes #83 and expands on #85 by enabling users to have a DiscordAction with a custom bot token and channel ID, instead of just a webhook.

    Both ways work just fine on my machine, but I haven't tried using fresh tokens, let me know if you encounter authentication issues.

    opened by Zeroji 9
  • Wrapper around various hash analyzers

    Users might not want to calculate their password hashes beforehand and thus might want to be able to use a wrapper (analyzer) around various hash analyzers. They initialize that wrapper with their password in the clear. The wrapper generates the hashes on the fly and checks each paste against those hashes.

    I am not sure if that's a good idea but I'll leave this here for now.

    good first issue New feature hacktoberfest Difficulty: Easy 
    opened by d-Rickyy-b 9
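
A minimal sketch of what such a wrapper could look like, using only hashlib. The class name, the default list of algorithms and the match() signature are assumptions made for this example and are not part of pastepwn.

```python
import hashlib


class PasswordHashSketch:
    """Hypothetical wrapper: hashes a clear-text password once and
    searches paste bodies for any of the resulting digests."""

    def __init__(self, password, algorithms=("md5", "sha1", "sha256", "sha512")):
        data = password.encode("utf-8")
        # Compute every digest up front so each paste check is a substring search.
        self.hashes = {name: hashlib.new(name, data).hexdigest() for name in algorithms}

    def match(self, paste_body):
        """Return the names of the algorithms whose digest appears in the paste."""
        body = (paste_body or "").lower()
        return [name for name, digest in self.hashes.items() if digest in body]
```

Note that this only finds unsalted hashes; salted or iterated schemes such as bcrypt cannot be precomputed this way.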
  • Create contributing guidelines

    The guidelines should describe how you can contribute to this project.

    References: https://www.conventionalcommits.org/en/v1.0.0-beta.4/ https://keepachangelog.com/en/1.0.0/ https://semver.org/

    enhancement 
    opened by d-Rickyy-b 7
  • Current travis config leads to (up to) 4 coverage reports on coveralls per file

    This is due to the fact that the travis config defines 4 python versions to run on. https://github.com/d-Rickyy-b/pastepwn/blob/f9bd202d0813aebb6bc3f189a43158227ca2bdea/.travis.yml#L3-L7

    This could be fixed with a build matrix that runs the after_success command on only one Python version.

    https://github.com/d-Rickyy-b/pastepwn/blob/f9bd202d0813aebb6bc3f189a43158227ca2bdea/.travis.yml#L18-L19

    bug good first issue hacktoberfest Difficulty: Easy 
    opened by d-Rickyy-b 6
  • Create unit test for PastebinURLAnalyzer

    The current implementation of the PastebinURLAnalyzer does not have any unit tests yet. This should change now.

    To resolve this task you need to create a new unit test file in the tests directory named pastebinurlanalyzer_test.py.

    In this file implement at least 5 positive and 5 negative tests. The more the better! Make sure to test tricky combinations. Unit tests should really be testing code to its limits.

    PS: It's okay when tests make the build fail because the code is faulty! (It's not okay if tests fail for no valid reason)

    good first issue hacktoberfest Difficulty: Easy unit test 
    opened by d-Rickyy-b 6
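
As a starting point, the shape of such a test file could look like the sketch below. It does not import pastepwn: the regex and the contains_pastebin_url helper are assumptions standing in for the real analyzer, whose pattern may differ, so the cases would need to be adapted to the actual PastebinURLAnalyzer.

```python
import re
import unittest

# Assumed pattern for illustration only; the real analyzer may use another regex.
PASTEBIN_URL = re.compile(r"https?://pastebin\.com/(?:raw/)?[A-Za-z0-9]{8}\b")


def contains_pastebin_url(text):
    """Return True if the text contains something shaped like a Pastebin URL."""
    return PASTEBIN_URL.search(text or "") is not None


class PastebinURLSketchTest(unittest.TestCase):
    def test_positive(self):
        for text in (
            "https://pastebin.com/nDPF9r5b",
            "http://pastebin.com/raw/aeC9BS25",
            "see https://pastebin.com/0Cq4CYCH for details",
        ):
            with self.subTest(text=text):
                self.assertTrue(contains_pastebin_url(text))

    def test_negative(self):
        for text in (
            "",
            "https://pastebin.org/nDPF9r5b",     # wrong domain
            "https://pastebin.com/abc",          # key too short
            "https://notpastebin.com/nDPF9r5b",  # different host
            "https://pastebin.com/nDPF9r5bXX",   # key too long
        ):
            with self.subTest(text=text):
                self.assertFalse(contains_pastebin_url(text))
```

Using subTest keeps each tricky input visible in the failure output instead of stopping at the first failing case.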
  • On start method

    Some users would like to perform an action once pastepwn is fully initialized and running. There should be a way to register a handler for that.

    good first issue New feature hacktoberfest Difficulty: Easy 
    opened by d-Rickyy-b 6
  • RegexAnalyzers should return what they find, instead of just indicating whether or not they found it.

    I'm not sure whether this idea is compatible with your current idea of how this tool should be used, but it strikes me as odd that the RegexAnalyzers simply report whether or not a match was found, rather than returning all the data they were able to match.

    For the PastebinURLAnalyzer I just made a pull request for, for example, I imagine it might be useful if you could feed it a number of pastes, and it could create some sort of dictionary which mapped the paste it had found a match on with a list of all URLs it was able to find.

    That way, I could say "check out these 200 pastes and show me all the emails, pastebin urls, and bcrypt password hashes you find."

    Just an idea, of course. Would probably require a bit of a redesign of the way the analyzers are used, but would require minimal changes to the actual analyzers.

    enhancement help wanted Difficulty: Medium 
    opened by lemonsaurus 6
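
The idea in this request can be sketched independently of pastepwn's classes. The two patterns and the collect_matches helper below are assumptions for illustration; they use non-capturing groups so that re.findall returns the full matched strings.

```python
import re

# Illustrative patterns only, not the ones shipped with pastepwn.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "pastebin_url": re.compile(r"https?://pastebin\.com/(?:raw/)?[A-Za-z0-9]{8}\b"),
}


def collect_matches(pastes):
    """Map each paste key to the strings every pattern found in its body.

    `pastes` is a dict of paste key -> paste body. Pastes without any
    match are left out of the report.
    """
    report = {}
    for key, body in pastes.items():
        found = {name: pattern.findall(body or "") for name, pattern in PATTERNS.items()}
        found = {name: hits for name, hits in found.items() if hits}
        if found:
            report[key] = found
    return report
```

A report of this shape would answer "show me everything you found in these pastes" in a single pass over the data.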
  • Please Add the pastebin title in the mail action to identify the kind of paste

    instead of Paste matched by pastepwn via analyzer "EmailPasswordPairAnalyzer"

    would it be possible to add the pastebin title Paste matched by pastepwn via analyzer "EmailPasswordPairAnalyzer + spotify"

    question 
    opened by zespirit 5
  • New Analyzer: Base64AsciiAnalyzer

    Implement a new analyzer that subclasses Base64Analyzer and only returns True if the found base64 decodes to valid ascii characters.

    We can overwrite the verify method and call super().verify(results) before doing our base64 ascii decoding.

    New Analyzer hacktoberfest Difficulty: Easy 
    opened by d-Rickyy-b 5
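
The verify override described in this issue might look roughly like the sketch below. Base64AnalyzerStandIn is a simplified stand-in for the real Base64Analyzer (only a min_len filter is modelled), and all names are assumptions for the example.

```python
import base64
import binascii


def _decodes_to_ascii(candidate):
    """Return True if candidate is valid base64 whose payload is pure ASCII."""
    try:
        decoded = base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    try:
        decoded.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


class Base64AnalyzerStandIn:
    """Stand-in base class; the real analyzer does considerably more."""

    def __init__(self, min_len=1):
        self.min_len = min_len

    def verify(self, results):
        return [r for r in results if len(r) >= self.min_len]


class Base64AsciiAnalyzerSketch(Base64AnalyzerStandIn):
    def verify(self, results):
        # Let the parent filter first, then keep only ASCII-decodable strings.
        candidates = super().verify(results)
        return [c for c in candidates if _decodes_to_ascii(c)]
```

Calling super().verify() first keeps the parent's filtering intact, so the subclass only ever narrows the result list.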
  • chore(deps): update websockets requirement from <10,>=9.1 to >=9.1,<11

    Updates the requirements on websockets to permit the latest version.

    Commits
    • 13eff12 Bump version number.
    • a04bfdb Add changelog.
    • be1203b Add API documentation for the Sans-I/O layer.
    • 724408e Add Sans-I/O howto guide.
    • 0cf8441 Remove unnecessary parameters from reject().
    • a8eb973 Avoid creating doc attributes.
    • 5fc6fa8 Clarify comment.
    • eba7b56 Improve docs of Frame and Close.
    • abd297b Expect a WebSocketURI in ClientConnection.
    • 4a22bdf Make websockets.uri a public API (again!)
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Order analyzer execution in MergedAnalyzer by performance score

    Currently the execution order of analyzers in MergedAnalyzers depends on how they were joined.

    A & B will run A first, then B.

    It would make sense to have some sort of performance score that defines the execution order. Lower scores mean faster execution.

    The only issue I can currently think of: MergedAnalyzers need a performance score themselves. What would that be?

    Could the performance score be calculated on the fly, for example from how long the analyzer took? Thoughts...

    enhancement Difficulty: Hard 
    opened by d-Rickyy-b 0
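
One way to explore the on-the-fly idea is sketched below with stand-in classes. Here the score is the average measured runtime per call, and a merged "and" analyzer uses the sum of its children's scores. Both choices are assumptions for discussion, not an existing pastepwn design.

```python
import time


class TimedAnalyzerSketch:
    """Wraps a match function and tracks its average runtime."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.total_time = 0.0
        self.runs = 0

    @property
    def performance_score(self):
        # Average seconds per call; 0.0 until the analyzer has run once.
        return self.total_time / self.runs if self.runs else 0.0

    def match(self, paste_body):
        start = time.perf_counter()
        try:
            return bool(self.func(paste_body))
        finally:
            self.total_time += time.perf_counter() - start
            self.runs += 1


class MergedAndSketch:
    """Logical AND over analyzers, cheapest first."""

    def __init__(self, *analyzers):
        self.analyzers = list(analyzers)

    @property
    def performance_score(self):
        # Worst case: every child has to run.
        return sum(a.performance_score for a in self.analyzers)

    def match(self, paste_body):
        ordered = sorted(self.analyzers, key=lambda a: a.performance_score)
        # all() stops at the first False, so slow analyzers are often skipped.
        return all(a.match(paste_body) for a in ordered)
```

Because the merged class exposes performance_score itself, merged analyzers can be nested inside other merged analyzers and sorted the same way.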
  • Implement multiple example files

    Currently there is only one example file. Users might not fully understand the capability of the tool and might get scared, because it involves setting up a database.

    At least one other example file without the need of a database and maybe one with the usage of sqlite could help them.

    enhancement help wanted good first issue hacktoberfest Difficulty: Easy 
    opened by d-Rickyy-b 6
  • 'utf-8' codec can't decode byte 0x?? in position ?: invalid continuation byte

    When pastes contain non-UTF-8 characters, decoding fails and the download of the paste is aborted.

    Errors logged at: https://github.com/d-Rickyy-b/pastepwn/blob/1d9b82efa53d948f790b663a54d609150e65b32e/pastepwn/scraping/pastebin/pastebinscraper.py#L96-L100

    and: https://github.com/d-Rickyy-b/pastepwn/blob/1d9b82efa53d948f790b663a54d609150e65b32e/pastepwn/scraping/pastebin/pastebinscraper.py#L123-L140

    Example pastes:

    2020-02-04 01:17:22,211 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
    2020-02-04 01:17:22,213 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - An exception occurred while downloading the paste 'nDPF9r5b'. Skipping this paste! Exception is: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
    2020-02-04 01:19:19,130 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
    2020-02-04 01:19:19,132 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - An exception occurred while downloading the paste 'aeC9BS25'. Skipping this paste! Exception is: 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
    2020-02-04 08:17:52,570 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
    2020-02-04 08:17:52,636 - pastepwn.scraping.pastebin.pastebinscraper - ERROR - An exception occurred while downloading the paste '0Cq4CYCH'. Skipping this paste! Exception is: 'utf-8' codec can't decode byte 0xe1 in position 2262: invalid continuation byte
    
    
    bug Difficulty: Medium 
    opened by d-Rickyy-b 1
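
A possible mitigation, sketched without reference to the actual scraper code, is to fall back to a lenient decode instead of skipping the paste. The decode_paste helper is hypothetical. As a side note, a 0x8b byte at position 1 is consistent with gzip data, which starts with the bytes 0x1f 0x8b, although that is only a guess about the first paste.

```python
def decode_paste(raw):
    """Decode paste bytes as UTF-8, replacing undecodable bytes if needed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for each invalid byte sequence,
        # so the rest of the paste can still be analyzed.
        return raw.decode("utf-8", errors="replace")
```

Whether replaced characters are acceptable depends on the analyzers in use; binary pastes would still produce mostly noise.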
  • Implement new Scraper for GitHub Events

    Similar to shhgit, there could be a new scraper which clones a repo and checks its files with the given analyzers.

    For now this is just a rough idea with close to no detailed thoughts on how to implement it. There is the GitHub Events API, which is also used by shhgit. Maybe the source code of shhgit can also be used to implement some of the code for pastepwn.

    Definition of done

    1. A new directory called 'github' was created in the scraping directory
    2. A new scraper (which extends basicscraper) is implemented in the github directory
    3. The new scraper works similarly to the pastebin scraper and fetches events from the GitHub Events API. Currently it seems that it needs to clone the repo before acting on it. You are free to make suggestions on how this should work.
    enhancement hacktoberfest Difficulty: Medium 
    opened by d-Rickyy-b 4
Releases(v2.0.0)
  • v2.0.0(Aug 15, 2021)

    [2.0.0] - 2021-07-13

    Fixed

    • Better protocol check in urlanalyzer (a377aee)
    • Use sys.exit() instead of exit() (2d6cb67)
    • Add missing parameter 'unique_matches' for match function (c9a2e99)

    Added

    • Implemented ExactWordAnalyzer to match words exactly rather than partially (08ebdbc)
    • Implemented BasicAnalyzer.unique(), which can be used to filter out duplicate matches from BasicAnalyzer.match()
    • Ability to enforce IP version for the connection to pastepwn (3483566)

    Changed

    • BREAKING: Dropped support for Python < 3.6
  • v1.3.1(Jun 20, 2020)

    You need to whitelist your IP when you want to use the scraping API. The previous release did not properly tell you, when your IP wasn't whitelisted and you were using IPv6. This is now fixed.

    1.3.1 - 2020-06-20

    Fixed

    • The PastebinScraper could not recognize error messages with IPv6 addresses.
  • v1.3.0(Mar 4, 2020)

    This release focusses mostly on fixing bugs, but it also contains a new analyzer implementation and a rewrite of the IrcAction, which did not work previously.

    1.3.0 - 2020-03-03

    Added

    • Implemented base64analyzer, which matches if a found base64 string decodes to valid ascii (b535781)
    • Implemented IrcAction - the previous implementation was not working (546b87f)

    Changed

    • SaveFileAction now got a parameter to set the file ending and a template (c3d75f7)

    Fixed

    • Analyzers now check if a passed action is a subclass of BasicAction, which prevents issues such as #175
    • The DiscordAction now also uses the templating engine - it was forgotten in a previous update (#176)
    • The SyslogAction now also uses the templating engine - it was forgotten in a previous update (54d3652)
    • The SaveFileAction does now store each paste in a different file as it should be (#179)
    • The IrcAction did not send the correct data. This was fixed and eventually the action was rewritten from scratch (see "Added")
  • v1.2.0(Feb 4, 2020)

    In this release I implemented a few new features, which require some wiki pages for better explanation. Here's a general list of new features:

    1.2.0 - 2020-02-05

    Added

    • Analyzers can now return a boolean or a list of matched results
    • Actions now get passed a list of matched results by the analyzer
    • New Analyzer: PasteTitleAnalyzer - Analyzer to match Paste titles via regex
    • New Analyzer: IPv4AddressAnalyzer - Match IPv4 addresses via regex
    • Subclasses of RegexAnalyzer now got a method def verify(results) that can be overwritten to filter matches so you only return valid results
    • EmailPasswordPairAnalyzer has now an optional parameter min_amount to specify how many pairs must be found to actually match
    • Base64Analyzer got an optional parameter min_len to specify how long a detected string must be at least to actually match
    • Logical operators for analyzers - you can now connect multiple analyzers with logical operators to specify more precisely when a paste should match (aed2dbf)

    Changed

    • Analyzers can now return a boolean or a list of matched results
    • Actions now get passed a list of matched results by the analyzer and can
    • IBANAnalyzer will now filter out wrong IBANs and return a list of validated IBANs if the validate parameter is set to True

    Fixed

    • Using non-capturing groups in regex for various analyzers. This is done so that the analyzer can return a matched string and at the same time it fixed some issues with analyzers not matching properly

    This changelog was created with the help of Keep a Changelog

  • v1.1.0(Nov 11, 2019)

    During hacktoberfest 2019 there has been a huge interest in this project and people implemented tests, actions, analyzers and other stuff. I think that this all is worth a new minor release.

    1.1.0 - 2019-11-11

    Added

    • Implement TemplatingEngine for filling template strings with content (8481036)
    • Add custom request headers in request.py (5043e0c)
    • Add flags to RegexAnalyzer to handle e.g. case insensitive matching (ddd0dca)
    • logger object now usable from within any analyzer (d21532e)
    • Implement logical analyzers (and/or) (94fc691)
    • Implement listify method to create lists from a given input (e935122)
    • Implement support for onstart handlers (25b5313)
    • Create docker-compose file (83014be)
    • New Action: TwitterAction for posting tweets when a paste matched (2056c3c)
    • New Action: DiscordAction (eafdc1c)
    • New Action: MISPAction (8dabe5d)
    • New Action: EmailAction (9cfba96)
    • New Action: IrcAction (fc1d1ab)
    • New Analyzer: PrivateKeyAnalyzer (a8746f1)
    • New Analyzer: DatabaseDumpAnalyzer (0aa63ad)
    • New Analyzer: DBConnAnalyzer (e940630)
    • New Analyzer: PhoneNumberAnalyzer (9ff58b9)
    • New Analyzer: OriginKeyAnalyzer (d0d715d)
    • New Analyzer: SteamKeyAnalyzer (27273a6)
    • New Analyzer: UplayKeyAnalyzer (38097ac)
    • New Analyzer: EpicKeyAnalyzer (da122da)
    • New Analyzer: BattleNetKeyAnalyzer (8927204)
    • New Analyzer: MicrosoftKeyAnalyzer (8927204)
    • New Analyzer: AWSAccessKeyAnalyzer (ebc6eab)
    • New Analyzer: AWSSecretKeyAnalyzer (d07021a)
    • New Analyzer: SlackWebhookAnalyzer (c40c364)
    • New Analyzer: GoogleOAuthKeyAnalyzer (fbfb8bf)
    • New Analyzer: FacebookAccessTokenAnalyzer (bb51e3e)
    • New Analyzer: Base64Analyzer (8d50fbe)
    • New Analyzer: AdobeKeyAnalyzer (4e52345)
    • New Analyzer: EmailPasswordPairAnalyzer (f0af9cb)
    • New Analyzer: HashAnalyzer (87080c2)
    • New Analyzer: SlackTokenAnalyzer (d686169)
    • New Analyzer: MailChimpApiKeyAnalyzer (2e5302d)
    • New Analyzer: MegaLinkAnalyzer (c884cb6)
    • New Analyzer: StripeApiKeyAnalyzer (f9bd202)
    • New Analyzer: AzureSubscriptionKeyAnalyzer (b010cb5)
    • New Analyzer: GoogleApiKeyAnalyzer (635a5e4)

    Changed

    • Add pastebinscraper by default (d00fc83)
    • Remove unused custom_payload from DiscordAction (7b13d75)

    Fixed

    • SHA hash analyzer can now accept multiple length hashes (494d1af)
    • Use empty string if paste.body is set to None in URL- and IBANAnalyzer (09f6763)
    • Include some changes when creating a sqlite file (0eb3504)
  • v1.0.16(Sep 7, 2019)

    Before this update it was not possible to re-download pastes which were not ready for scraping yet. Also deleted posts were stored in the database and analyzed. This is fixed now!

    1.0.16 - 2019-09-08

    Added

    • Perform checks on pastebin responses to catch errors (01f865e)
    • If pastes are not ready for downloading, requeue them (01f865e)
  • v1.0.15(Sep 4, 2019)

    I finally came to checking my dev branch. I had some stuff up the pipeline and finally merged it. Have fun!

    1.0.15 - 2019-09-04

    Added

    • Ability to search for multiple words in single WordAnalyzer (d2a7e09)
    • Ability to restart running scrapers after adding a new one (de99892)
    • Ability to register error handlers (1fae47e)

    Fixed

    • Check if paste is None before analyzing it (2fd7b39, f4bfa46)
    • Broken behaviour for WordAnalyzer blacklist (df2dd5b)
    • Reduced sleep time in order to shut down pastepwn faster (55bb18d)
    • Add check in GenericAnalyzer if parameter is callable (781d6d0)
    • WordAnalyzer not matching in case sensitive mode (8762ddd)
  • v1.0.14(Sep 3, 2019)

    In this release I fixed some database (sqlite) related issues.

    1.0.14 - 2019-09-04

    Added

    • Parameter for setting up storing of all downloaded pastes (e04f476)

    Fixed

    • Broken path to requirements.txt in setup.py (cc7edf4)
    • Missing column 'syntax' in sqlite table layout (3fb3821)
    • Broken variable substitution for sqlite statement (cf49963)
    • Allow writing to sqlite db from multiple threads (f47ec62)
  • v1.0.13(Sep 2, 2019)

    In the previous release I had a few broken/unfinished files in the code and a requirement for sanic (which I wanted to build the API with). I removed the requirement for it. Now the python version required should not be >3.5 anymore.

    1.0.13 - 2019-09-02

    Added

    • Pastepwn got a logo! (57e6665)
    • Use travis tag when building pypi package (bda3c7e)

    Fixed

    • Broken paths in setup.py (42eca9b)
  • v1.0.12(Feb 20, 2019)

    This release allows multiple actions to be executed after only one analyzer matched.

    1.0.12 - 2019-02-20

    Added

    • New add_action(self, action) method in BasicAnalyzer to add actions on the fly (4b5df12)
    • Created a Dockerfile (b5334ff)
    • Implement possibility to execute multiple actions when a paste matches (ae6055e)
    • Method to create database on a mysql server (dbfecce)
    • Stop method for pastedispatcher
    • Stop method in actionhandler ()

    Changed

    • Minimum supported Python version is now 3.5, because that's what we run travis on (7b8bae2)

    Fixed

    • Use better sqlite create table statement (9378dad)
    • MySQL Port setting did not work (d498088)
    • Wrong MySQL syntax in create statements (6ae6508)
  • v1.0.11(Jan 9, 2019)

  • v1.0.10(Jan 6, 2019)

  • v1.0.9-travis(Oct 25, 2018)

  • v1.0.8-travis(Oct 25, 2018)

  • v1.0.8(Oct 22, 2018)

Owner
Rico
Hi there :) I'm Rico (or Rickyy) from Germany. You can text me on Telegram if you have any questions. My heart belongs to open source software.