A universal package of scraper scripts for humans

Scrapera

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides access to a variety of scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints. This removes the heavy overhead of driving a browser, which makes Scrapera fast and robust to changes in a page's DOM. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous
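The paragraph above credits Scrapera's speed to requesting public API endpoints asynchronously instead of rendering pages in a browser. The sketch below illustrates that general idea with the standard library only; it is not Scrapera's internal code. The helper names fetch_json and scrape_all are hypothetical, the endpoints are assumed to return JSON, and blocking urllib calls are pushed onto worker threads with asyncio.to_thread (Python 3.9+) rather than using an async HTTP client. A semaphore caps how many requests run at once.

```python
import asyncio
import json
import urllib.request
from typing import Any, Callable, Iterable


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Blocking GET of a JSON endpoint (hypothetical helper, stdlib only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


async def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], Any] = fetch_json,
    max_concurrency: int = 8,
) -> list:
    """Fetch every URL concurrently and return the results in input order."""
    # Limit the number of requests in flight at any moment.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url: str) -> Any:
        async with semaphore:
            # Run the blocking request in a worker thread so the event loop stays free.
            return await asyncio.to_thread(fetch, url)

    # gather preserves the order of the input URLs in its result list.
    return await asyncio.gather(*(one(u) for u in urls))
```

For example, asyncio.run(scrape_all(list_of_urls)) returns the decoded JSON bodies in the same order as the input URLs. Parsing a JSON response rather than HTML is why this style of scraper does not break when a site's page layout changes, although it will still break if the API itself changes.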

The main aim of this package is to bring common scraping tasks together in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.

DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

Getting Started

Prerequisites

The prerequisites can be installed separately from the requirements.txt file:

pip install -r requirements.txt

Installation

Scrapera is built with Python 3 and can be installed directly with pip:

pip install scrapera

Alternatively, to install the latest version directly from GitHub, run:

pip install git+https://github.com/DarshanDeshpande/Scrapera.git
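After either install command, one quick way to confirm which version of Scrapera Python can see is to query the installed distribution metadata. This is a small sketch using importlib.metadata (Python 3.8+); the helper name installed_version is hypothetical and not part of Scrapera.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "scrapera") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"scrapera {found}" if found else "scrapera is not installed")
```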

Usage

To use any sub-module, import its scraper class, instantiate it, and run it. For example:

from scrapera.video.vimeo import VimeoScraper
scraper = VimeoScraper()
scraper.scrape('https://vimeo.com/191955190', '540p')

For more examples, refer to the test folders in the respective modules.
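When you have several videos to fetch, the same import, instantiate and execute pattern can be wrapped in a loop. The sketch below assumes only the scrape(url, quality) call shown in the example above; scrape_many is a hypothetical helper, not part of Scrapera, and it records failures instead of stopping at the first one.

```python
from typing import Callable, Dict, Iterable, Optional


def scrape_many(
    scraper,
    urls: Iterable[str],
    quality: str = "540p",
    on_error: Optional[Callable[[str, Exception], None]] = None,
) -> Dict[str, Optional[Exception]]:
    """Call scraper.scrape(url, quality) for each URL, continuing past failures.

    Returns a mapping of URL -> None on success, or the exception that was raised.
    """
    outcomes: Dict[str, Optional[Exception]] = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
            outcomes[url] = None
        except Exception as exc:  # keep going so one bad URL does not stop the batch
            outcomes[url] = exc
            if on_error is not None:
                on_error(url, exc)
    return outcomes


# Usage (requires Scrapera to be installed):
#   from scrapera.video.vimeo import VimeoScraper
#   scrape_many(VimeoScraper(), ["https://vimeo.com/191955190"], "540p")
```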

Contributing

Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
For more guidelines, refer to CONTRIBUTING.
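For contributors adding a scraper, here is a rough skeleton in the same shape as VimeoScraper (a class you instantiate and then call scrape on) that reads a JSON endpoint instead of driving a browser. Everything in it is hypothetical: the class name, the assumed response layout (an "items" list of objects carrying a "title" field), and the optional out_path argument. The opener argument lets tests supply canned bytes instead of making a network request. Check CONTRIBUTING and the existing modules for the conventions the project actually uses.

```python
import json
import urllib.request
from typing import Any, Callable, Optional


class ExampleJSONScraper:
    """Hypothetical scraper skeleton in the import/instantiate/scrape style.

    It reads a public JSON endpoint instead of rendering the page in a browser.
    The response layout ("items" with "title" fields) is a placeholder assumption.
    """

    def __init__(self, opener: Optional[Callable[[str], bytes]] = None, timeout: float = 10.0):
        self.timeout = timeout
        # Injectable byte-fetcher makes the scraper testable without network access.
        self._open = opener or self._default_open

    def _default_open(self, url: str) -> bytes:
        req = urllib.request.Request(url, headers={"User-Agent": "scraper-sketch/0.1"})
        with urllib.request.urlopen(req, timeout=self.timeout) as resp:
            return resp.read()

    def scrape(self, url: str, out_path: Optional[str] = None) -> list:
        """Return the titles found at url; optionally save them to out_path as JSON."""
        payload: Any = json.loads(self._open(url).decode("utf-8"))
        # Skip entries that lack a "title" field instead of failing on them.
        titles = [item["title"] for item in payload.get("items", []) if "title" in item]
        if out_path is not None:
            with open(out_path, "w", encoding="utf-8") as fh:
                json.dump(titles, fh, ensure_ascii=False, indent=2)
        return titles
```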

License

Distributed under the MIT License. See LICENSE for more information.

Sponsors

Contact

Feel free to reach out with any issues or requests related to Scrapera.

Darshan Deshpande (Owner) - Email | LinkedIn

Acknowledgements
