RedditImageScraper

Console application for downloading images from Reddit in Python

Introduction

This short Python script was created for the mass downloading of images from Reddit. It will later be used to create datasets for several machine learning projects.

To use the script, you will need a Reddit account and must register a developer application. You will be assigned a client_id and client_secret, which you must enter in config.ini before running the script.
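
The exact contents of config.ini are not documented here, but as a rough sketch (the section and key names below are assumptions, not the project's confirmed format), the script could read the credentials with Python's standard configparser module:

# Hypothetical sketch of reading the Reddit API credentials from config.ini.
# Assumed file layout (section and key names may differ in this project):
#
#   [reddit]
#   client_id = YOUR_CLIENT_ID
#   client_secret = YOUR_CLIENT_SECRET
import configparser

config = configparser.ConfigParser()
config.read("config.ini")
client_id = config["reddit"]["client_id"]
client_secret = config["reddit"]["client_secret"]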

Usage

The -r parameter provides the list of subreddits to search.
The -st parameter specifies the maximum number of images to download from each subreddit. Defaults to 1000.
The -t parameter specifies the total number of images to download across all subreddits before the script stops. Defaults to 10000.
The -f parameter specifies the folder into which the images are downloaded. Defaults to download.
A rough sketch of how these flags might be parsed is shown after this list.
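
As an illustration only (not the project's actual source), the flags above could be wired up with Python's argparse module like this:

# Illustrative argument parsing for the flags described above.
import argparse

parser = argparse.ArgumentParser(description="Download images from Reddit")
parser.add_argument("-r", nargs="+", required=True,
                    help="list of subreddits to search")
parser.add_argument("-st", type=int, default=1000,
                    help="maximum images to download per subreddit")
parser.add_argument("-t", type=int, default=10000,
                    help="total images to download before stopping")
parser.add_argument("-f", default="download",
                    help="folder to download images into")
args = parser.parse_args()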

For example, to download at most 20 pictures from the dogpictures, dogswithjobs, GuiltyDogs, and dog subreddits, aborting once we have 50 files in total, and saving the files to a folder titled dogs, we would use the following:

python RedditImageScraper.py -r dogpictures dogswithjobs GuiltyDogs dog -st 20 -t 50 -f dogs

This should produce something like the following:

Downloading from dogpictures
3tdfpayjp1k51.jpg
cuz4a5np90k51.jpg
1e1g882z40k51.jpg
xX3OQgP.jpg
fuatv49mizj51.jpg
tk9khtspb1k51.jpg
pgbxxakm63k51.jpg
oeUI8Iy.jpg
vqklutghc1k51.jpg
1ctn7f0390k51.jpg
r7995a1dd3k51.jpg
qbjj06vaa0k51.jpg
q6nl05omyzj51.jpg
bl8bi5tsu2k51.jpg
gxc78spvxxj51.jpg
w0pdsr1hsyj51.jpg
9h19nq1k5vj51.jpg
y67tpittfyj51.jpg
Downloaded 18 from dogpictures

Downloading from dogswithjobs
5xxpn6xs7xj51.png
3mrgwnlum1k51.jpg
rs2uecgnb1k51.jpg
y077mg1974k51.jpg
kci6u8pc02k51.jpg
iho9wex0qrj51.jpg
109eyp6kjyj51.jpg
i86x3o6dutj51.jpg
Downloaded 8 from dogswithjobs

Downloading from GuiltyDogs
8z89s7a89dj51.jpg
c9rf2r516li51.jpg
pbdqr853rsh51.jpg
e9xihfbqdeh51.jpg
53gamygu9ch51.jpg
d3tq02dbbyg51.jpg
ifsmwutou2h51.jpg
Downloaded 7 from GuiltyDogs

Downloading from dog
1kloilrhc1k51.jpg
bwe1go65h1k51.jpg
8118vyqeg1k51.jpg
bajprhddg0k51.jpg
rlc7n4m6q0k51.jpg
z9p8llkuyyj51.jpg
dhdi10myx2k51.jpg
6zflnt9hozj51.jpg
niptrbxzf2k51.jpg
jxi3vrd901k51.jpg
u8eykob35yj51.jpg
5hwj8cce6zj51.png
9nr2t4f0vzj51.jpg
ozs8tuu7mzj51.jpg
0h1fwfqhh3k51.jpg
Downloaded 15 from dog

Downloaded 48 in total
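
Note that fewer than 20 images came back from each subreddit; presumably posts that are not direct image links are skipped. For readers curious how such a download loop might look, here is a minimal, hypothetical sketch using the praw and requests libraries (the function and variable names are illustrative, and the real RedditImageScraper.py may be organised quite differently):

# Hypothetical sketch of a per-subreddit download loop using praw and requests.
import os
import praw
import requests

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

def download_subreddit(reddit, name, folder, limit):
    """Download direct image links from one subreddit and return the number saved."""
    os.makedirs(folder, exist_ok=True)
    count = 0
    print(f"Downloading from {name}")
    for submission in reddit.subreddit(name).hot(limit=limit):
        url = submission.url
        if not url.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip self posts, galleries, videos, etc.
        filename = url.rsplit("/", 1)[-1]
        response = requests.get(url, timeout=10)
        if response.status_code != 200:
            continue
        with open(os.path.join(folder, filename), "wb") as handle:
            handle.write(response.content)
        print(filename)
        count += 1
    print(f"Downloaded {count} from {name}\n")
    return count

reddit = praw.Reddit(client_id="YOUR_CLIENT_ID",
                     client_secret="YOUR_CLIENT_SECRET",
                     user_agent="RedditImageScraper")
total = sum(download_subreddit(reddit, sub, "dogs", 20)
            for sub in ["dogpictures", "dogswithjobs", "GuiltyDogs", "dog"])
print(f"Downloaded {total} in total")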