A Python package that can be used to download post and comment data from Reddit.

Overview

Reddit Data Collector

Reddit Data Collector is a Python package that allows a user to collect post and comment data from Reddit. It is built on top of the Python module PRAW, which stands for "The Python Reddit API Wrapper". It aims to make it very simple for a user to collect data from Reddit for further analysis (e.g. Natural Language Processing), without having to learn the inner workings of PRAW or the Reddit API.

The main functionalities provided by the package currently include:

  1. Ability to collect a sample of post data and comment data from Reddit by simply providing the subreddit names that you wish to collect data from.

  2. Ability to convert that data into a pandas DataFrame in order to inspect it and save it for further use.

  3. Ability to seamlessly update an existing .csv file that contains some sample data collected with the package in the past, with some new sample data that is also collected with the package.

It is currently maintained by Nico Van den Hooff.

Installation

Dependencies

Reddit Data Collector requires Python and:

  • pandas (>=1.3.5)
  • praw (>=7.5.0)
  • tqdm (>=4.62.3)

User installation

The recommended way to install Reddit Data Collector is using pip:

pip install reddit-data-collector

How to Use Reddit Data Collector

Please see the examples directory for step by step instructions on how to use Reddit Data Collector.

Development

Important links

Source code

You can check the latest sources with the command:

git clone https://github.com/nicovandenhooff/reddit-data-collector.git

Contributing

To learn more about making a contribution to Reddit Data Collector, please see the contributing file.

Potential Ideas for Contribution

  • Add ability to collect images from Reddit posts that contain them.
  • Add author information to post and comment data, currently the Reddit API is inconsistent with suspended and deleted author data, so this functionality has not been built in yet.

Testing

After installation, you can launch the test suite, which is contained in the tests/tests.py. Note that you will have to have pytest >= 6.2.5 installed. You can launch the test suite by following these steps from the projects root directory:

  1. Open up tests.py with the following command:
open tests/tests.py

Comment out lines 24 to 30. Change the values in DataCollector() in line 32 to your Reddit credentials.

  1. Run the following command:
pytest tests/test.py

Project History

The project was started in January 2022 by Nico Van den Hooff as a side project while he was completing the UBC Master of Data Science Project. Nico wanted to obtain a sample of posts and comments from Reddit, but noticed that while PRAW existed and provided seamless access to Reddit's API, there was no package available that allowed for a simple method to collect this data.

Inspiration

Certain sections of this README file was inspired by the scikit-learn README.

You might also like...
Auto Join: A GitHub action script to automatically invite everyone to the organization who comment at the issue page.

Auto Invite To Org By Issue Comment A GitHub action script to automatically invite everyone to the organization who comment at the issue page. What is

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.
Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker.

Auto Liker, Auto Reaction, Auto Comment, Auto Follower Tool. RajeLiker Credit Hacker. Unlimited RajeLiker Credit Hack. Thanks To RajeLiker.

A simple Discord bot that can fetch definitions and post them in chat.
A simple Discord bot that can fetch definitions and post them in chat.

A simple Discord bot that can fetch definitions and post them in chat. If you are connected to a voice channel, the bot will also read out the definition to you.

A simple fun discord bot using discord.py that can post memes

A simple fun discord bot using discord.py * * Commands $commands - to see all commands $meme - for a random meme from the internet $cry - to make the

One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind them.

AwesomeVersion One version package to rule them all, One version package to find them, One version package to bring them all, and in the darkness bind

A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel
A simple script & container to pull COVID data from covidlive.com.au and post a summary to a slack channel

CovidLive AU Summary Slackbot This bot is a very simple slackbot that pulls data, summarises and posts up to date AU COVID stats to a provided slack c

Track live sentiment for stocks from Reddit and Twitter and identify growing stocks
Track live sentiment for stocks from Reddit and Twitter and identify growing stocks

Market Sentiment About This repository can mainly be used for two things. a. Tracking the live sentiment of stocks from Reddit and Twitter b. Tracking

A reddit.com bot that will return reference links from official python documentation site for the standard library.

Python Docs Bot A reddit.com bot that will return documentation links for the library and language reference sections of the python docs website. The

A Python bot that uses the Reddit API to send users inspiring messages.

AnonBot By Edric Antoine A Python bot that uses the Reddit API to send users inspiring messages. When a message includes 'What would Anon do?', the bo

Releases(v1.1.0)
  • v1.1.0(Mar 14, 2022)

    [1.1.0] - 2022-03-13

    Changed

    • Changed get_data in reddit_data_collector.py to return pandas DataFrame by default
    • Updated tests for the above
    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Jan 15, 2022)

    [1.0.2] - 2022-01-14

    Fixed

    • Updated _check_subreddit_exists in reddit_data_collector.py to check both names as .lower()
    • Updated tests for the above

    Changed

    • Updated README to include instructions on coverage tests
    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Jan 12, 2022)

    [1.0.1] - 2022-01-12

    Fixed

    • Spelling error of separate argument in to_pandas function of reddit_data_collector.io.py, previously it was spelt like seperate

    Changed

    • Update example use and move to /examples
    • Update PyPi link in docs to working link
    • Add new potential ideas for contribution
    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Jan 7, 2022)

Owner
Nico Van den Hooff
UBC Master of Data Science
Nico Van den Hooff
Info gathering | API hacketarget.com

InfoFetch Info gathering | API hackertarget.com set-up: apt-get install python3 pip3 install requests apt-get install git git clone https://github.com

Muhammed Rizad 4 Nov 22, 2021
Most Powerful Chatbot On Telegram Bot

About Hello, I am Lycia [リュキア], An Intelligent ChatBot. If You Are Feeling Lonely, You can Always Come to me and Chat With Me! How To Host The easiest

RedAura 8 May 26, 2021
Bulk convert image types with Python

Bulk Image Converter 🔥 Helper script to convert a folder's worth of images from one filetype to another, and optionally delete originals Use Setup /

1 Nov 13, 2021
A program that generates discord.py code

discord-py-generator A program that generates discord.py code Setup in cmds.txt file add your user id, client id and bot token you can change the bot

3 Dec 15, 2022
Termux Pkg

PKG Install Termux All Basic Pkg. Installation : pkg update && pkg upgrade && pkg install python && pkg install python2 && pkg install git && git clon

ɴᴏʙɪᴛᴀシ︎ 1 Oct 28, 2021
Eva Maria Bot With Python

Eva Maria Bot Features Auto Filter Manual Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stats, Us

Aadhi 3 Jan 06, 2022
Crypto trading bot that detects surges in the bitcoin price and executes trades.

The bot will be trading Bitcoin automatically if the price has increased by more than 3% in the last 10 minutes. We will have a stop loss of 5% and t

164 Oct 20, 2022
Quickly edit your slack posts.

Lightning Edit Quickly edit your Slack posts. Heavily inspired by @KhushrajRathod's LightningDelete. Usage: Note: Before anything, be sure to head ove

Cole Wilson 14 Nov 19, 2021
Async boto3 with Autogenerated Data Classes

awspydk Async boto3 with Autogenerated JIT Data Classes Motivation This library is forked from an internal project that works with a lot of backend AW

1 Dec 05, 2021
Is the CoWin website updated for registration?

CoWin-Update Is the CoWin website updated for registration? This is a very hacky PYTHON3 script to lookup the CoWin portal if they re-deployed their J

Yash Jakhotiya 5 May 10, 2021
PyFacebook

== PyFacebook == PyFacebook is a Python client library for the Facebook API. Samuel Cormier-Iijima ( Samuel Cormier-Iijima 573 Dec 20, 2022

A complete Python application to automatize the process of uploading files to Amazon S3

Upload files or folders (even with subfolders) to Amazon S3 in a totally automatized way taking advantage of: Amazon S3 Multipart Upload: The uploaded

Pol Alzina 1 Nov 20, 2021
Бот для скачивания треков с Deezer используя ISRC и UPC коды

deez_robot Запуск Установите необходимые библиотеки pip install -r requirements.txt Создайте файл config.py и поместите туда токен бота и ARL-токен De

Max 4 Jul 31, 2022
The official Pushy SDK for Python apps.

pushy-python The official Pushy SDK for Python apps. Pushy is the most reliable push notification gateway, perfect for real-time, mission-critical app

Pushy 1 Dec 21, 2021
This Bot Can Upload Video from Link Of Pdisk to Pdisk using its API. @PredatorHackerzZ

𝐏𝐝𝐢𝐬𝐤 𝐂𝐨𝐧𝐯𝐞𝐫𝐭𝐞𝐫 𝐁𝐨𝐭 Make short link by using 𝐏𝐝𝐢𝐬𝐤 API key Installation 𝐓𝐡𝐞 𝐄𝐚𝐬𝐲 𝐖𝐚𝐲 𝐑𝐞𝐪𝐮𝐢𝐫𝐞𝐝 𝐕𝐚𝐫𝐢𝐚𝐛𝐥𝐞

ρяє∂αтσя 25 Dec 02, 2022
𝐀 𝐦𝐨𝐝𝐮𝐥𝐚𝐫 𝐓𝐞𝐥𝐞𝐠𝐫𝐚𝐦 𝐆𝐫𝐨𝐮𝐩 𝐦𝐚𝐧𝐚𝐠𝐞𝐦𝐞𝐧𝐭 𝐛𝐨𝐭 𝐰𝐢𝐭𝐡 𝐮𝐥𝐭𝐢𝐦𝐚𝐭𝐞 𝐟𝐞𝐚𝐭𝐮𝐫𝐞𝐬 !!

𝐇𝐨𝐰 𝐓𝐨 𝐃𝐞𝐩𝐥𝐨𝐲 For easiest way to deploy this Bot click on the below button 𝐌𝐚𝐝𝐞 𝐁𝐲 𝐒𝐮𝐩𝐩𝐨𝐫𝐭 𝐆𝐫𝐨𝐮𝐩 𝐒𝐨𝐮𝐫𝐜𝐞𝐬 𝐆𝐞𝐧𝐞?

Mukesh Solanki 1 Dec 10, 2021
Twitter-bot - A Simple Twitter bot with python

twitterbot To use this bot, You will require API Key and Access Key. Signup at h

Bentil Shadrack 8 Nov 18, 2022
AminoAutoRegFxck/AutoReg For AminoApps.com

AminoAutoRegFxck AminoAutoRegFxck/AutoReg For AminoApps.com Termux apt update -y apt upgrade -y pkg install python git clone https://github.com/deluvs

3 Jan 18, 2022
Music bot for Discord

Treble Music bot for Discord Youtube is after music bots on Discord. So we are here to fill the void. Introducing Treble, the next generation of Disco

Aja Khanal 0 Sep 16, 2022
:cloud: Python API for ThePirateBay.

Unofficial Python API for ThePirateBay. Build Status Test Coverage Version Downloads (30 days) Installation $ pip install ThePirateBay Note that ThePi

Karan Goel 334 Oct 21, 2022