Web-scraping - A bot using Python with BeautifulSoup that scraps IRS website by form number and returns the results as json

Overview

Extract Data from the IRS website A bot using Python with BeautifulSoup that scraps IRS website (prior form publication) by form number and returns the results as json. It provides the option to download pdfs over a range of years.

How to run the script? This script runs on Python 3.8. Install the libraries on requirements.txt into a new environment, then run 'Script.py'.

What should I expect? The script will ask you for the form number(s) then scrap the IRS website. --> Please enter the complete tax form number separated by a comma followed by a space (not case sensitive): (ie. Form W-2, Form 1095-C, Form W-3, etc) --> Form W-2, Form 1095-C

Then the bot will ask if the user would like to download the forms. --> Would you like to download all related pdfs? (Y/N)

If selected, the bot will follow up by asking a year range. --> Please provide the year range by using a dash in between the years (starting year must be smaller than ending year): (ie. 2018-2020)

Once executed, the bot will automatically create a folder and download the relevant pdfs into the folder.

Finally, the results will be returned as a json string. If there are no results, the user will get a 'No results' instead.

Sample output: [ {'form_number': 'Form W-2', 'form_title': 'Wage and Tax Statement (Info Copy Only)', 'min_year': '1954', 'max_year': '2022'}, {'form_number': 'Form 1095-C', 'form_title': 'Employer-Provided Health Insurance Offer and Coverage', 'min_year': '2014', 'max_year': '2022'}, {'form_number': 'Form W-3', 'form_title': 'Transmittal of Wage and Tax Statements (Info Copy Only)', 'min_year': '1990', 'max_year': '2022'} ]

Note: To keep users engaged, the bot will display which task it is performing and what URL it is currently searching.

Owner
In progress: CPA in training to become SWE
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.

Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.

Joseph Lai 543 Jan 03, 2023
This is a script that scrapes the longitude and latitude on food.grab.com

grab This is a script that scrapes the longitude and latitude for any restaurant in Manila on food.grab.com, location can be adjusted. Search Result p

0 Nov 22, 2021
A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

Alex Papadopoulos 1 Nov 13, 2021
Command line program to download documents from web portals.

command line document download made easy Highlights list available documents in json format or download them filter documents using string matching re

16 Dec 26, 2022
CreamySoup - a helper script for automated SourceMod plugin updates management.

CreamySoup/"Creamy SourceMod Updater" (or just soup for short), a helper script for automated SourceMod plugin updates management.

3 Jan 03, 2022
京东茅台抢购

截止 2021/2/1 日,该项目已无法使用! 京东:约满即止,仅限京东实名认证用户APP端抢购,2月1日10:00开始预约,2月1日12:00开始抢购(京东APP需升级至8.5.6版本及以上) 写在前面 本项目来自 huanghyw - jd_seckill,作者的项目地址我找不到了,找到了再贴上

abee 73 Dec 03, 2022
12306抢票脚本

12306抢票脚本

罐子里的茶 457 Jan 05, 2023
A simple code to fetch comments below an Instagram post and save them to a csv file

fetch_comments A simple code to fetch comments below an Instagram post and save them to a csv file usage First you have to enter your username and pas

2 Jul 14, 2022
A leetcode scraper to compile all questions in leetcode free tier to text file. pdf also available.

A leetcode scraper to compile all questions in leetcode free tier to text file, pdf also available. if new questions get added, run again to get new questions.

3 Dec 07, 2021
UsernameScraperTool - Username Scraper Tool With Python

UsernameScraperTool Username Scraper for 40+ Social sites. How To use git clone

E4crypt3d 1 Dec 20, 2022
Transistor, a Python web scraping framework for intelligent use cases.

Web data collection and storage for intelligent use cases. transistor About The web is full of data. Transistor is a web scraping framework for collec

BOM Quote Manufacturing 212 Nov 05, 2022
CRI Scrape is a tool for get general info about Italian Red Cross in GAIA Platform

CRI Scrape CRI Scrape is a tool for get general info about Italian Red Cross in GAIA Platform Disclaimer This code is only for educational purpose. So

Vincenzo Cardone 0 Jul 23, 2022
A simple python script to fetch the latest covid info

covid-tracker-script A simple python script to fetch the latest covid info How it works First, get the current date in MM-DD-YYYY format. Check if the

Dot 0 Dec 15, 2021
Create crawler get some new products with maximum discount in banimode website

crawler-banimode create crawler and get some new products with maximum discount in banimode website. این پروژه کوچک جهت یادگیری و کار با ابزار سلنیوم

nourollah rezaei 2 Feb 17, 2022
An application that on a given url, crowls a web page and gets all words, sorts and counts them.

Web-Scrapping-1 An application that on a given url, crowls a web page and gets all words, sorts and counts them. Installation Using the package manage

adriano atambo 1 Jan 16, 2022
Grab the changelog from releases on Github

release-notes-scraper This simple script can be used to grab the release notes for projects from github that do not keep a CHANGELOG, but publish thei

Dan Čermák 4 Apr 01, 2022
a small library for extracting rich content from urls

A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o

Charles Leifer 588 Dec 27, 2022
京东抢茅台,秒杀成功很多次讨论,天猫抢购,赚钱交流等。

Jd_Seckill 特别声明: 请添加个人微信:19972009719 进群交流讨论 目前群里很多人抢到【扫描微信添加群就好,满200关闭群,有喜欢薅信用卡羊毛的也可以找我交流】 本仓库发布的jd_seckill项目中涉及的任何脚本,仅用于测试和学习研究,禁止用于商业用途,不能保证其合法性,准确性

50 Jan 05, 2023
simple http & https proxy scraper and checker

simple http & https proxy scraper and checker

Neospace 11 Nov 15, 2021
script to scrape direct download links (ddls) from google drive index.

bhadoo Google Personal/Shared Drive Index scraper. A small script to scrape direct download links (ddls) of downloadable files from bhadoo google driv

sαɴᴊɪᴛ sɪɴʜα 53 Dec 16, 2022