Newsscraper - A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

Last update: Jan 02, 2022

Overview

NewsScraper

A simple Python 3 module to get crypto or news articles and their content from various RSS feeds.

🔧 Installation

Clone the repo locally.
Use the package manager pip to install the requirements.

pip install -r requirements.txt

✨ Basic Usage

import NewsScraper

all_data = NewsScraper.fetch_all()
news_data = NewsScraper.fetch_news_data()
crypto_data = NewsScraper.fetch_crypto_data()

fetch_all()

Returns a set of NewsScraper.Result objects containing the articles fetched from all available RSS feeds.

Can include categories: GLOBAL, US, EU, CRYPTO, BLOCKCHAIN, BTC, ETH, LTC.

fetch_news_data()

Returns a set of NewsScraper.Result objects containing the articles fetched from the CNN, ABC News, Yahoo News, and Fox News RSS feeds.

Can include categories: GLOBAL, US, EU.

fetch_crypto_data()

Returns a set of NewsScraper.Result objects containing the articles fetched from the CoinJournal and Crypto Currency News RSS feeds.

Can include categories: CRYPTO, BLOCKCHAIN, BTC, ETH, LTC.
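Because each fetch function returns a plain set, narrowing the results to a few categories is a matter of filtering on the context attribute. The following is a minimal sketch, not part of the module: the Result dataclass is a stand-in that mirrors only the documented attributes, and filter_by_context is a hypothetical helper name. In real use the set would come from NewsScraper.fetch_all().

```python
from dataclasses import dataclass


# Stand-in for NewsScraper.Result (assumption: it mirrors only the
# documented attributes; the real class may differ).
@dataclass(frozen=True)
class Result:
    context: str
    title: str
    summary: str
    content: str


def filter_by_context(results, wanted):
    """Return the results whose context is in `wanted` (case-insensitive)."""
    wanted_upper = {w.upper() for w in wanted}
    return {r for r in results if r.context.upper() in wanted_upper}


# Sample data; in real use this would be NewsScraper.fetch_all().
results = {
    Result("BTC", "Bitcoin headline", "", "body"),
    Result("US", "US headline", "summary", "body"),
    Result("ETH", "Ethereum headline", "summary", "body"),
}

crypto_only = filter_by_context(results, {"BTC", "ETH"})
for article in sorted(crypto_only, key=lambda r: r.title):
    print(article.context, "-", article.title)
```

The comparison is case-insensitive because the documented category names are uppercase while the Result.json() example shows a lowercase value.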

🔨 Advanced Usage

NewsScraper.Result class

A class used to represent a returned article.

Attributes

context : str

A string describing the category of the article.

ex. "GLOBAL", "US", "BLOCKCHAIN", "BTC".

title : str

A string containing the title of the article.

summary : str

A string containing the summary of the article.

NOTE: this can be an empty string ("") when the RSS feed does not provide a summary.

content : str

A string containing the content of the article.
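Since summary may be an empty string, code that displays articles may want a fallback. The sketch below is one possible approach, not something the module provides: summary_or_excerpt is a hypothetical helper, and SimpleNamespace stands in for a NewsScraper.Result so the example runs without fetching anything.

```python
from types import SimpleNamespace


def summary_or_excerpt(result, max_chars=80):
    """Return the summary, or the start of the content if the summary is empty."""
    if result.summary:
        return result.summary
    excerpt = result.content[:max_chars]
    # Mark the excerpt as truncated only when content was actually cut.
    if len(result.content) > max_chars:
        return excerpt + "..."
    return excerpt


# Stand-in object with the documented attributes (assumption).
article = SimpleNamespace(
    context="US",
    title="Example article",
    summary="",
    content="The feed gave no summary, so the first part of the content is shown.",
)

print(summary_or_excerpt(article, max_chars=30))
```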

Methods

Result.json()

Returns a dictionary with the attributes of the class formatted in JSON.

ex.

{
  "context": "global",
  "title": "title of the article",
  "summary": "summary of the article",
  "content": "content of the article"
}
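Because Result.json() returns a dictionary, a whole set of results can be serialized with the standard json module. This is a sketch under assumptions: the Result class below is a stand-in whose json() method reproduces the documented shape, and dump_results is a hypothetical helper. With the real module, the set would come from one of the fetch functions.

```python
import json


class Result:
    """Stand-in for NewsScraper.Result (assumption: documented shape only)."""

    def __init__(self, context, title, summary, content):
        self.context = context
        self.title = title
        self.summary = summary
        self.content = content

    def json(self):
        # Same keys as the documented example output.
        return {
            "context": self.context,
            "title": self.title,
            "summary": self.summary,
            "content": self.content,
        }


def dump_results(results):
    """Serialize a set of results to a JSON array string."""
    # Sets are unordered, so sort by title to get a stable output.
    ordered = sorted(results, key=lambda r: r.title)
    return json.dumps([r.json() for r in ordered], indent=2)


sample = {
    Result("global", "B title", "B summary", "B content"),
    Result("btc", "A title", "", "A content"),
}
print(dump_results(sample))
```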

News RSS Feeds

Each of these functions returns a set of NewsScraper.Result objects containing the articles fetched from the corresponding RSS feed.

fetch_abc()
fetch_cnn()
fetch_yahoo()
fetch_fox_news()

Can include categories: GLOBAL, US, EU.

Alternatively, you can use fetch_news_data() to receive results from all of them.
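To pull from only some of the feeds, the sets returned by the individual functions can be merged with a set union. The sketch below assumes nothing beyond the documented behaviour that each function returns a set. fetch_selected is a hypothetical helper, and the two fake fetchers stand in for calls such as NewsScraper.fetch_abc and NewsScraper.fetch_cnn.

```python
def fetch_selected(fetchers):
    """Call each fetch function and merge the returned sets into one."""
    combined = set()
    for fetch in fetchers:
        combined |= set(fetch())
    return combined


# Stand-in fetchers returning strings instead of Result objects (assumption).
def fake_abc():
    return {"abc-1", "shared-story"}


def fake_cnn():
    return {"cnn-1", "shared-story"}


# In real use: fetch_selected([NewsScraper.fetch_abc, NewsScraper.fetch_cnn])
combined = fetch_selected([fake_abc, fake_cnn])
print(sorted(combined))
```

Whether the same story from two feeds collapses into one entry depends on how Result defines equality, which the source does not state.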

Crypto RSS Feeds

Each of these functions returns a set of NewsScraper.Result objects containing the articles fetched from the corresponding RSS feed.

fetch_coinjournal()
fetch_cryptocurrencynews()

Can include categories: CRYPTO, BLOCKCHAIN, BTC, ETH, LTC.

Alternatively, you can use fetch_crypto_data() to receive results from all of them.

🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

📝 License

This project is licensed under the MIT license.

Owner

Rokas
