Newsemble is an API that provides easy access to the current news for programmatic analysis

Overview

📰 Newsemble 📰


Logo
An API for fetching the current news.

python   Flask   MongoDB  Heroku

GitHub release Visits Badge Stars Badge Fork Badge Github all releases watchers Badge

🔖 About 🔖


Blog Post

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.
The data is scraped from these news websites every hour, stored in a database on the cloud and whenever requested, the most recent articles are promptly served.
Developers can make use of this API to fetch current data with each article having the following fields:
Headlines, Content, Source, Link and Time.



🗒️ Table of contents

💻 Technologies

Newsemble is created with:

  • Python 3
  • Flask
  • PyMongo
  • BeautifulSoup

📂 File Structure and Description

  • app.py - Flask code for the API
  • scraper.py - Collection of scrapers for the various news sites.
  • db.py - Connecting and Using MongoDB
  • utils.py - Utility Functions
  • scheduler.py - Scheduler
  • Procfile - For Deployment
  • requirements.txt - Python Requirments

🛠️ Pipeline

Newsemble pipeline

🚀 Getting-started

This project can be accessed by using following setup

Links

Links Description
www.newsemble.ml/news Link to fetch all the data from all sources
www.newsemble.ml/news/toi Link to fetch data from Times of India
www.newsemble.ml/news/th Link to fetch data from The Hindu
www.newsemble.ml/news/tie Link to fetch data from The Indian Express
www.newsemble.ml/news/ndtv Link to fetch data from NDTV news
www.newsemble.ml/news/it Link to fetch data from India Today

Request format

$ import requests
$ url = "http://www.newsemble.ml/news/"
$ requests.get(url).json()

Response format

{   
    ‘link’      :  $source_link$,
    ‘content’   :  $content_text$,    
    ‘source’    :  $news_source$,
    ‘title’     :  $headline$, 
    ‘time       :  $date_time_of_article$  
 }

Sample output

image

⚙️ Currently Supported Sites



🙏 Thanks!

All contributions are welcome and appreciated. 👍
If you liked this project, or found it useful in any way, please drop a 🌟 !

✍️ Authors ✍️

✒️ Rishabh Gupta
✒️ Vishal Singhania
✒️ Roshan Kumar

You might also like...
music downloader written in python.   (Uses jiosaavn API)
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Mobile based API for Crunchyroll BETA (and Downloader).

Mobile based API for Crunchyroll BETA (and Downloader). Not restricted on servers and NO CLOUDFLARE

Pypixiv - A fully-typed, asynchronous api wrapper for pixiv

pypixiv this library is a fully-typed, asynchronous api wrapper for pixiv. featu

This project is helps to download contents from Streamtape by utilizing the API

It scrapes Streamtape api and download contents from the site.

A python script that discovers hidden YouTube API clients. Just a research project.

YouTube-Internal-Clients A script that discovers hidden internal clients of the YouTube (Innertube) API using bruteforce methods. The script tries cli

Simple Python script to download images and videos from public subreddits without using Reddit's API 😎
Simple Python script to download images and videos from public subreddits without using Reddit's API 😎

Subreddit Media Downloader Download images and videos from any public subreddit without using Reddit's API Made with ❤ by Nico 💬 About: This script a

This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals.It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

News-app - This is a news web app for reading news from different sources and topics
News-app - This is a news web app for reading news from different sources and topics

News-app - This is a news web app for reading news from different sources and topics

Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and other financial information. This repository provides an SDK for developing applications to access the NCDS.

Nasdaq Cloud Data Service (NCDS) Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and ot

The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction

windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr

Repo Home WPDrawBot - (Repo, Home, WP) A powerful programmatic 2D drawing application for MacOS X which generates graphics from Python scripts. (graphics, dev, mac)

DrawBot DrawBot is a powerful, free application for macOS that invites you to write Python scripts to generate two-dimensional graphics. The built-in

A curated list of programmatic weak supervision papers and resources
A curated list of programmatic weak supervision papers and resources

A curated list of programmatic weak supervision papers and resources

Learning trajectory representations using self-supervision and programmatic supervision.
Learning trajectory representations using self-supervision and programmatic supervision.

Trajectory Embedding for Behavior Analysis (TREBA) Implementation from the paper: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Y

Programmatic interface to Synapse services for Python

A Python client for Sage Bionetworks' Synapse, a collaborative, open-source research platform that allows teams to share data, track analyses, and collaborate

Django API that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format).

"Casares News" API Api that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format). Usage Consume the articles

This python module can analyse cryptocurrency news for any number of coins given and return a sentiment. Can be easily integrated with a Trading bot to keep an eye on the news.

Python script that analyses news headline or body sentiment and returns the overall media sentiment of any given coin. It can take multiple coins an

NLP project that works with news (NER, context generation, news trend analytics)
NLP project that works with news (NER, context generation, news trend analytics)

СоАвтор СоАвтор – платформа и открытый набор инструментов для редакций и журналистов-фрилансеров, который призван сделать процесс создания контента ма

Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

💰 An Alfred Workflow that provides current price of cryptocurrency
💰 An Alfred Workflow that provides current price of cryptocurrency

Coin Ticker for Alfred Workflow An Alfred Workflow that provides current price and status about cryptocurrency from cryptocompare.com. Supports Alfred

Comments
  • Bump lxml from 4.5.0 to 4.6.5

    Bump lxml from 4.5.0 to 4.6.5

    Bumps lxml from 4.5.0 to 4.6.5.

    Changelog

    Sourced from lxml's changelog.

    4.6.5 (2021-12-12)

    Bugs fixed

    • A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script content through SVG images (CVE-2021-43818).

    • A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script content through CSS imports and other crafted constructs (CVE-2021-43818).

    4.6.4 (2021-11-01)

    Features added

    • GH#317: A new property system_url was added to DTD entities. Patch by Thirdegree.

    • GH#314: The STATIC_* variables in setup.py can now be passed via env vars. Patch by Isaac Jurado.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    4.6.2 (2020-11-26)

    Bugs fixed

    • A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by Yaniv Nizry, which allowed JavaScript to pass through. The cleaner now removes more sneaky "style" content.

    4.6.1 (2020-10-18)

    ... (truncated)

    Commits
    • a9611ba Fix a test in Py2.
    • a3eacbc Prepare release of 4.6.5.
    • b7ea687 Update changelog.
    • 69a7473 Cleaner: cover some more cases where scripts could sneak through in specially...
    • 54d2985 Fix condition in test decorator.
    • 4b220b5 Use the non-depcrecated TextTestResult instead of _TextTestResult (GH-333)
    • d85c6de Exclude a test when using the macOS system libraries because it fails with li...
    • cd4bec9 Add macOS-M1 as wheel build platform.
    • fd0d471 Install automake and libtool in macOS build to be able to install the latest ...
    • f233023 Cleaner: Remove SVG image data URLs since they can embed script content.
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(v1.0)
Owner
Rishabh
CSE Undergrad at IIIT-D.
Rishabh
Noto fonts go universal! Download Noto fonts combined to suit your region

noto-cjk Noto CJK fonts Noto Serif CJK update was released on 25 October 2021. We moved the release history and other notes into both Sans and Serif s

Google Fonts 2k Jan 02, 2023
A Celery application to collect data, download media and extract information from social media APIs

Project IBEX A Celery application to collect data, download media and extract information from social media APIs. Requirements You must have a Redis D

ibex 4 Dec 15, 2022
Write reproducible code for getting and processing ChEMBL

chembl_downloader Don't worry about downloading/extracting ChEMBL or versioning - just use chembl_downloader to write code that knows how to download

Charles Tapley Hoyt 34 Dec 25, 2022
Python/Selenium script to scrape data about university courses

university-courses Python/Selenium script to scrape data about university courses. Script first extracts URLs of each courses homepage, then trawls ea

Sam Brown 1 Feb 02, 2022
MMDL (Mega Music Downloader) - A tool to easily download music.

mmdl - Mega Music Downloader What is mmdl ❓ TLDR: MMDL is a cli app which allows you to quickly and efficiently download one or multiple songs from Yo

techboy-coder 30 Dec 13, 2022
A web app for downloading Facebook comments as a csv file

Facebook Comment Downloader A small web app for downloading comments from a public facebook page post. Comment downloading from https://github.com/min

WSDOT 23 Jan 04, 2023
一个在新番更新后第一时间在dmhy等BT下载站自动下载的小工具.

Anime Track 一个在新番更新后第一时间自动下载的小工具. 可以根据自定义的关键字在dmhy等BT下载站在搜索结果更新时将磁力链发送至aria2实现自动下载. 基本功能包含: 将BT下载站的某个关键字的搜索结果的所有磁力链添加至ARIA2 自动更新aria2 trackers 将已添加的磁力

Sunky 24 Oct 12, 2022
A script that downloads YouTube videos/audio

YouTube-Downloader A script that downloads YouTube videos/audio from youtube. Usage Download the script by executing the following in your terminal :

Debayan Sarkar 2 Jan 04, 2022
A youtube downloader, built with flask yt-dlp

Built With Python Flask - The Python micro framework for building web applications. yt-dlp - A youtube-dl fork with additional features and fixes

Abhijith N T 13 Dec 17, 2022
A Python script to download PDB files associated with a Portable Executable (PE)

A Python script to download PDB files associated with a Portable Executable (PE)

Podalirius 33 Jan 03, 2023
Python module to download all media from a GoFile gallery.

GoFile Downloader Setup First of all, clone this repository : ~$ git clone https://github.com/quatrecentquatre-404/gofile-downloader Second, oh wait..

Quatrecentquatre 61 Jan 01, 2023
Will load an SRC page, logged in with Firefox's cookies imported, and delete all comments from every run

SRCCommentsAutoDeleter Will load an SRC page, logged in with a support browser's cookies, and delete all comments from every run Config is all done in

3 Oct 29, 2021
Google Art Image Downloader Tkinter

Google-Art-Image-Downloader-Tkinter 由 google-art-downloader 整改的批量 Google 艺术展平台高清图片下载 ⭐ It works perfectly from 2018 year till today, thanks for stars!

PY-GZKY 1 Jan 05, 2022
Archivist - Easily archive 📦 Download folder to Google Drive ☁️

Archivist Script for archiving Download folder by uploading unmodified files to a Google Drive folder. Modified files will remain in the Download fold

Timing Liu 3 Sep 30, 2022
YoutubeDownloader - Download any public Playlist from Youtube

YoutubeDownloader Download any public Youtube Channel / Playlist Features Bulk d

17 Nov 12, 2022
Python script to download all images/webms of a 4chan thread

Python3 script to continuously download all images/webms of multiple 4chan thread simultaneously - without installation

Micha Fink 208 Jan 04, 2023
Youtube list to mp3 - Youtube list to mp3 downloader

Youtube list to mp3 downloader Tiny script to convert a list of youtube videos t

Papi Diagne 3 Feb 11, 2022
YouTube to MP3 or 4, you get to choose...

UTubeToMP YouTube to MP3 or 4, you get to choose... If you don't wanna git clone andor dont wanna install python. Here: Repl.it Instructions: Pretty s

1 Jan 29, 2022
Pantheon - The fastest YouTube downloader.

A Youtube downloader written in Python3, using HTTP requests and an API.

Billy 38 Nov 21, 2022
A simple GUI video downloader built off of the python module 'yt-dlp'

Simple-Youtube-DL-Gui Supported Operating Systems Windows 7 (x64), Windows 8 (x64), and Windows 10 (x64) How to use Main Gui Extract program from arch

12 Dec 30, 2022