Newsemble is an API that provides easy access to the current news for programmatic analysis


📰 Newsemble 📰


An API for fetching the current news.

Python · Flask · MongoDB · Heroku


🔖 About 🔖



Newsemble is an API that provides easy access to the current news for programmatic analysis. It is built with Python, Flask, BeautifulSoup and MongoDB.
The supported news websites are scraped every hour, the articles are stored in a cloud-hosted MongoDB database, and the most recent articles are served on each request.
Developers can use this API to fetch current news, where each article has the following fields:
headline (title), content, source, link and time.
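
The store-and-serve step described above could look roughly like the following minimal sketch. It is not the project's db.py: a plain dictionary stands in for the MongoDB collection, `ArticleStore`, `upsert` and `latest` are hypothetical names, and it assumes that articles are deduplicated by `link` and that `time` holds comparable datetime values.

```python
from datetime import datetime
from typing import Dict, List, Optional


class ArticleStore:
    """In-memory stand-in for the MongoDB collection (hypothetical, illustration only)."""

    def __init__(self) -> None:
        # Keyed by the article link, so re-scraping the same story every hour
        # replaces the stored copy instead of adding a duplicate.
        self._by_link: Dict[str, dict] = {}

    def upsert(self, article: dict) -> None:
        """Insert a new article, or overwrite the one that has the same link."""
        self._by_link[article["link"]] = article

    def latest(self, limit: int = 10, source: Optional[str] = None) -> List[dict]:
        """Return up to `limit` articles, newest first, optionally for one source."""
        matches = [
            a for a in self._by_link.values()
            if source is None or a["source"] == source
        ]
        # Assumes `time` holds comparable datetime values.
        matches.sort(key=lambda a: a["time"], reverse=True)
        return matches[:limit]


store = ArticleStore()
store.upsert({"link": "https://example.com/a", "title": "A", "content": "...",
              "source": "The Hindu", "time": datetime(2021, 12, 1, 9)})
store.upsert({"link": "https://example.com/b", "title": "B", "content": "...",
              "source": "NDTV", "time": datetime(2021, 12, 1, 10)})
# The next hourly run sees story A again and refreshes it.
store.upsert({"link": "https://example.com/a", "title": "A (updated)", "content": "...",
              "source": "The Hindu", "time": datetime(2021, 12, 1, 11)})

print([a["title"] for a in store.latest()])  # → ['A (updated)', 'B']
```

With PyMongo, the same two operations would roughly correspond to `update_one({"link": ...}, {"$set": article}, upsert=True)` and `find().sort("time", -1).limit(n)` on a collection; the source does not say whether db.py works this way.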




💻 Technologies

Newsemble is created with:

  • Python 3
  • Flask
  • PyMongo
  • BeautifulSoup

📂 File Structure and Description

  • app.py - Flask application that serves the API
  • scraper.py - Collection of scrapers for the various news sites
  • db.py - Connection to and queries against MongoDB
  • utils.py - Utility functions
  • scheduler.py - Scheduler for the periodic scraping jobs
  • Procfile - Process declaration used for deployment
  • requirements.txt - Python requirements
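
Since the About section says the sites are scraped every hour, scheduler.py presumably runs the scrapers on a fixed interval. The sketch below shows one dependency-free way to do that; `run_periodically` and its parameters are hypothetical, and the injectable `sleep` and `clock` exist only so the loop can be tested without waiting an hour.

```python
import time
from typing import Callable, Optional

ONE_HOUR = 60 * 60  # seconds; the About section states an hourly scrape


def run_periodically(
    job: Callable[[], None],
    interval: float = ONE_HOUR,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `job` every `interval` seconds and return how many times it ran.

    max_runs=None loops forever, as a deployed worker would. A failing job is
    reported and skipped so that one broken scraper does not stop the schedule.
    """
    runs = 0
    next_run = clock()
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the scheduler alive on scraper errors
            print(f"scrape failed: {exc!r}")
        runs += 1
        # Schedule from the planned start, not the finish, to avoid drift.
        next_run += interval
        delay = next_run - clock()
        if delay > 0 and (max_runs is None or runs < max_runs):
            sleep(delay)
    return runs
```

Scheduling against the planned start time rather than the finish time keeps runs from drifting when a scrape is slow. Libraries such as APScheduler offer the same behaviour with more options; which approach scheduler.py actually uses is not stated.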

🛠️ Pipeline

[Figure: Newsemble pipeline]
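
A single step of the scrape stage might look like this. The project uses BeautifulSoup; to stay dependency-free, this sketch swaps in the standard library's `html.parser` instead. The markup it targets (`<a class="headline" href="...">`) and the names `HeadlineParser` and `scrape_headlines` are hypothetical, since every site needs its own selectors, which is presumably why scraper.py holds a collection of per-site scrapers.

```python
from html.parser import HTMLParser
from typing import List, Optional


class HeadlineParser(HTMLParser):
    """Collect {"title", "link"} pairs from <a class="headline" href="..."> tags.

    The "headline" class is a hypothetical selector; every news site needs its own.
    """

    def __init__(self) -> None:
        super().__init__()
        self.items: List[dict] = []
        self._href: Optional[str] = None  # href of the headline anchor we are inside
        self._text: List[str] = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and "headline" in classes:
            self._href = attr_map.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = " ".join("".join(self._text).split())  # normalise whitespace
            self.items.append({"title": title, "link": self._href})
            self._href = None


def scrape_headlines(html: str, source: str) -> List[dict]:
    """Turn one page of HTML into partial article records for `source`."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return [dict(item, source=source) for item in parser.items]


page = """
<ul>
  <li><a class="headline" href="/news/1">Monsoon  arrives
      early</a></li>
  <li><a class="nav" href="/about">About us</a></li>
  <li><a class="headline big" href="/news/2">Markets rally</a></li>
</ul>
"""
print([item["title"] for item in scrape_headlines(page, "The Hindu")])
# → ['Monsoon arrives early', 'Markets rally']
```

A real scraper would also resolve relative links with `urllib.parse.urljoin` and fetch each article page to fill in the content and time fields before storing the record.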

🚀 Getting Started

The API can be accessed through the following endpoints.

Links

  • www.newsemble.ml/news - Articles from all sources
  • www.newsemble.ml/news/toi - Articles from Times of India
  • www.newsemble.ml/news/th - Articles from The Hindu
  • www.newsemble.ml/news/tie - Articles from The Indian Express
  • www.newsemble.ml/news/ndtv - Articles from NDTV
  • www.newsemble.ml/news/it - Articles from India Today
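
When querying several sources programmatically, it can help to build these URLs from the short source codes listed above rather than hard-coding each one. A minimal sketch; `BASE_URL`, `SOURCES` and `endpoint` are hypothetical names, and the http:// scheme is taken from the request example.

```python
from typing import Optional

BASE_URL = "http://www.newsemble.ml/news"

# Source codes accepted by the /news/<code> endpoints, as listed above.
SOURCES = {
    "toi": "Times of India",
    "th": "The Hindu",
    "tie": "The Indian Express",
    "ndtv": "NDTV",
    "it": "India Today",
}


def endpoint(source: Optional[str] = None) -> str:
    """Return the URL for one source code, or for all sources when None."""
    if source is None:
        return BASE_URL
    if source not in SOURCES:
        raise ValueError(f"unknown source {source!r}; expected one of {sorted(SOURCES)}")
    return f"{BASE_URL}/{source}"


print(endpoint("th"))  # → http://www.newsemble.ml/news/th
```

Rejecting unknown codes locally gives a clear error instead of a failed request.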

Request format

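import requests

url = "http://www.newsemble.ml/news/"
response = requests.get(url, timeout=10)  # fail instead of hanging forever
response.raise_for_status()               # raise on HTTP 4xx/5xx errors
articles = response.json()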
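
For programmatic analysis, a common first step is grouping the returned articles by outlet. This sketch assumes the endpoint returns a JSON list of article objects with the fields shown under Response format; `group_by_source` is a hypothetical helper, and `sample` is placeholder data so the example runs offline.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_source(articles: List[dict]) -> Dict[str, List[str]]:
    """Map each `source` value to the list of headlines from that source."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        grouped[article["source"]].append(article["title"])
    return dict(grouped)


# Offline placeholder data in the documented shape (not real API output).
sample = [
    {"title": "Headline 1", "source": "NDTV", "link": "https://example.com/1", "content": "...", "time": "..."},
    {"title": "Headline 2", "source": "The Hindu", "link": "https://example.com/2", "content": "...", "time": "..."},
    {"title": "Headline 3", "source": "NDTV", "link": "https://example.com/3", "content": "...", "time": "..."},
]
by_source = group_by_source(sample)
print({source: len(titles) for source, titles in by_source.items()})
# → {'NDTV': 2, 'The Hindu': 1}
```

Against the live API, the parsed JSON from the request example above would be passed in place of `sample`.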

Response format

{
    "link":    $source_link$,
    "content": $content_text$,
    "source":  $news_source$,
    "title":   $headline$,
    "time":    $date_time_of_article$
}
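
Because the response fields are plain JSON, a small typed wrapper can catch missing fields early. A minimal sketch with a hypothetical `Article` dataclass; `time` is kept as the raw value because its exact format is not documented.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Article:
    """One news item, using the field names from the response format above."""
    title: str
    content: str
    source: str
    link: str
    time: str  # raw value, assumed to be a string; its format is not documented

    @classmethod
    def from_json(cls, obj: dict) -> "Article":
        """Build an Article, failing loudly if a documented field is missing."""
        missing = [f.name for f in fields(cls) if f.name not in obj]
        if missing:
            raise KeyError(f"article is missing fields: {missing}")
        # Ignore any extra keys the server might add later.
        return cls(**{f.name: obj[f.name] for f in fields(cls)})


article = Article.from_json({"title": "Headline", "content": "Body", "source": "NDTV",
                             "link": "https://example.com/1", "time": "placeholder"})
print(article.source)  # → NDTV
```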

Sample output

[Screenshot: sample output]

⚙️ Currently Supported Sites

  • Times of India
  • The Hindu
  • The Indian Express
  • NDTV
  • India Today

🙏 Thanks!

All contributions are welcome and appreciated. 👍
If you liked this project or found it useful in any way, please drop a 🌟!

✍️ Authors ✍️

✒️ Rishabh Gupta
✒️ Vishal Singhania
✒️ Roshan Kumar

Releases: v1.0
Owner: Rishabh (CSE Undergrad at IIIT-D)