Atom, RSS and JSON feed parser for Python 3

Overview

Atoma



Quickstart

Install Atoma with pip:

pip install atoma

Load and parse an RSS XML file:

>>> import atoma
>>> feed = atoma.parse_rss_file('rss-feed.xml')
>>> feed.description
'The blog relating the daily life of web agency developers'
>>> len(feed.items)
5

Parsing feeds from the Internet is easy as well:

>>> import atoma, requests
>>> response = requests.get('http://lucumr.pocoo.org/feed.atom')
>>> feed = atoma.parse_atom_bytes(response.content)
>>> feed.title.value
"Armin Ronacher's Thoughts and Writings"

Features

Security warning

If you use this library to display content from feeds in a web page, you NEED to clean the HTML contained in the feeds to prevent Cross-site scripting (XSS). The bleach library is recommended for cleaning feeds.
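
For illustration only, here is a minimal sketch of what "cleaning" means, written with the standard library's html.parser instead of bleach: an allowlist sanitizer that keeps a handful of harmless tags, drops every attribute and the contents of script and style elements, and escapes all remaining text. The names ALLOWED_TAGS, DROP_CONTENT and sanitize_html are hypothetical, and this code has not been audited; for real pages, a maintained sanitizer such as the bleach library recommended above remains the safer choice.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: these tags are kept, always without attributes.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "code", "pre", "blockquote", "ul", "ol", "li"}
# Elements whose text content is discarded entirely.
DROP_CONTENT = {"script", "style"}


class _AllowlistSanitizer(HTMLParser):
    def __init__(self):
        # convert_charrefs=True passes already-unescaped text to handle_data,
        # which is then re-escaped before being written out.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.parts.append("<%s>" % tag)  # attrs (onclick, href, ...) are dropped

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.parts.append("</%s>" % tag)

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(escape(data))
    # Comments, doctypes and processing instructions use HTMLParser's default
    # no-op handlers, so they are dropped from the output.


def sanitize_html(fragment: str) -> str:
    """Return fragment with only allowlisted, attribute-free tags left."""
    parser = _AllowlistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


print(sanitize_html('<p onclick="steal()">Hi <script>alert(1)</script><b>there</b></p>'))
# prints <p>Hi <b>there</b></p>
```

Dropping all attributes is what keeps the sketch small, but it also removes href and src, so links and images are lost; a real sanitizer allowlists attributes and URL schemes per tag. The output can also contain unbalanced tags when the input was unbalanced.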

Useful Resources

To use this library, a basic understanding of feeds is required. For Atom, the Introduction to Atom is a must-read. RFC 4287 can help clear up some ambiguities. Finally, the feed validator is great for testing hand-crafted feeds.

For RSS, the RSS specification and rssboard.org have a ton of information and examples.

For OPML, the OPML specification has a paragraph dedicated to its usage for syndication.
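
As a sketch of that syndication usage: an OPML subscription list is a tree of outline elements, and the entries with type="rss" carry the feed address in an xmlUrl attribute. The function below, feed_urls_from_opml (a hypothetical helper, not part of atoma), collects those addresses with the standard library's ElementTree; for untrusted documents a hardened parser such as defusedxml is preferable.

```python
import xml.etree.ElementTree as ET

SAMPLE_OPML = """<?xml version="1.0" encoding="utf-8"?>
<opml version="2.0">
  <head><title>My subscriptions</title></head>
  <body>
    <outline text="Python">
      <outline text="Armin Ronacher" type="rss" xmlUrl="http://lucumr.pocoo.org/feed.atom"/>
    </outline>
    <outline text="Not a feed" type="link" url="https://example.com/"/>
  </body>
</opml>"""


def feed_urls_from_opml(data: bytes) -> list:
    """Return the xmlUrl of every type="rss" outline, including nested ones."""
    root = ET.fromstring(data)
    return [
        outline.get("xmlUrl")
        for outline in root.iter("outline")  # iter() walks all descendants
        if outline.get("type") == "rss" and outline.get("xmlUrl")
    ]


print(feed_urls_from_opml(SAMPLE_OPML.encode("utf-8")))
# prints ['http://lucumr.pocoo.org/feed.atom']
```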

Non-implemented Features

Some seldom-used features are not implemented:

  • XML signature and encryption
  • Some Atom and RSS extensions
  • Atom content other than text, html and xhtml
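
Because Atom content other than text, html and xhtml is not implemented, a caller may want to spot such elements before parsing. A minimal sketch using the standard library's ElementTree, assuming the standard Atom namespace http://www.w3.org/2005/Atom: it checks the type attribute of the text constructs and of content, treating a missing type as "text" as RFC 4287 specifies. unsupported_atom_types is a hypothetical helper, not part of atoma; the media type in the sample is the one from the "Atom reader issue" report in the comments.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
# RFC 4287 elements that carry a "type" attribute: the text constructs and content.
TYPED_ELEMENTS = ("title", "subtitle", "summary", "rights", "content")
# The only types atoma handles, per the list above.
SUPPORTED_TYPES = {"text", "html", "xhtml"}


def unsupported_atom_types(data: bytes) -> list:
    """Return (element name, type) pairs whose type is not text, html or xhtml."""
    # For untrusted input, consider defusedxml.ElementTree.fromstring instead.
    root = ET.fromstring(data)
    found = []
    for name in TYPED_ELEMENTS:
        for element in root.iter(ATOM_NS + name):
            # A missing type attribute behaves as "text" (RFC 4287).
            element_type = element.get("type", "text")
            if element_type not in SUPPORTED_TYPES:
                found.append((name, element_type))
    return found


SAMPLE_FEED = b"""<atom:feed xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:title>Example</atom:title>
  <atom:entry>
    <atom:title type="html">First post</atom:title>
    <atom:content type="application/vnd.icis.iddn.entity+xml"><entity/></atom:content>
  </atom:entry>
</atom:feed>"""

print(unsupported_atom_types(SAMPLE_FEED))
# prints [('content', 'application/vnd.icis.iddn.entity+xml')]
```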

License

MIT

Comments
  • Atom reader issue


    Hi there, I'm having an issue using the Atom bytes parser. Unfortunately, the following code yields an error:

    import requests
    import atoma
    
    response = requests.get('https://api.icis.com/v1/entities/ref-data/currency', auth=('[email protected]', 'password'))
    
    a = atoma.parse_atom_bytes(response.content)
    

    This yields the following error:

    ValueError: 'application/vnd.icis.iddn.entity+xml' is not a valid AtomTextType

    If you're able to provide any assistance that would be brilliant!

    The shortened, redacted file output (including the section that causes the issue above) is as follows:

    <atom:title>http://iddn.icis.com/ref-data/currency/0</atom:title>
    <atom:id>http://iddn.icis.com/ref-data/currency/0</atom:id>
    <atom:updated>2018-05-20T08:12:00.701348Z</atom:updated>
    <atom:relevance-score>70656</atom:relevance-score>
    <atom:content type="application/vnd.icis.iddn.entity+xml">
    
    opened by tm553 4
  • Project status inquiry


    Is the project dead or is it still alive? I filed an issue over a month ago, and it received no attention. I just want to know if it's worth using this project.

    opened by impredicative 4
  • Add support for Python 3.10 and 3.11, drop EOL 3.6


    Includes and replaces https://github.com/NicolasLM/atoma/pull/15.

    Travis CI has stopped running (see #15): they have a new pricing model that places limits on open-source projects.

    • https://blog.travis-ci.com/2020-11-02-travis-ci-new-billing
    • https://www.jeffgeerling.com/blog/2020/travis-cis-new-pricing-plan-threw-wrench-my-open-source-works

    Many projects are moving to GitHub Actions instead; let's do the same.

    Other benefits of GitHub Actions:

    • Test on macOS and Windows in addition to Ubuntu
    • 20 parallel jobs compared to just 5 with Travis

    This also moves coverage to Codecov, which I find works better with GitHub Actions.

    Sample build

    https://github.com/hugovk/atoma/actions/runs/3335803236

    opened by hugovk 3
  • JSON Feed version 1.0 author field causes AttributeError


    Feed URL: https://hnrss.org/newest.jsonfeed
    Feed content: newest.txt

      File "/Users/kk/.pyenv/versions/rssant377/lib/python3.7/site-packages/atoma/json_feed.py", line 201, in parse_json_feed
        items=_get_items(root)
      File "/Users/kk/.pyenv/versions/rssant377/lib/python3.7/site-packages/atoma/json_feed.py", line 74, in _get_items
        rv.append(_get_item(item))
      File "/Users/kk/.pyenv/versions/rssant377/lib/python3.7/site-packages/atoma/json_feed.py", line 92, in _get_item
        author=_get_author(item_dict),
      File "/Users/kk/.pyenv/versions/rssant377/lib/python3.7/site-packages/atoma/json_feed.py", line 137, in _get_author
        name=_get_text(author_dict, 'name'),
      File "/Users/kk/.pyenv/versions/rssant377/lib/python3.7/site-packages/atoma/json_feed.py", line 173, in _get_text
        rv = root.get(name)
    AttributeError: 'str' object has no attribute 'get'
    

    I expect atoma to be compatible with the 1.0 author field, even though it is deprecated.

    https://jsonfeed.org/version/1.1

    Deprecated items remain valid forever, but you should move to the new fields when you can. A feed using fields from JSON Feed 1.0 is still a valid feed for version 1.1 and future versions of JSON Feed.

    And thank you for the handy library! I'm also happy to create a PR to fix it if you don't have time; just tell me to go ahead.

    opened by guyskk 3
  • Atom feed reader wrong error message


    Trying to parse an RSS feed with the Atom reader leads to a dubious message:

    Python 3.6.5 (default, Apr  1 2018, 05:46:30) 
    Type 'copyright', 'credits' or 'license' for more information
    IPython 6.5.0 -- An enhanced Interactive Python. Type '?' for help.
    
    In [1]: import atoma
    
    In [2]: import requests
    
    In [3]: url = 'https://hyperdev.fr/feed.xml'
    
    In [4]: response = requests.get(url)
    
    In [5]: atoma.parse_atom_bytes(response.content)
    ---------------------------------------------------------------------------
    FeedParseError                            Traceback (most recent call last)
    <ipython-input-5-14e8e040a115> in <module>()
    ----> 1 atoma.parse_atom_bytes(response.content)
    
    ~/.local/share/virtualenvs/socialite-xGgT8vff/lib/python3.6/site-packages/atoma/atom.py in parse_atom_bytes(data)
        275     """Parse an Atom feed from a byte-string containing XML data."""
        276     root = parse_xml(BytesIO(data)).getroot()
    --> 277     return _parse_atom(root)
    
    ~/.local/share/virtualenvs/socialite-xGgT8vff/lib/python3.6/site-packages/atoma/atom.py in _parse_atom(root, parse_entries)
        222 def _parse_atom(root: Element, parse_entries: bool=True) -> AtomFeed:
        223     # Mandatory
    --> 224     id_ = get_text(root, 'feed:id', optional=False)
        225 
        226     # Optional
    
    ~/.local/share/virtualenvs/socialite-xGgT8vff/lib/python3.6/site-packages/atoma/utils.py in get_text(element, name, optional)
         49 
         50 def get_text(element: Element, name, optional: bool=True) -> Optional[str]:
    ---> 51     child = get_child(element, name, optional)
         52     if child is None:
         53         return None
    
    ~/.local/share/virtualenvs/socialite-xGgT8vff/lib/python3.6/site-packages/atoma/utils.py in get_child(element, name, optional)
         39         raise FeedParseError(
         40             'Could not parse RSS channel: "{}" required in "{}"'
    ---> 41             .format(name, element.tag)
         42         )
         43 
    
    FeedParseError: Could not parse RSS channel: "feed:id" required in "rss"
    
    In [6]: atoma.__version__
    Out[6]: '0.0.9'
    

    That is misleading:

    FeedParseError: Could not parse RSS channel: "feed:id" required in "rss"

    opened by amirouche 3
  • unable to import module


    Python 3.5.5 (default, Jul 24 2018, 10:23:26) [GCC 4.4.7 20120313 (Red Hat 4.4.7-23)] on linux

    >>> import requests
    >>> import atoma
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/misc/home/mname/git/venv3/lib/python3.5/site-packages/atoma/__init__.py", line 1, in <module>
        from .atom import parse_atom_file, parse_atom_bytes
      File "/misc/home/mname/git/venv3/lib/python3.5/site-packages/atoma/atom.py", line 23
        text_type: str = attr.ib()
                 ^
    SyntaxError: invalid syntax

    opened by klaypigeon 2
  • atoma/atom.py, line 22

    Hi, when I run my script I get this error:

    Traceback (most recent call last):
      File "import_shopify.py", line 15, in <module>
        import atoma
      File "/usr/lib/python3.4/site-packages/atoma/__init__.py", line 1, in <module>
        from .atom import parse_atom_file, parse_atom_bytes
      File "/usr/lib/python3.4/site-packages/atoma/atom.py", line 22
        text_type: str = attr.ib()
                 ^
    SyntaxError: invalid syntax

    This is my script:

    import atoma, requests

    def shopify(url):
        response = requests.get(url)
        feed = atoma.parse_atom_bytes(response.content)
        return feed

    url = "https://www.mydomain.com/all.atom"
    feed = shopify(url)
    print(feed)

    Thanks in advance for your help.

    opened by micheleberardi 2
  • atoma parse failed with fc2blog rss


    For example:

    http://fc2information.blog.fc2.com/?xml
    
    In [9]: rss = atoma.parse_rss_bytes(r.content)
    ---------------------------------------------------------------------------
    FeedParseError                            Traceback (most recent call last)
    <ipython-input-9-6170d3c57b9f> in <module>
    ----> 1 rss = atoma.parse_rss_bytes(r.content)
    
    ~/.pyenv/versions/3.7.1/lib/python3.7/site-packages/atoma/rss.py in parse_rss_bytes(data)
        217     """Parse an RSS feed from a byte-string containing XML data."""
        218     root = parse_xml(BytesIO(data)).getroot()
    --> 219     return _parse_rss(root)
    
    ~/.pyenv/versions/3.7.1/lib/python3.7/site-packages/atoma/rss.py in _parse_rss(root)
        165     if rss_version != '2.0':
        166         raise FeedParseError('Cannot process RSS feed version "{}"'
    --> 167                              .format(rss_version))
        168
        169     root = root.find('channel')
    
    opened by ipfans 2
  • Add support for Python 3.10, drop EOL 3.6


    Python 3.10 was released on 2021-10-04:

    • https://discuss.python.org/t/python-3-10-0-is-now-available/10955

    Python 3.6 is EOL and no longer receiving security updates (or any updates) from the core Python team.

    | cycle | latest | release    | eol        |
    |:------|:-------|:----------:|:----------:|
    | 3.10  | 3.10.1 | 2021-10-04 | 2026-10-04 |
    | 3.9   | 3.9.9  | 2020-10-05 | 2025-10-05 |
    | 3.8   | 3.8.12 | 2019-10-14 | 2024-10-14 |
    | 3.7   | 3.7.12 | 2018-06-27 | 2023-06-27 |
    | 3.6   | 3.6.15 | 2016-12-23 | 2021-12-23 |

    https://endoflife.date/python

    opened by hugovk 1
  • long_description_content_type in setup.py


    At this time, the README is not properly formatted on PyPI. According to the packaging docs, the setup() call in setup.py may benefit from long_description_content_type='text/x-rst'. After this change, please release to PyPI to confirm that it works.

    opened by impredicative 1
  • Empty tag for required fields


    Hi, I think the validation of the source is a little too strict. For example, if a description tag is present but empty in the channel section of an RSS feed, atoma should not exit with an error, in my opinion.

    An example https://www.giornalettismo.com/feed

    Thanks

    opened by timendum 1
  • Update code example according to current version.


    Using atoma.parse_atom_feed() gave me an error, and I saw in the __init__.py file that the function is called atoma.parse_atom_file(). This change to the README should help others get acquainted with the library more easily.

    opened by grcarmenaty 1
  • Recently unable to parse Reddit's web feeds (RSS)


    import requests
    import atoma
    response = requests.get("https://www.reddit.com/r/<insert subreddit here>.rss")
    decoded = response.content
    parsed = atoma.parse_atom_bytes(decoded)
    

    This raises FeedXMLError('Not a valid XML document'). It used to work flawlessly. I'll look into the details I can get when debugging and update this issue accordingly.

    opened by why-not-try-calmer 2
  • No docs


    It'd be really useful if there was some documentation about how to use this package. The examples in the readme are great, thanks for that, but a link to some full docs would go a long way.

    opened by corinnebosley 0
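
Two of the reports above (an RSS document passed to parse_atom_bytes, and a feed rejected because its RSS version is not "2.0") come down to choosing a parser before knowing the format. A minimal sketch of sniffing the format first, standard library only: detect_feed_format and its return labels are hypothetical, and the checks (an Atom feed root in the http://www.w3.org/2005/Atom namespace, an rss root with version="2.0", an RSS 1.0 rdf:RDF root, a JSON object whose version starts with https://jsonfeed.org/version/) follow the respective specifications rather than atoma's internals.

```python
import json
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"


def detect_feed_format(data: bytes) -> str:
    """Guess the feed format: 'atom', 'rss2', 'rss-other', 'rss1', 'json' or 'unknown'."""
    if data.lstrip().startswith(b"{"):
        try:
            document = json.loads(data)
        except ValueError:
            return "unknown"
        version = document.get("version", "") if isinstance(document, dict) else ""
        return "json" if str(version).startswith("https://jsonfeed.org/version/") else "unknown"
    try:
        # For untrusted input, consider defusedxml.ElementTree.fromstring instead.
        root = ET.fromstring(data)
    except ET.ParseError:
        return "unknown"  # e.g. an HTML error page that is not well-formed XML
    if root.tag == ATOM_FEED:
        return "atom"
    if root.tag == "rss":
        # atoma's RSS parser rejects any version other than "2.0".
        return "rss2" if root.get("version") == "2.0" else "rss-other"
    if root.tag == RDF_ROOT:
        return "rss1"  # RSS 1.0 (RDF): no version="2.0", so not handled as RSS 2.0
    return "unknown"


print(detect_feed_format(b'<rss version="2.0"><channel/></rss>'))
# prints rss2
```

A caller could then route 'atom' to atoma.parse_atom_bytes and 'rss2' to atoma.parse_rss_bytes, and report a clear "unsupported format" error for everything else instead of the misleading message quoted in the comments.
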
Releases (v0.0.17)
Owner
Nicolas Le Manchet
Python developer, Freelancer
JSON Schema validation library

jsonschema A JSON Schema validator implementation. It compiles schema into a validation tree to have validation as fast as possible. Supported drafts:

Dmitry Dygalo 309 Jan 01, 2023
A fast JSON parser/generator for C++ with both SAX/DOM style API

A fast JSON parser/generator for C++ with both SAX/DOM style API Tencent is pleased to support the open source community by making RapidJSON available

Tencent 12.6k Dec 30, 2022
A query expression for extracting data from JSON.

JSONPATH A selector expression for extracting data from JSON. Quickstarts Installation Install the stable version from PYPI. pip install jsonpath-extr

林玮 (Jade Lin) 33 Oct 22, 2022
Package to Encode/Decode some common file formats to json

ZnJSON Package to Encode/Decode some common file formats to json Available via pip install znjson In comparison to pickle this allows having readable

ZINC 2 Feb 02, 2022
Low code JSON to extract data in one line

JSON Inline Low code JSON to extract data in one line ENG RU Installation pip install json-inline Usage Rules Modificator Description ?key:value Searc

Aleksandr Sokolov 12 Mar 09, 2022
A Python application to transfer Zeek ASCII (not JSON) logs to Elastic/OpenSearch.

zeek2es.py This Python application translates Zeek's ASCII TSV logs into ElasticSearch's bulk load JSON format. For JSON logs, see Elastic's File Beat

Corelight, Inc. 28 Dec 22, 2022
A tool to find the path of a specific key in deeply nested JSON.

How to quickly find a specific key in deeply nested JSON #OfficialAccount During web-scraper development, we often run into Ajax-loaded endpoints that return JSON data.

kingname 56 Dec 13, 2022
With the help of json txt you can use your txt file as a json file in a very simple way

json txt With the help of json txt you can use your txt file as a json file in a very simple way Dependencies re filemod pip install filemod Installat

Kshitij 1 Dec 14, 2022
API that provides Wordle (ES) solutions in JSON format

Wordle (ES) solutions API that provides Wordle (ES) solutions in JSON format.

Álvaro García Jaén 2 Feb 10, 2022
Json utils is a python module that you can use when working with json files.

Json-utils Json utils is a python module that you can use when working with json files. It comes packed with a lot of features Features Converting jso

Advik 4 Apr 24, 2022
A fast streaming JSON parser for Python that generates SAX-like events using yajl

json-streamer jsonstreamer provides a SAX-like push parser via the JSONStreamer class and a 'object' parser via the ObjectStreamer class which emits t

Kashif Razzaqui 196 Dec 15, 2022
A Cobalt Strike Scanner that retrieves detected Team Server beacons into a JSON object

melting-cobalt 👀 A tool to hunt/mine for Cobalt Strike beacons and "reduce" their beacon configuration for later indexing. Hunts can either be expans

Splunk GitHub 150 Nov 23, 2022
Ibmi-json-beautify - Beautify json string with python

Ibmi-json-beautify - Beautify json string with python

Jefferson Vaughn 3 Feb 02, 2022
Python script to extract news from RSS feeds and save it as json.

Python script to extract news from RSS feeds and save it as json.

Alex Trbznk 14 Dec 22, 2022
Marshall python objects to and from JSON

Pymarshaler - Marshal and Unmarshal Python Objects Disclaimer This tool is in no way production ready About Pymarshaler allows you to marshal and unma

Hernan Romer 9 Dec 20, 2022
MOSP is a platform for creating, editing and sharing validated JSON objects of any type.

MONARC Objects Sharing Platform Presentation MOSP is a platform for creating, editing and sharing validated JSON objects of any type. You can use any

CASES Luxembourg 72 Dec 14, 2022
No more boilerplate to check and build a Python object from JSON.

JSONloader This module is for you if you're tired of writing boilerplate that: builds a straightforward Python object from loaded JSON. checks that yo

3 Feb 05, 2022
simdjson : Parsing gigabytes of JSON per second

JSON is everywhere on the Internet. Servers spend a *lot* of time parsing it. We need a fresh approach. The simdjson library uses commonly available SIMD instructions and microparallel algorithms to

16.3k Dec 29, 2022
Convert your JSON data to a valid Python object to allow accessing keys with the member access operator(.)

JSONObjectMapper Allows you to transform JSON data into an object whose members can be queried using the member access operator. Unlike json.dumps in

Owen Trump 4 Jul 20, 2022