A fast streaming JSON parser for Python that generates SAX-like events using yajl


json-streamer

jsonstreamer provides a SAX-like push parser via the JSONStreamer class and an 'object' parser via the ObjectStreamer class, which emits the top-level entities of any JSON object. It is based on the fast C library 'yajl'. It is well suited to parsing JSON as it streams in over a network, or JSON documents that are too large to hold in memory all at once.

Dependencies

git clone git@github.com:lloyd/yajl.git
cd yajl
./configure && make install

Setup

pip3 install jsonstreamer

Also available on PyPI - https://pypi.python.org/pypi/jsonstreamer

Example

Shell

python -m jsonstreamer.jsonstreamer < some_file.json

Code

variables which contain the input we want to parse

json_object = """
    {
        "fruits":["apple","banana", "cherry"],
        "calories":[100,200,50]
    }
"""
json_array = """[1,2,true,[4,5],"a"]"""

a catch-all event listener function which prints the events

def _catch_all(event_name, *args):
    print('\t{} : {}'.format(event_name, args))

JSONStreamer Example

Event listeners get events in their parameters and must have appropriate signatures for receiving their specific event of interest.

JSONStreamer provides the following events:

  • doc_start
  • doc_end
  • object_start
  • object_end
  • array_start
  • array_end
  • key - this also carries the name of the key as a string param
  • value - this also carries the value as a string|int|float|boolean|None param
  • element - this also carries the value as a string|int|float|boolean|None param

Listener methods must have signatures that match the event they handle.

For the events doc_start, doc_end, object_start, object_end, array_start and array_end, the listener takes no parameters:

def listener():
    pass

Or, if your listener is a method of a class, it takes only the additional 'self' parameter:

def listener(self):
    pass

For the events key, value and element, listeners must also accept a payload and must be declared like this:

def key_listener(key_string):
    pass

import and run jsonstreamer on 'json_object'

from jsonstreamer import JSONStreamer 

print("\nParsing the json object:")
streamer = JSONStreamer() 
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_object[0:10]) #note that partial input is possible
streamer.consume(json_object[10:])
streamer.close()

output

Parsing the json object:
    doc_start : ()
    object_start : ()
    key : ('fruits',)
    array_start : ()
    element : ('apple',)
    element : ('banana',)
    element : ('cherry',)
    array_end : ()
    key : ('calories',)
    array_start : ()
    element : (100,)
    element : (200,)
    element : (50,)
    array_end : ()
    object_end : ()
    doc_end : ()

run jsonstreamer on 'json_array'

print("\nParsing the json array:")
streamer = JSONStreamer() #can't reuse old object, make a fresh one
streamer.add_catch_all_listener(_catch_all)
streamer.consume(json_array[0:5])
streamer.consume(json_array[5:])
streamer.close()

output

Parsing the json array:
    doc_start : ()
    array_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    array_start : ()
    element : (4,)
    element : (5,)
    array_end : ()
    element : ('a',)
    array_end : ()
    doc_end : ()
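
Both runs above feed the parser in two slices. The same pattern generalises to any chunk size; the helper below is a minimal sketch of it. `feed_in_chunks` and `RecordingStreamer` are hypothetical names introduced here for illustration and are not part of jsonstreamer. The helper relies only on the `consume` and `close` calls shown above, and the stand-in class exists so the sketch runs without yajl installed.

```python
def feed_in_chunks(streamer, text, chunk_size=4096):
    """Push `text` into `streamer` piece by piece, then close it.

    `streamer` is any object exposing consume(str) and close(),
    which is how JSONStreamer is used in the examples above.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(text), chunk_size):
        streamer.consume(text[start:start + chunk_size])
    streamer.close()


class RecordingStreamer:
    """Hypothetical stand-in that only records what it is fed."""

    def __init__(self):
        self.chunks = []
        self.closed = False

    def consume(self, data):
        self.chunks.append(data)

    def close(self):
        self.closed = True


recorder = RecordingStreamer()
feed_in_chunks(recorder, '[1,2,true,[4,5],"a"]', chunk_size=5)
print(recorder.chunks)
```

With the real library installed, passing a fresh `JSONStreamer()` with a catch-all listener in place of `recorder` should give the same event sequence as the two-slice runs above, since the parser accepts partial input.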

ObjectStreamer Example

ObjectStreamer provides the following events:

  • object_stream_start
  • object_stream_end
  • array_stream_start
  • array_stream_end
  • pair
  • element

import and run ObjectStreamer on 'json_object'

from jsonstreamer import ObjectStreamer

print("\nParsing the json object:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_object[0:9])
object_streamer.consume(json_object[9:])
object_streamer.close()

output

Parsing the json object:
    object_stream_start : ()
    pair : (('fruits', ['apple', 'banana', 'cherry']),)
    pair : (('calories', [100, 200, 50]),)
    object_stream_end : ()

run the ObjectStreamer on the 'json_array'

print("\nParsing the json array:")
object_streamer = ObjectStreamer()
object_streamer.add_catch_all_listener(_catch_all)
object_streamer.consume(json_array[0:4])
object_streamer.consume(json_array[4:])
object_streamer.close()

output - note that the events are different for an array

Parsing the json array:
    array_stream_start : ()
    element : (1,)
    element : (2,)
    element : (True,)
    element : ([4, 5],)
    element : ('a',)
    array_stream_end : ()

Example on attaching listeners for various events

ob_streamer = ObjectStreamer()

def pair_listener(pair):
    print('Explicit listener: Key: {} - Value: {}'.format(pair[0],pair[1]))
    
ob_streamer.add_listener('pair', pair_listener) #same for JSONStreamer
ob_streamer.consume(json_object)

ob_streamer.remove_listener(pair_listener) #if you need to remove the listener explicitly
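
An explicit 'pair' listener can also gather the top-level pairs into a dictionary instead of printing them. The following is a sketch under assumptions: `collect_pairs` is a hypothetical helper, and `FakeObjectStreamer` is a stand-in written only so the snippet runs without yajl. It mimics the `add_listener`, `consume` and `close` calls used above, but unlike the real ObjectStreamer it emits nothing until `close()`.

```python
import json


def collect_pairs(streamer, text):
    """Return a dict built from the 'pair' events emitted for `text`."""
    collected = {}

    def on_pair(pair):
        # each payload is a (key, value) tuple, as in the output above
        key, value = pair
        collected[key] = value

    streamer.add_listener('pair', on_pair)
    streamer.consume(text)
    streamer.close()
    return collected


class FakeObjectStreamer:
    """Hypothetical stand-in; NOT the real ObjectStreamer."""

    def __init__(self):
        self._listeners = {}
        self._buffer = []

    def add_listener(self, event_name, listener):
        self._listeners.setdefault(event_name, []).append(listener)

    def consume(self, data):
        self._buffer.append(data)

    def close(self):
        # parse everything at once and emit one 'pair' per top-level item
        for pair in json.loads(''.join(self._buffer)).items():
            for listener in self._listeners.get('pair', []):
                listener(pair)


print(collect_pairs(FakeObjectStreamer(), '{"fruits": ["apple"], "calories": [100]}'))
```

With jsonstreamer installed, `collect_pairs(ObjectStreamer(), json_object)` would be the equivalent call.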

Even easier way of attaching listeners

class MyClass:
    
    def __init__(self):
        self._obj_streamer = ObjectStreamer() #same for JSONStreamer
        
        # this automatically finds listeners in this class and attaches them if they are named
        # using the convention '_on_eventname'; see the method names in this class
        self._obj_streamer.auto_listen(self) 
    
    def _on_object_stream_start(self):
        print ('Root Object Started')
        
    def _on_pair(self, pair):
        print('Key: {} - Value: {}'.format(pair[0],pair[1]))
        
    def parse(self, data):
        self._obj_streamer.consume(data)
        
        
m = MyClass()
m.parse(json_object)
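
To make the '_on_eventname' convention concrete, here is one way such a lookup could work. This is an illustrative sketch, not jsonstreamer's actual implementation: `TinyEmitter`, its `fire` method and the `Recorder` class are hypothetical names, and the real `auto_listen` may differ in its details.

```python
class TinyEmitter:
    """Hypothetical event source used only to illustrate the naming convention."""

    def __init__(self):
        self._listeners = {}

    def add_listener(self, event_name, listener):
        self._listeners.setdefault(event_name, []).append(listener)

    def auto_listen(self, target, prefix='_on_'):
        # attach every callable attribute named '<prefix><event name>'
        for name in dir(target):
            if name.startswith(prefix):
                method = getattr(target, name)
                if callable(method):
                    self.add_listener(name[len(prefix):], method)

    def fire(self, event_name, *args):
        for listener in self._listeners.get(event_name, []):
            listener(*args)


class Recorder:
    def __init__(self):
        self.seen = []

    def _on_pair(self, pair):
        self.seen.append(pair)


emitter = TinyEmitter()
recorder = Recorder()
emitter.auto_listen(recorder)
emitter.fire('pair', ('fruits', ['apple']))
print(recorder.seen)
```

The point of the convention is that adding a handler only requires defining a method with the right name; no separate registration call is needed for each event.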

Troubleshooting

  • If you get an OSError('Yajl cannot be found.'), please ensure that libyajl is available in the relevant library directory. For example, on Mac (OS X), /usr/local/lib should contain "libyajl.dylib". On Linux the file is libyajl.so, and on Windows it is yajl.dll.
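
A quick way to check whether the system's dynamic loader can see yajl at all is to ask Python's standard `ctypes` module. This is a diagnostic sketch only: `locate_yajl` and `try_load` are hypothetical helpers, the candidate names are assumptions, and jsonstreamer's own `load_lib` may search differently, so a hit here does not guarantee the import will succeed.

```python
import ctypes
import ctypes.util


def locate_yajl(candidates=('yajl', 'libyajl')):
    """Return the first library name or path the loader resolves, else None."""
    for name in candidates:
        found = ctypes.util.find_library(name)
        if found:
            return found
    return None


def try_load(path):
    """Return True if the shared library at `path` can actually be loaded."""
    try:
        ctypes.CDLL(path)
        return True
    except OSError:
        return False


found = locate_yajl()
if found is None:
    print('yajl was not found on the default library search path')
else:
    print('found', found, '- loadable:', try_load(found))
```
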
Comments
  • Trouble using 'jsonstreamer' with 'yajl-2' on Ubuntu 14.04

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer for implementing a Streaming API.

    As directed, I have installed yajl on my Ubuntu 14.04 system and also verified its presence and correct installation (refer: [1] & [2])

    Still, on running the command python3 -m jsonstreamer.jsonstreamer < test.json i.e. using it with jsonstreamer gives me the following :

      File "/usr/local/lib/python3.4/dist-packages/jsonstreamer/yajl/parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.
    

    Following up in https://github.com/lloyd/yajl/issues/190, it seems that there might be an issue in the parse.py file itself. Maybe it's looking for yajl1 and not yajl2.

    Any pointers on this one? Help appreciated.


    [1] Running gcc -lyajl yields:

    $ gcc -lyajl
    ....
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
    (.text+0x20): undefined reference to `main'
    collect2: error: ld returned 1 exit status
    

    [2] And sudo ldconfig -p | grep yajl results in:

    $ sudo ldconfig -p | grep yajl
        libyajl.so.2 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libyajl.so.2
    
    opened by jigyasa-grover 10
  • Ensure exception __str__ methods return strings

    Hi there,

    Issues that throw JSONStreamerException classes are difficult to debug because there is no expectation that a str will be returned. This makes debugging a PITA.

    awesome_module.py", line 51, in map_step
        url + '\n' + str(e))
    TypeError: __str__ returned non-string (type bytes)
    
    opened by mach-kernel 3
  • Missing tests & tags

    PyPI has 1.3.6, and no tests.

    GitHub only has a tag for v1.0.0, so I can't use that.

    Could you tag v1.3.6 in GitHub, so I can use it to get tests, and finish https://build.opensuse.org/package/show/home:jayvdb:py-new/python-jsonstreamer after https://github.com/kashifrazzaqui/again/issues/8 is also fixed.

    opened by jayvdb 2
  • SyntaxError: invalid syntax

    Traceback (most recent call last):
      File "test_jsonstreamer.py", line 3, in <module>
        from jsonstreamer import JSONStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "/usr/local/lib/python2.7/dist-packages/jsonstreamer/jsonstreamer.py", line 12, in <module>
        from again import events
      File "/usr/local/lib/python2.7/dist-packages/again/__init__.py", line 4, in <module>
        from .events import EventSource, AsyncEventSource
      File "/usr/local/lib/python2.7/dist-packages/again/events.py", line 49
        yield from each(*args, **kwargs)
        ^
    SyntaxError: invalid syntax

    python --version
    Python 2.7.3

    opened by tuhaolam 2
  • Want to split a 22M JSON file into smaller files to track a problem

    I have a large JSON file that has an error somewhere. I want to split the JSON file up into smaller files that are also valid JSON so that I can find out where the error is. Is this possible with your package?

    opened by winash12 1
  • Trouble using 'jsonstreamer' with 'yajl' on Windows 10

    Hey @kashifrazzaqui

    I have been trying to use your library json-streamer for implementing a Streaming API.

    As directed, I have built yajl on my Windows 10 system and installed it as below:

    C:\Users\mianand\Downloads\lloyd-yajl-2.1.0-0-ga0ecdde\lloyd-yajl-66cb08c\build>nmake install

    Microsoft (R) Program Maintenance Utility Version 14.00.24210.0 Copyright (C) Microsoft Corporation. All rights reserved.

    [ 30%] Built target yajl_s
    [ 60%] Built target yajl
    [ 66%] Built target yajl_test
    [ 72%] Built target gen-extra-close
    [ 78%] Built target json_reformat
    [ 84%] Built target json_verify
    [ 90%] Built target parse_config
    [100%] Built target perftest
    Install the project...
    -- Install configuration: "Release"
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl.dll
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/lib/yajl_s.lib
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_parse.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_gen.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_common.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_tree.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/include/yajl/yajl_version.h
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/share/pkgconfig/yajl.pc
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_reformat.exe
    -- Up-to-date: C:/Program Files (x86)/YetAnotherJSONParser/bin/json_verify.exe

    Still, importing it under conda with Python 3.6 gives me the following:

    from jsonstreamer import JSONStreamer
    Traceback (most recent call last):
      File "", line 1, in <module>
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\__init__.py", line 9, in <module>
        from jsonstreamer.jsonstreamer import JSONStreamer, ObjectStreamer
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\jsonstreamer.py", line 14, in <module>
        from .yajl.parse import YajlParser, YajlListener, YajlError
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 32, in <module>
        yajl = load_lib()
      File "C:\Users\mianand\AppData\Local\Continuum\anaconda3\envs\pycharm_venv\lib\site-packages\jsonstreamer\yajl\parse.py", line 29, in load_lib
        raise OSError('Yajl cannot be found.')
    OSError: Yajl cannot be found.

    Any pointers on this one? Help appreciated.

    opened by mitendraanand 1
  • Not looking for yajl.dll when loading Yajl

    In the method load_lib(), there is never an attempt to load Yajl from yajl.dll, which is the name of the Yajl library on Windows. I think it would be rather easy to add this and make the package useful on Windows as well.

    opened by Groomtar 1
  • pypi version ahead of master branch

    Please update the PyPI entry of json-streamer https://pypi.python.org/pypi/jsonstreamer/1.3.6 and consider linking there from the short text description here.

    opened by johnyf 1
  • outdated pypi package

    Hi,

    Could you update the PyPI package? As far as I can see, there have been some commits since the last PyPI upload. Also, I think it is a bit confusing that there is only one tagged release, 1.0, while the PyPI package has version number 1.3.6, and both of them are almost a year older than some important fixes, e.g. the exponential floats. (I can install the files on my own, but I think it would be nice to update the releases.)

    opened by dvolgyes 0
Releases(v1.3.8)
Owner
Kashif Razzaqui
https://medium.com/@kashifrazzaqui