SpyQL - SQL with Python in the middle

Overview

PyPI: https://pypi.python.org/pypi/spyql · CI: https://travis-ci.com/dcmoura/spyql · Docs: https://spyql.readthedocs.io/en/latest/ · codecov · code style: black · license: MIT

Concept

SpyQL is a query language that combines:

  • the simplicity and structure of SQL
  • with the power and readability of Python
SELECT
    date.fromtimestamp(purchase_ts) AS purchase_date,
    price * quantity AS total
FROM csv
WHERE department.upper() == 'IT'
TO json

SQL provides the structure of the query, while Python is used to define expressions, bringing along a vast ecosystem of packages.

SpyQL command-line tool

With the SpyQL command-line tool you can run SQL-like SELECTs powered by Python on top of text data (e.g. CSV and JSON). Data can come from files but also from data streams, such as Kafka, or from databases such as PostgreSQL. Basically, data can come from any command that outputs text :-). Moreover, data can be generated by a Python iterator! Take a look at the examples section to see how to query Parquet, process API calls, traverse directories of zipped JSONs, among many other things.
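
For instance, the query from the Concept section could be run over a hypothetical sales.csv file (the file and column names are assumptions for illustration):

spyql "
    SELECT
        date.fromtimestamp(purchase_ts) AS purchase_date,
        price * quantity AS total
    FROM csv
    WHERE department.upper() == 'IT'
    TO json
" < sales.csv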

SpyQL also allows you to easily convert between text data formats:

  • FROM: CSV, JSON, TEXT and Python iterators (YES, you can use a list comprehension as the data source)

  • TO: CSV, JSON, SQL (INSERT statements), pretty terminal printing, and terminal plotting.

The JSON format is JSON lines, where each line has a valid JSON object or array. Piping through jq allows SpyQL to handle any JSON input (more in the examples section).
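
For example, a JSON array of objects can be flattened into JSON lines with jq before querying (a sketch; a_file.json and a_field are placeholder names):

cat a_file.json | jq -c '.[]' | spyql "SELECT json->a_field FROM json TO csv"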

You can leverage command-line tools to process other file types like Parquet and XML (more in the examples section).

Installation

To install SpyQL, run this command in your terminal:

pip install spyql

Hello world

To test your installation run in the terminal:

spyql "SELECT 'Hello world' as Message TO pretty"

Output:

Message
-----------
Hello world

Try replacing the output format with json and csv, and try adding more columns, e.g. run in the terminal:

spyql "SELECT 'Hello world' as message, 1+2 as three TO json"

Output:

{"message": "Hello world", "three": 3}

Principles

Right now, the focus is on building a command-line tool that follows these core principles:

  • Simple: simple to use with a straightforward implementation
  • Familiar: you should feel at home if you are acquainted with SQL and Python
  • Light: small memory footprint that allows you to process large amounts of data
  • Useful: it should make your life easier, filling a gap in the ecosystem

Syntax

[ IMPORT python_module [ AS identifier ] [, ...] ]
SELECT [ DISTINCT | PARTIALS ] 
    [ * | python_expression [ AS output_column_name ] [, ...] ]
    [ FROM csv | spy | text | python_expression | json [ EXPLODE path ] ]
    [ WHERE python_expression ]
    [ GROUP BY output_column_number | python_expression  [, ...] ]
    [ ORDER BY output_column_number | python_expression
        [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
    [ LIMIT row_count ]
    [ OFFSET num_rows_to_skip ]
    [ TO csv | json | spy | sql | pretty | plot ]
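
A sketch combining several of these clauses (the input file and column semantics are hypothetical):

spyql "
    SELECT col1 AS product, sum_agg(col2 * col3) AS revenue
    FROM csv
    WHERE col2 > 0
    GROUP BY 1
    ORDER BY 2 DESC NULLS LAST
    LIMIT 10
    TO pretty
" < sales.csv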

Notable differences to SQL

In SpyQL:

  • there is a guarantee that the order of the output rows is the same as in the input (if no reordering is done)
  • the AS keyword must precede a column alias definition (it is not optional as in SQL)
  • you can always access the nth input column by using the default column names colN (e.g. col1 for the first column)
  • currently only a small subset of SQL is supported, namely SELECT statements without sub-queries, joins, set operations, etc. (check the Syntax section)
  • sub-queries are achieved by piping (see the [Command line examples](#command-line-examples) section)
  • aggregation functions have the suffix _agg to avoid conflicts with Python's built-in functions:
Operation                   PostgreSQL                                                   SpyQL
--------------------------  -----------------------------------------------------------  ------------------------
Sum all values of a column  SELECT sum(col_name)                                         SELECT sum_agg(col_name)
Sum an array                SELECT sum(a) FROM (SELECT unnest(array[1,2,3]) AS a) AS t   SELECT sum([1,2,3])
  • expressions are pure Python:
SQL                                 SpyQL
----------------------------------  ------------------
x = y                               x == y
x BETWEEN a AND b                   a <= x <= b
CAST(x AS INTEGER)                  int(x)
CASE WHEN x > 0 THEN 1 ELSE -1 END  1 if x > 0 else -1
upper('hello')                      'hello'.upper()
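
To illustrate the default colN column names together with pure-Python expressions, a minimal sketch (assuming SpyQL's CSV reader detects the header row and infers the numeric types):

echo -e "a,b\n1,2\n30,40" | spyql "SELECT col1 + col2 AS total FROM csv TO json"

Expected output:

{"total": 3}
{"total": 70}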

Notable differences to Python

Additional syntax

We added additional syntax for making querying easier:

Python                         SpyQL shortcut               Purpose
-----------------------------  ---------------------------  ---------------------------------------------
json['hello']['planet earth']  json->hello->'planet earth'  Easy access of elements in dicts (e.g. JSONs)
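
For example, since JSON input is loaded into dict-like objects, the two queries below should be equivalent (field names are hypothetical):

spyql "SELECT json->client->name FROM json TO csv" < a_file.json
spyql "SELECT json['client']['name'] FROM json TO csv" < a_file.json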

NULL datatype

Python's None raises exceptions when operating on missing data, breaking query execution (e.g. None + 1 throws a TypeError). To overcome this, we created a NULL type that behaves as in SQL (e.g. NULL + 1 returns NULL), allowing queries to keep processing data.

Operation                 Native Python throws  SpyQL returns  SpyQL warning
------------------------  --------------------  -------------  -------------
NULL + 1                  NameError             NULL
a_dict['inexistent_key']  KeyError              NULL           yes
int('')                   ValueError            NULL           yes
int('abc')                ValueError            NULL           yes

The above dictionary key access only returns NULL if the dict is an instance of NullSafeDict. SpyQL adds NullSafeDict, which extends Python's native dict. JSONs are automatically loaded as NullSafeDict. Unless you are creating dictionaries on the fly, you do not need to worry about this.
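
A quick sketch of this behavior on a missing JSON field (hypothetical input; accessing the inexistent key yields NULL, which propagates through the addition and is written as null):

echo '{"a": 1}' | spyql "SELECT json->a + 1 AS b, json->missing + 1 AS c FROM json TO json"

Expected output (a warning about the missing key may be printed to stderr):

{"b": 2, "c": null}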

Importing python modules and user-defined functions

By default, spyql does some commonly used imports:

  • everything from the math module
  • datetime, date and timezone from the datetime module
  • the re module

SpyQL queries support a single import statement at the beginning of the query where several modules can be imported (e.g. IMPORT numpy AS np, sys SELECT ...). Note that the python syntax from module import identifier is not supported in queries.

In addition, you can create a Python file that is loaded before executing queries. Here you can define imports, functions, variables, etc. using regular Python code. Everything defined in this file is available to all your spyql queries. The file should be located at XDG_CONFIG_HOME/spyql/init.py. If the environment variable XDG_CONFIG_HOME is not defined, it defaults to HOME/.config (e.g. /Users/janedoe/.config/spyql/init.py).
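
A minimal sketch of such an init.py (the contents below are hypothetical; any regular Python code works here):

import hashlib

# helper function that becomes available in every spyql query
def md5(s):
    return hashlib.md5(s.encode("utf-8")).hexdigest()

# constant available in every spyql query
VAT_RATE = 0.23

After defining this, a query could use it directly, e.g. spyql "SELECT md5(col1) FROM text".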

Example queries

You can run the following example queries in the terminal: spyql "the_query" < a_data_file

Example data files are not provided in most cases.

Query a CSV (and print a pretty table)

SELECT a_col_name, 'positive' if col2 >= 0 else 'negative' AS sign
FROM csv
TO pretty

Convert CSV to a flat JSON

SELECT * FROM csv TO json

Convert from CSV to a hierarchical JSON

SELECT {'client': {'id': col1, 'name': col2}, 'price': 120.40}
FROM csv TO json

or

SELECT {'id': col1, 'name': col2} AS client, 120.40 AS price
FROM csv TO json

JSON to CSV, filtering out NULLs

SELECT json->client->id AS id, json->client->name AS name, json->price AS price
FROM json
WHERE json->client->name is not NULL
TO csv
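
A hypothetical sample input:

{"client": {"id": 1, "name": "Joe"}, "price": 10.5}
{"client": {"id": 2}, "price": 6.0}

Output (the second row is dropped because the client name is NULL):

id,name,price
1,Joe,10.5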

Explode JSON to CSV

SELECT json->invoice_num AS id, json->items->name AS name, json->items->price AS price
FROM json
EXPLODE json->items
TO csv

Sample input:

{"invoice_num" : 1028, "items": [{"name": "tomatoes", "price": 1.5}, {"name": "bananas", "price": 2.0}]}
{"invoice_num" : 1029, "items": [{"name": "peaches", "price": 3.12}]}

Output:

id,name,price
1028,tomatoes,1.5
1028,bananas,2.0
1029,peaches,3.12

Python iterator/list/comprehension to JSON

SELECT 10 * cos(col1 * ((pi * 4) / 90))
FROM range(80)
TO json

or

SELECT col1
FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
TO json

Importing python modules

Here we import hashlib to calculate an md5 hash for each input line. hashlib is part of Python's standard library, so there is nothing to install before running this example.

IMPORT hashlib as hl
SELECT hl.md5(col1.encode('utf-8')).hexdigest()
FROM text

Getting the top 5 records

SELECT int(score) AS score, player_name
FROM csv
ORDER BY 1 DESC NULLS LAST, score_date
LIMIT 5

Aggregations

Totals by player, alphabetically ordered.

SELECT json->player_name, sum_agg(json->score) AS total_score
FROM json
GROUP BY 1
ORDER BY 1
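
A hypothetical sample input:

{"player_name": "Ana", "score": 7}
{"player_name": "Ben", "score": 5}
{"player_name": "Ana", "score": 10}

Output (CSV is assumed as the default output format; the derived column name is illustrative):

player_name,total_score
Ana,17
Ben,5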

Partial aggregations

Calculating the cumulative sum of a variable using the PARTIALS modifier. Also demoing the lag aggregator.

SELECT PARTIALS 
    json->new_entries, 
    sum_agg(json->new_entries) AS cum_new_entries,
    lag(json->new_entries) AS prev_entries
FROM json
TO json

Sample input:

{"new_entries" : 10}
{"new_entries" : 5}
{"new_entries" : 25}
{"new_entries" : null}
{}
{"new_entries" : 100}

Output:

{"new_entries" : 10,   "cum_new_entries" : 10,  "prev_entries": null}
{"new_entries" : 5,    "cum_new_entries" : 15,  "prev_entries": 10}
{"new_entries" : 25,   "cum_new_entries" : 40,  "prev_entries": 5}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": 25}
{"new_entries" : null, "cum_new_entries" : 40,  "prev_entries": null}
{"new_entries" : 100,  "cum_new_entries" : 140, "prev_entries": null}

If PARTIALS was omitted, the result would be equivalent to the last output row.

Distinct rows

SELECT DISTINCT *
FROM csv

Command line examples

To run the following examples, type Ctrl-x Ctrl-e in your terminal. This will open your default editor (e.g. emacs or vim). Paste the code of one of the examples, save and exit.

Queries on Parquet with directories

Here, find traverses a directory and executes parquet-tools for each parquet file, dumping each file to JSON format. jq -c makes sure that the output has one JSON per line before handing it over to spyql. This is far from an efficient way to query parquet files, but it can be a handy option if you need to do a quick inspection.

find /the/directory -name "*.parquet" -exec parquet-tools cat --json {} \; |
jq -c |
spyql "
	SELECT json->a_field, json->a_num_field * 2 + 1
	FROM json
"

Querying multiple json.gz files

gzcat *.json.gz |
jq -c |
spyql "
	SELECT json->a_field, json->a_num_field * 2 + 1
	FROM json
"

Querying YAML / XML / TOML files

yq converts YAML, XML and TOML files to JSON, allowing you to easily query any of these with spyql.

cat file.yaml | yq -c | spyql "SELECT json->a_field FROM json"
cat file.xml | xq -c | spyql "SELECT json->a_field FROM json"
cat file.toml | tomlq -c | spyql "SELECT json->a_field FROM json"

Kafka to PostgreSQL pipeline

Read data from a Kafka topic and write to the PostgreSQL table named customer.

kafkacat -b the.broker.com -t the.topic |
spyql -Otable=customer -Ochunk_size=1 --unbuffered "
	SELECT
		json->customer->id AS id,
		json->customer->name AS name
	FROM json
	TO sql
" |
psql -U an_user_name -h a.host.com a_database_name

Monitoring statistics in Kafka

Read data from a Kafka topic, continuously calculating statistics.

kafkacat -b the.broker.com -t the.topic |
spyql --unbuffered "
	SELECT PARTIALS
        count_agg(*) AS running_count,
		sum_agg(value) AS running_sum,
		min_agg(value) AS min_so_far, 
        value AS current_value
	FROM json
	TO csv
" 

Sub-queries (piping)

A special file format (spy) is used to efficiently pipe data between queries.

cat a_file.json |
spyql "
	SELECT ' '.join([json->first_name, json->middle_name, json->last_name]) AS full_name
	FROM json
	TO spy" |
spyql "SELECT full_name, full_name.upper() FROM spy"

Queries over APIs

curl https://reqres.in/api/users?page=2 |
spyql "
	SELECT
		json->data->email AS email,
		'Dear {}, thank you for being a great customer!'.format(json->data->first_name) AS msg
	FROM json
	EXPLODE json->data
	TO json
"

Plotting to the terminal

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO plot
"

Plotting with gnuplot

To the terminal:

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO csv
" |
sed 1d |
feedgnuplot --terminal 'dumb 80,30' --exit --lines

To GUI:

spyql "
    SELECT col1
    FROM [10 * cos(i * ((pi * 4) / 90)) for i in range(80)]
    TO csv
" |
sed 1d |
feedgnuplot --lines --points --exit

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Comments
  • Interactive SpyQL

    This is v0 for interactive spyql. It implements the following new things:

    1. Q class and q function in interactive.py: This is the class/function that the user can add in their script
    2. InteractiveProcessor in processor.py: yields the rows loaded by the Processor
    3. InteractiveWriter in processor.py: writer that adds the output in a list

    Other Important changes:

    1. init_vars, go and _go now have user_query_vars as a valid input
    2. Processor can now be explicitly defined as interactive

    Misc:

    1. ifs changed to elifs where needed

    Issue: #64

    To test this, run python tests/interactive_test.py.

    opened by yashbonde 12
  • Refactor `parse_structure` and allow comment in query

    The purposes of this PR are:

    • Refactor parse_structure function (cf. https://github.com/dcmoura/spyql/issues/85#issue-1459257843)
    • Support for line comments (cf. https://github.com/dcmoura/spyql/issues/84)
    enhancement 
    opened by Hayashi-Yudai 9
  • Key-value aggregations and lookups

    This PR closes #59.

    First, it introduces dict_agg for key-value aggregations. Example:

    $ cat codes.csv
    MCC,MCC (int),MNC,MNC (int),ISO,Country,Country Code,Network
    289,649,88,2191,ge,Abkhazia,7,A-Mobile
    289,649,68,1679,ge,Abkhazia,7,A-Mobile
    289,649,67,1663,ge,Abkhazia,7,Aquafon
    412,1042,01,31,af,Afghanistan,93,AWCC
    412,1042,50,1295,af,Afghanistan,93,Etisalat
    412,1042,30,783,af,Afghanistan,93,Etisalat
    452,1106,04,79,vn,Vietnam,84,Viettel
    452,1106,02,47,vn,Vietnam,84,VinaPhone
    543,1347,299,665,wf,Wallis and Futuna,,Failed Calls
    543,1347,01,31,wf,Wallis and Futuna,,Manuia
    421,1057,999,2457,ye,Yemen,967,Fix Line
    421,1057,04,79,ye,Yemen,967,HITS/Y Unitel
    421,1057,01,31,ye,Yemen,967,Sabaphone
    421,1057,03,63,ye,Yemen,967,Yemen Mob. CDMA
    645,1605,01,31,zm,Zambia,260,Airtel
    645,1605,299,665,zm,Zambia,260,Failed Calls
    
    $ spyql "
        SELECT dict_agg(MCC, Country) AS json 
        FROM csv TO json
      " < codes.csv > code_country.json
    
    $ jq . code_country.json                                        
    {
      "289": "Abkhazia",
      "412": "Afghanistan",
      "452": "Vietnam",
      "543": "Wallis and Futuna",
      "421": "Yemen",
      "645": "Zambia"
    }
    

    Second, it introduces the ~~kv function~~ -J option to allow loading JSON objects for key-value lookups, emulating left equi joins. Example:

    $ cat towers_mini.csv 
    radio,mcc,net,area,cell,unit,lon,lat,range,samples,changeable,created,updated,averageSignal
    UMTS,262,2,776,165186,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
    GSM,262,3,1075,5651,0,13.329163,52.46521,1000,2,1,1286645042,1295769656,0
    UMTS,262,2,801,165123,0,13.295516967773,52.494735717773,1000,1,1,1286864565,1286864565,0
    UMTS,262,2,801,165121,0,13.301697,52.499886,1000,4,1,1286884439,1297735807,0
    GSM,624,2,6,13883,0,9.704361,4.063911,1588,8,1,1352031870,1374212876,0
    GSM,624,2,6,34381,0,9.709009,4.066368,2484,13,1,1352031870,1380389330,0
    GSM,452,1,10068,5293,0,105.8031463623,20.959854125977,1000,2,1,1459692342,1488347104,0
    GSM,645,3,130,3282,0,28.070755004883,-14.902267456055,1000,1,1,1459743588,1459743588,0
    GSM,645,3,1,32221,0,28.252716,-15.439224,1000,2,1,1369353852,1369353852,0
    GSM,645,3,1,33932,0,28.255325,-15.436752,1000,5,1,1369353888,1369353888,0
    GSM,645,3,1,31662,0,28.258072,-15.434555,1000,5,1,1369353960,1369353960,0
    
    $ spyql -Jcountries=code_country.json "SELECT mcc, countries[mcc] AS country, count_agg(1) AS n_recs FROM csv GROUP BY 1, 2 ORDER BY 1 TO pretty" < towers_mini.csv
      mcc  country      n_recs
    -----  ---------  --------
      262                    4
      452  Vietnam           1
      624                    2
      645  Zambia            4
    
    opened by dcmoura 7
  • Code improvements

    Some cases where the code could be more pythonic:

    • some for loops where range(len(l)) might be replaced by enumerate
    • some functions where yield might be used to return a generator

    Changes should be assessed for performance.

    enhancement good first issue 
    opened by dcmoura 6
  • code improvement for issue #60

    Improve the code.

    TODOs

    • [x] Fix code to more pythonic style
      • Fix import style
      • Remove some unused imports
    • [x] Fix error in lint by flake8

    Issue: https://github.com/dcmoura/spyql/issues/60

    opened by Hayashi-Yudai 5
  • LIKE clause

    Adds a LIKE function to the WHERE clause for easier matching. Closes #17

    Some notes

    • Right now it can match not only strings but also everything else, as the values are stringified before matching. Let me know if this is not intended and if it should be restricted to strings.
    • It only supports the % wildcard operator.
    • Let me know if the commit messages should follow a specific format

    Future work

    Either in the scope of this PR or in a future one, other basic matching features are still missing, like _ for single-character matching or [] to specify a range.

    enhancement 
    opened by ricardocchaves 4
  • Fixes git-actions tests on py3.6 and adds py3.11

    The Python 3.6 environment stopped launching. This seems to be related to an upgrade of the Ubuntu version, and support for py3.6 was not included in the new version.

    Solution based on https://github.com/actions/setup-python/issues/543#issuecomment-1335614916

    opened by dcmoura 3
  • [FR] COUNT(DISTINCT col1) ... GROUP BY

    Thank you for the great tool!

    I'd like to request a feature that works like SQL's COUNT(DISTINCT col1) ... GROUP BY. I tried count_agg(distinct col1), but got a syntax error.

    opened by Minyus 3
  • Full-featured documentation

    This PR proposes brand new documentation and closes #34.

    Please ignore the commits... the history of this PR is quite disturbing 😅 Everything will be squashed into a single commit when merging to master.

    Please focus on 1) the structure and then 2) the content of:

    • https://github.com/dcmoura/spyql/blob/docs/README.rst
    • https://spyql.readthedocs.io/en/docs/

    I am proposing deprecating the -> operator. It will still be supported but is not mentioned in the documentation. From now on, the . (dot) operator should be the standard way of accessing fields in dicts, followed by standard Python syntax (a_dict["key"]). The major drawback is that we will need to record a new video demoing the CLI and update some posts (e.g. Stack Overflow).

    There is still some work to do on the lib side. Namely, we should add a section under Examples/Recipes. However, there is still some dev work to do, namely on accessing Pandas dataframes and series, as well as numpy arrays. Eventually we can also visit polars and other libs for tabular data. When we do that, we can extend the documentation.

    In this first review let's try to focus on the big fish, and not so much on the details. My goal is to publish the documentation soon, and then we can iterate on it over time. Thanks!

    documentation 
    opened by dcmoura 3
  • Column names with TEXT format

    When using TEXT as an input format, I couldn't find a way of configuring the column name. The column alias only seems to be applied when we don't specify * in the SELECT statement, which we have to do for TEXT files. Am I missing something? Here's an example:

    cat /tmp/test.txt | spyql -Otable=domains 'Select * as colname from text To SQL'
    INSERT INTO "domains"("col1") VALUES ('aaaaaaaaaa'),('bbbbbbbbb'),('cccccccccc'),('dddddddddd'),('eeeeeeeeee');
    
    cat /tmp/test.txt | spyql -Otable=domains 'Select 1 as colname from text To SQL' 
    INSERT INTO "domains"("colname") VALUES (1),(1),(1),(1),(1);
    

    Expected result would be:

    cat /tmp/test.txt | spyql -Otable=domains 'Select * as colname from text To SQL'
    INSERT INTO "domains"("colname") VALUES ('aaaaaaaaaa'),('bbbbbbbbb'),('cccccccccc'),('dddddddddd'),('eeeeeeeeee');
    
    question 
    opened by henshin 3
  • Cannot dump JSON object with Null values to string

    Use Case

    I'm using the CLI to import JSON Line log files into a Postgres database. One of the columns I'm trying to import is jsonb, and needs to be inserted as a JSON string.

    Approach

    spyql 'IMPORT json SELECT json.dumps(.a) FROM {"a": {"foo": 1, "bar": 2}} TO sql'
    INSERT INTO "table_name"("dumps_a") VALUES ('{"foo": 1, "bar": 2}');
    

    This works fine until it hits a null value:

    spyql 'IMPORT json SELECT json.dumps(.a) FROM {"a": {"foo": 1, "bar": null}} TO sql'
    ERROR	could not evaluate SELECT expression #1: json.dumps(row.a)
    	at data row #1: [{'a': {'foo': 1, 'bar': NULL}}]
    TypeError: Object of type NullType is not JSON serializable
    

    I also tried passing the default kwarg to json.dumps and hit a different error:

    spyql 'IMPORT json SELECT json.dumps(.a, default=str) FROM {"a": {"foo": 1, "bar": null}} TO sql'
    ERROR	could not evaluate SELECT expression #1: json.dumps(row.a, default=str)
    	at data row #1: [{'a': {'foo': 1, 'bar': NULL}}]
    ValueError: Circular reference detected
    

    Very open to alternative approaches; thank you so much for making this great tool available. :)

    opened by mgburns 2
  • [FR] join 2 CSV files

    Hi @dcmoura, I see examples of joining with a JSON file in the documentation; would joining two CSV files be supported? https://spyql.readthedocs.io/en/latest/recipes.html?highlight=join#equi-joins

    opened by Minyus 6
  • general text file support

    I'm using spyql to handle general text files (single-column, not comma-separated), but ran into a problem.

    A single-column text file like the following is misinterpreted as multiple columns by spyql.

    test.csv:

    user_id
    a11e11
    b22e22
    
    $ spyql "SELECT * FROM csv('test.csv') TO pretty"
    us      r_id
    ----  ------
    a11       11
    b22       22
    

    If there are no common characters among rows, the following error is returned.

    ERROR   Could not detect CSV dialect from input
    _csv.Error: Could not determine delimiter
    

    I found that explicitly setting a delimiter like csv('test.csv', delimiter=',') works as a workaround in the current version, but I would like to suggest the following options.

    [Option 1] Modify default behavior of csv()

    1. As a delimiter character, search only for "," (and especially not alphanumeric characters)
    2. If no delimiter is detected (and each row is not too long), read it as a single-column text file rather than returning an error

    [Option 2] Add an alternative to csv() which can read a single-column text file (something like txt())

    opened by Minyus 6
  • Cannot run CI/CD pipe on Python 3.11

    Tried to add Python 3.11 but broke the build:

    Installing build dependencies: started
    Installing build dependencies: finished with status 'done'
    Getting requirements to build wheel: started
    Getting requirements to build wheel: finished with status 'done'
    Preparing metadata (pyproject.toml): started
    Preparing metadata (pyproject.toml): finished with status 'error'
    error: subprocess-exited-with-error

    × Preparing metadata (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [11 lines of output]
        💥 maturin failed
          Caused by: Cargo metadata failed. Does your crate compile with `cargo build`?
          Caused by: `cargo metadata` exited with an error: error: failed to run `rustc` to learn about target-specific information

        Caused by:
          process didn't exit successfully: `rustc - --crate-name ___ --print=file-names -Z mutable-noalias=yes -C target-feature=+sse2 --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit status: 1)
          --- stderr
          error: the option `Z` is only accepted on the nightly compiler
        Error running maturin: Command '['maturin', 'pep517', 'write-dist-info', '--metadata-directory', '/tmp/pip-modern-metadata-2ryt4i58', '--interpreter', '/opt/hostedtoolcache/Python/3.11.0/x64/bin/python', '--manylinux=off', '--strip=on']' returned non-zero exit status 1.
        Checking for Rust toolchain....
        Running `maturin pep517 write-dist-info --metadata-directory /tmp/pip-modern-metadata-2ryt4i58 --interpreter /opt/hostedtoolcache/Python/3.11.0/x64/bin/python --manylinux=off --strip=on`
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.
    Error: Process completed with exit code 1.
    
    bug good first issue 
    opened by dcmoura 0
  • Support reading and writing of JSON objects (not JSON lines)

    Currently SpyQL only allows reading and writing JSON lines. Writing JSON arrays can be done using dict_agg, aggregating everything into an array and writing a JSON with a single line.

    The idea is to add an argument lines=True to the json and orjson writers and processors. The processor should be able to handle single-object files as well as arrays of objects or arrays of scalars. When lines is False, the processor should load the full input into memory and then parse it. While this is not ideal, it is the most straightforward implementation. In addition, arrays of JSON shouldn't be used for large data; in those cases JSON lines should be used instead.

    The writer should write an array of objects when lines is False.

    core 
    opened by dcmoura 1