A non-validating SQL parser module for Python

Overview

python-sqlparse - Parse SQL statements

buildstatus coverage docs

sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting SQL statements.

The module is compatible with Python 3.5+ and released under the terms of the New BSD license.

Visit the project page at https://github.com/andialbrecht/sqlparse for further information about this project.

Quick Start

$ pip install sqlparse
>>> import sqlparse

>>> # Split a string containing two SQL statements:
>>> raw = 'select * from foo; select * from bar;'
>>> statements = sqlparse.split(raw)
>>> statements
['select * from foo;', 'select * from bar;']

>>> # Format the first statement and print it out:
>>> first = statements[0]
>>> print(sqlparse.format(first, reindent=True, keyword_case='upper'))
SELECT *
FROM foo;

>>> # Parsing a SQL statement:
>>> parsed = sqlparse.parse('select * from foo')[0]
>>> parsed.tokens
[<DML 'select' at 0x7f22c5e15368>, <Whitespace ' ' at 0x7f22c5e153b0>, <Wildcard '*' … ]
>>>

Links

Project page
https://github.com/andialbrecht/sqlparse
Bug tracker
https://github.com/andialbrecht/sqlparse/issues
Documentation
https://sqlparse.readthedocs.io/
Online Demo
https://sqlformat.org/

sqlparse is licensed under the BSD license.

Parts of the code are based on pygments written by Georg Brandl and others. pygments-Homepage: http://pygments.org/

Issues
  • Capitalization formatting is correct when keyword is used without parentheses, incorrect only when used with parentheses.

    Capitalization formatting is correct when keyword is used without parentheses, incorrect only when used with parentheses.

    Capitalization formatting is correct when keyword is used without parentheses, incorrect only when used with parentheses.

    Ex: SELECT COUNT column_x FROM table_a; is formatted correctly, but SELECT count(column_x) FROM table_a; is formatted incorrectly.

    opened by pradipsparkale 0
  • `DIV` should be Operator

    `DIV` should be Operator

    DIV should be treated as Operator, but it isn't. https://www.w3schools.com/sql/func_mysql_div.asp

    This behavior prevents to get_alias with DIV operator.

    Actual behavior

    >>> sqlparse.parse('col1 DIV 5')[0]._pprint_tree()
    |- 0 Identifier 'col1 D...'
    |  |- 0 Name 'col1'
    |  |- 1 Whitespace ' '
    |  `- 2 Identifier 'DIV'
    |     `- 0 Name 'DIV'
    |- 1 Whitespace ' '
    `- 2 Integer '5'
    

    Expected behavior

    >>> sqlparse.parse('col1 DIV 5')[0]._pprint_tree()
    `- 0 Operation 'col1 D...'
       |- 0 Identifier 'col1'
       |  `- 0 Name 'col1'
       |- 1 Whitespace ' '
       |- 2 Operator 'DIV'
       |- 3 Whitespace ' '
       `- 4 Integer '5'
    
    opened by chezou 0
  • CREATE TABLE tbl AS SELECT should return get_alias() for its column

    CREATE TABLE tbl AS SELECT should return get_alias() for its column

    Currently, CREATE TABLE AS SELECT a.k.a. CTAS doesn't parse column aliases with functions e.g., coalesce, if, appropriately.

    In the following example, coalesce is parsed as Name while it should be Function to get get_alias() appropriately.

    Before

    >>> import sqlparse
    >>> p = sqlparse.parse('CREATE TABLE tbl1 AS SELECT coalesce(t1.col1, 0) AS col1 FROM t1')[0]
    >>> p._pprint_tree()
    |- 0 DDL 'CREATE'
    |- 1 Whitespace ' '
    |- 2 Keyword 'TABLE'
    |- 3 Whitespace ' '
    |- 4 Identifier 'tbl1'
    |  `- 0 Name 'tbl1'
    |- 5 Whitespace ' '
    |- 6 Keyword 'AS'
    |- 7 Whitespace ' '
    |- 8 DML 'SELECT'
    |- 9 Whitespace ' '
    |- 10 Identifier 'coales...'
    |  |- 0 Name 'coales...'
    |  `- 1 Identifier '(t1.co...'
    |     |- 0 Parenthesis '(t1.co...'
    |     |  |- 0 Punctuation '('
    |     |  |- 1 IdentifierList 't1.col...'
    |     |  |  |- 0 Identifier 't1.col1'
    |     |  |  |  |- 0 Name 't1'
    |     |  |  |  |- 1 Punctuation '.'
    |     |  |  |  `- 2 Name 'col1'
    |     |  |  |- 1 Punctuation ','
    |     |  |  |- 2 Whitespace ' '
    |     |  |  `- 3 Integer '0'
    |     |  `- 2 Punctuation ')'
    |     |- 1 Whitespace ' '
    |     |- 2 Keyword 'AS'
    |     |- 3 Whitespace ' '
    |     `- 4 Identifier 'col1'
    |        `- 0 Name 'col1'
    |- 11 Whitespace ' '
    |- 12 Keyword 'FROM'
    |- 13 Whitespace ' '
    `- 14 Identifier 't1'
       `- 0 Name 't1'
    >>> p.tokens[10].get_alias()
    # Should return 'col1'
    

    After

    >>> p = sqlparse.parse('CREATE TABLE tbl1 AS SELECT coalesce(t1.col1, 0) AS col1 FROM t1')[0]
    >>> p._pprint_tree()
    |- 0 DDL 'CREATE'
    |- 1 Whitespace ' '
    |- 2 Keyword 'TABLE'
    |- 3 Whitespace ' '
    |- 4 Identifier 'tbl1'
    |  `- 0 Name 'tbl1'
    |- 5 Whitespace ' '
    |- 6 Keyword 'AS'
    |- 7 Whitespace ' '
    |- 8 DML 'SELECT'
    |- 9 Whitespace ' '
    |- 10 Identifier 'coales...'
    |  |- 0 Function 'coales...'
    |  |  |- 0 Identifier 'coales...'
    |  |  |  `- 0 Name 'coales...'
    |  |  `- 1 Parenthesis '(t1.co...'
    |  |     |- 0 Punctuation '('
    |  |     |- 1 IdentifierList 't1.col...'
    |  |     |  |- 0 Identifier 't1.col1'
    |  |     |  |  |- 0 Name 't1'
    |  |     |  |  |- 1 Punctuation '.'
    |  |     |  |  `- 2 Name 'col1'
    |  |     |  |- 1 Punctuation ','
    |  |     |  |- 2 Whitespace ' '
    |  |     |  `- 3 Integer '0'
    |  |     `- 2 Punctuation ')'
    |  |- 1 Whitespace ' '
    |  |- 2 Keyword 'AS'
    |  |- 3 Whitespace ' '
    |  `- 4 Identifier 'col1'
    |     `- 0 Name 'col1'
    |- 11 Whitespace ' '
    |- 12 Keyword 'FROM'
    |- 13 Whitespace ' '
    `- 14 Identifier 't1'
       `- 0 Name 't1'
    >>> p.tokens[10].get_alias()
    'col1'
    
    opened by chezou 0
  • Behavior with reindent=True and reindent_aligned=True (should these be mutually exclusive?)

    Behavior with reindent=True and reindent_aligned=True (should these be mutually exclusive?)

    Hello,

    I recently started using sqlparse and love it, it's made my life a lot easier, especially in conjunction with a little helper function to work with SQLAlchemy statements that binds the parameters as literals. I did however come across one strange behavior

    I've been debating whether it's worthy of "fixing", and decided to let you make that call. Regardless, I'm happy to provide the "fix" (or enhancement/guardrail/bulletproof shoes)

    Description

    After testing a bit, and reading the description of the various format() kwargs carefully, I think I may have figured the issue out. I believe it may be "pilot error" - me using it incorrectly

    However, I also believe it may be beneficial to other users (to avoid mistakes, and future noise on GH Issues) to add a check inside sqlparse.format() to ensure nobody else falls into this pit. This is of course your decision, generally I don't advocate for too many guardrails around well documented interfaces, but for reasons I can't properly explain, it seems (to me) it may make sense in this case

    Here's the behavior I see- the first two are as expected, the third caught me off guard. I (unfortunately) used the third when I started using sqlparse, which is how I came across this. It's not a blocker since case # 2 is the exact behavior I want

    First, the code just so you see it's not doing anything extraordinarily silly, then the output, with modifications to only the reindent_aligned and reindent keyword arguments

    Test Code

    query_string = """select
    table_test.id, table_test.fqdn_root, table_test.modified_ts,
    table_test.modified_by,
    table_test.active,
    table_test.analyst_notes,
    table_test.business,
    table_test.contacts FROM bah b JOIN c_table c ON lol = 123, heh = 456
    WHERE table_test.active is true and not lol is null"""
    fmt_kwargs = {
        "keyword_case": "upper",
        "identifier_case": "lower",
        "reindent": False,
        "wrap_after": 80,
        "reindent_aligned": True,
    }
    print('--- BEGIN INPUT QUERY ---')
    print(query_string)
    print('--- END INPUT QUERY ---\n')
    print('sqlparse.format() kwargs:')
    print()
    print('--- sqlparse.format() keyword arguments ---')
    for k, v in fmt_kwargs.items():
        print(f'  {k}={v}')
    print()
    print('--- sqlparse.format() output ---')
    pretty_query = sqlparse.format(query_string, **fmt_kwargs)
    print(pretty_query)
    print()
    

    Now for the test cases, where the fmt_kwargs are fiddled slightly

    Case 1: Using reindent=True, reindent_aligned=False (EXPECTED/OK)

    This is as expected and very pretty

    --- BEGIN INPUT QUERY ---
    select
    table_test.id, table_test.fqdn_root, table_test.modified_ts,
    table_test.modified_by,
    table_test.active,
    table_test.analyst_notes,
    table_test.business,
    table_test.contacts FROM bah b JOIN c_table c ON lol = 123, heh = 456
    WHERE table_test.active is true and not lol is null
    --- END INPUT QUERY ---
    
    sqlparse.format() kwargs:
    
    --- sqlparse.format() keyword arguments ---
      keyword_case=upper
      identifier_case=lower
      reindent=True
      wrap_after=80
      reindent_aligned=False
    
    --- sqlparse.format() output ---
    SELECT table_test.id, table_test.fqdn_root, table_test.modified_ts, table_test.modified_by,
           table_test.active, table_test.analyst_notes, table_test.business, table_test.contacts
    FROM bah b
    JOIN c_table c ON lol = 123, heh = 456
    WHERE table_test.active IS TRUE
      AND NOT lol IS NULL
    

    Case 2: Using reindent=False, reindent_aligned=True (EXPECTED/OK)

    This is as expected and very pretty

    --- BEGIN INPUT QUERY ---
    select
    table_test.id, table_test.fqdn_root, table_test.modified_ts,
    table_test.modified_by,
    table_test.active,
    table_test.analyst_notes,
    table_test.business,
    table_test.contacts FROM bah b JOIN c_table c ON lol = 123, heh = 456
    WHERE table_test.active is true and not lol is null
    --- END INPUT QUERY ---
    
    sqlparse.format() kwargs:
    
    --- sqlparse.format() keyword arguments ---
      keyword_case=upper
      identifier_case=lower
      reindent=False
      wrap_after=80
      reindent_aligned=True
    
    --- sqlparse.format() output ---
    SELECT table_test.id,
           table_test.fqdn_root,
           table_test.modified_ts,
           table_test.modified_by,
           table_test.active,
           table_test.analyst_notes,
           table_test.business,
           table_test.contacts
      FROM bah b
      JOIN c_table c
        ON lol = 123,
           heh = 456
     WHERE table_test.active IS TRUE
       AND NOT lol IS NULL
    

    Case 3: Using both reindent=True, reindent_aligned=True (UNEXPECTED)

    This is not what I initially expected, and as you can see is quite ugly. What is essentially happening as far as I can tell is (because both reindent=True and reindent_aligned=True, the wrap_after value is being applied vertically rather than horizontally, which results in blank lines every <wrap_after> characters. You can see the blank line after the table_test.modified_by column as an example. There also seem to be blank lines between each keyword grouping:

    --- BEGIN INPUT QUERY ---
    select
    table_test.id, table_test.fqdn_root, table_test.modified_ts,
    table_test.modified_by,
    table_test.active,
    table_test.analyst_notes,
    table_test.business,
    table_test.contacts FROM bah b JOIN c_table c ON lol = 123, heh = 456
    WHERE table_test.active is true and not lol is null
    --- END INPUT QUERY ---
    
    sqlparse.format() kwargs:
    
    --- sqlparse.format() keyword arguments ---
      keyword_case=upper
      identifier_case=lower
      reindent=True
      wrap_after=80
      reindent_aligned=True
    
    --- sqlparse.format() output ---
    SELECT table_test.id,
           table_test.fqdn_root,
           table_test.modified_ts,
           table_test.modified_by,
                                                          <----- undesired/unexpected
           table_test.active,
           table_test.analyst_notes,
           table_test.business,
           table_test.contacts
                                                          <----- undesired/unexpected
      FROM bah b
                                                          <----- undesired/unexpected
      JOIN c_table c
        ON lol = 123,
           heh = 456
                                                          <----- undesired/unexpected
     WHERE table_test.active IS TRUE
                                                          <----- undesired/unexpected
       AND NOT lol IS NULL
    

    Summary

    From what I can tell, reindent and reindent_aligned are mutually exclusive- if both are True, the output is ugly and irregular. So the problem is solved- I'm using the interface incorrectly, I think. Setting only reindent_aligned=True and leaving reindent=False produces what I wanted

    However, I'm proposing it might make sense to choose one of these:

    1. Explicitly state this in the documentation
    2. Add a sanity check in format() to raise a ValueError (or whichever exception you find most appropriate) stating that the two are incompatible
    3. If reindent=True and reindent_aligned=True, behave as if only reindent_aligned was set, since that's probably what the caller meant?

    IF you agree that this should be documented or checked, I'd be happy to send a PR, since it's not exactly a critical enhancement, and I'm sure you have better things to be working on :)

    Thanks again for your work on this project and please feel free to close this out with a short explanation (or no explanation) if you feel it's an unreasonable ask, if I have it wrong, or any other reason you see fit

    I apologize in advanced if this is clearly documented somewhere that I managed to miss

    opened by mzpqnxow 0
Owner
Andi Albrecht
Andi Albrecht
Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files

cdvpp Chilean Digital Vaccination Pass Parser (CDVPP) parses digital vaccination passes from PDF files Reads a Digital Vaccination Pass PDF file as in

Esteban Borai 1 Nov 17, 2021
This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorithm to summarize documents and FastAPI for the framework.

Indonesian Text Summarization Using FastAPI This is REST-API for Indonesian Text Summarization using Non-Negative Matrix Factorization for the algorit

Viqi Nurhaqiqi 0 Sep 7, 2021
The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity

Contents Maintainer wanted Introduction Installation Documentation License History Source code Authors Maintainer wanted I am looking for a new mainta

Antti Haapala 1.1k May 16, 2022
A simple Python module for parsing human names into their individual components

Name Parser A simple Python (3.2+ & 2.6+) module for parsing human names into their individual components. hn.title hn.first hn.middle hn.last hn.suff

Derek Gulbranson 539 May 11, 2022
PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different languages

PyMultiDictionary PyMultiDictionary is a Dictionary Module for Python 3+ to get meanings, translations, synonyms and antonyms of words in 20 different

Pablo Pizarro R. 10 May 15, 2022
Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

David Aurelio 1.3k May 21, 2022
Python character encoding detector

Chardet: The Universal Character Encoding Detector Detects ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants) Big5, GB2312, EUC-TW, HZ-GB-2312, IS

Character Encoding Detector 1.7k May 18, 2022
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 8.7k May 16, 2022
Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

Vinta Chen 185 Apr 8, 2022
An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([email protected]). Packaged by Peter Waller <[email protected]>,

Peter Waller 978 May 18, 2022
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 2.8k May 16, 2022
Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

Dmitry Voronin 465 May 23, 2022
Python Lex-Yacc

PLY (Python Lex-Yacc) Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved. Redistribution and use in source and binary forms, wi

David Beazley 2.2k May 17, 2022
Python library for creating PEG parsers

PyParsing -- A Python Parsing Module Introduction The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the t

Pyparsing 1.5k May 17, 2022
Python port of Google's libphonenumber

phonenumbers Python Library This is a Python port of Google's libphonenumber library It supports Python 2.5-2.7 and Python 3.x (in the same codebase,

David Drysdale 3k May 14, 2022
A Python library that provides an easy way to identify devices like mobile phones, tablets and their capabilities by parsing (browser) user agent strings.

Python User Agents user_agents is a Python library that provides an easy way to identify/detect devices like mobile phones, tablets and their capabili

Selwin Ong 1.3k May 20, 2022
An anthology of a variety of tools for the Persian language in Python

An anthology of a variety of tools for the Persian language in Python

Persian Tools 86 May 9, 2022
Widevine KEY Extractor in Python

Widevine Client 3 This was originally written by T3rry7f. This repo is slightly modified version of his repo. This only works on standard Windows! Usa

Vank0n (SJJeon) 54 Apr 13, 2022
A Python app which can convert normal text to Handwritten text.

Text to HandWritten Text ✍️ Converter Watch Tutorial for this project Usage:- Clone my repository. Open CMD in working directory. Run following comman

Kushal Bhavsar 4 Apr 1, 2022