Python Lex-Yacc

Related tags

Text Processingply
Overview

PLY (Python Lex-Yacc)

Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the David Beazley or Dabeaz LLC may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Introduction

PLY is a 100% Python implementation of the common parsing tools lex and yacc. Here are a few highlights:

  • PLY is very closely modeled after traditional lex/yacc. If you know how to use these tools in C, you will find PLY to be similar.

  • PLY provides very extensive error reporting and diagnostic information to assist in parser construction. The original implementation was developed for instructional purposes. As a result, the system tries to identify the most common types of errors made by novice users.

  • PLY provides full support for empty productions, error recovery, precedence specifiers, and moderately ambiguous grammars.

  • Parsing is based on LR-parsing which is fast, memory efficient, better suited to large grammars, and which has a number of nice properties when dealing with syntax errors and other parsing problems. Currently, PLY builds its parsing tables using the LALR(1) algorithm used in yacc.

  • PLY uses Python introspection features to build lexers and parsers.
    This greatly simplifies the task of parser construction since it reduces the number of files and eliminates the need to run a separate lex/yacc tool before running your program.

  • PLY can be used to build parsers for "real" programming languages. Although it is not ultra-fast due to its Python implementation, PLY can be used to parse grammars consisting of several hundred rules (as might be found for a language like C). The lexer and LR parser are also reasonably efficient when parsing typically sized programs. People have used PLY to build parsers for C, C++, ADA, and other real programming languages.

How to Use

PLY consists of two files : lex.py and yacc.py. These are contained within the ply directory which may also be used as a Python package. To use PLY, simply copy the ply directory to your project and import lex and yacc from the associated ply package. For example:

from .ply import lex
from .ply import yacc

Alternatively, you can copy just the files lex.py and yacc.py individually and use them as modules however you see fit. For example:

import lex
import yacc

If you wish, you can use the install.py script to install PLY into virtual environment.

PLY has no third-party dependencies.

The docs/ directory contains complete documentation on how to use the system. Documentation available at https://ply.readthedocs.io

The example directory contains several different examples including a PLY specification for ANSI C as given in K&R 2nd Ed.

A simple example is found at the end of this document

Requirements

PLY requires the use of Python 3.6 or greater. However, you should use the latest Python release if possible. It should work on just about any platform.

Note: PLY does not support execution under python -OO. It can be made to work in that mode, but you'll need to change the programming interface with a decorator. See the documentation for details.

Resources

Official Documentation is available at:

More information about PLY can be obtained on the PLY webpage at:

For a detailed overview of parsing theory, consult the excellent book "Compilers : Principles, Techniques, and Tools" by Aho, Sethi, and Ullman. The topics found in "Lex & Yacc" by Levine, Mason, and Brown may also be useful.

The GitHub page for PLY can be found at:

Acknowledgments

A special thanks is in order for all of the students in CS326 who suffered through about 25 different versions of these tools :-).

The CHANGES file acknowledges those who have contributed patches.

Elias Ioup did the first implementation of LALR(1) parsing in PLY-1.x. Andrew Waters and Markus Schoepflin were instrumental in reporting bugs and testing a revised LALR(1) implementation for PLY-2.0.

Example

Here is a simple example showing a PLY implementation of a calculator with variables.

# -----------------------------------------------------------------------------
# calc.py
#
# A simple calculator with variables.
# -----------------------------------------------------------------------------

tokens = (
    'NAME','NUMBER',
    'PLUS','MINUS','TIMES','DIVIDE','EQUALS',
    'LPAREN','RPAREN',
    )

# Tokens

t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_EQUALS  = r'='
t_LPAREN  = r'\('
t_RPAREN  = r'\)'
t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Ignored characters
t_ignore = " \t"

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print(f"Illegal character {t.value[0]!r}")
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex()

# Precedence rules for the arithmetic operators
precedence = (
    ('left','PLUS','MINUS'),
    ('left','TIMES','DIVIDE'),
    ('right','UMINUS'),
    )

# dictionary of names (for storing variables)
names = { }

def p_statement_assign(p):
    'statement : NAME EQUALS expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression'''
    if p[2] == '+'  : p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    p[0] = -p[2]

def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]

def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]

def p_expression_name(p):
    'expression : NAME'
    try:
        p[0] = names[p[1]]
    except LookupError:
        print(f"Undefined name {p[1]!r}")
        p[0] = 0

def p_error(p):
    print(f"Syntax error at {p.value!r}")

import ply.yacc as yacc
yacc.yacc()

while True:
    try:
        s = input('calc > ')
    except EOFError:
        break
    yacc.parse(s)

Bug Reports and Patches

My goal with PLY is to simply have a decent lex/yacc implementation for Python. As a general rule, I don't spend huge amounts of time working on it unless I receive very specific bug reports and/or patches to fix problems. At this time, PLY is mature software and new features are no longer being added. If you think you have found a bug, please visit the PLY Github page at https://github.com/dabeaz/ply to report an issue.

Take a Class!

If you'd like to learn more about compiler principles and have a go at implementing a compiler, come take a course. https://www.dabeaz.com/compiler.html.

-- Dave

Owner
David Beazley
Author of the Python Essential Reference (Addison-Wesley), Python Cookbook (O'Reilly), and former computer science professor. Come take a class!
David Beazley
Map Reduce Wordcount in Python using gRPC

This project is implemented in Python using gRPC. The input files are given in .txt format and the word count operation is performed.

Divija 4 Dec 05, 2022
Python tool to make adding to your armory spreadsheet armory less of a pain.

Python tool to make adding to your armory spreadsheet armory slightly less of a pain by creating a CSV to simply copy and paste.

1 Oct 20, 2021
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 8.8k Jan 08, 2023
CowExcept - Spice up those exceptions with cowexcept!

CowExcept - Spice up those exceptions with cowexcept!

James Ansley 41 Jun 30, 2022
A generator library for concise, unambiguous and URL-safe UUIDs.

Description shortuuid is a simple python library that generates concise, unambiguous, URL-safe UUIDs. Often, one needs to use non-sequential IDs in pl

Stavros Korokithakis 1.8k Dec 31, 2022
Implementation of hashids (http://hashids.org) in Python. Compatible with Python 2 and Python 3

hashids for Python 2.7 & 3 A python port of the JavaScript hashids implementation. It generates YouTube-like hashes from one or many numbers. Use hash

David Aurelio 1.4k Jan 02, 2023
Word and phrase lists in CSV

Word Lists Word and phrase lists in CSV, collected from different sources. Oxford Word Lists: oxford-5k.csv - Oxford 3000 and 5000 oxford-opal.csv - O

Anton Zhiyanov 14 Oct 14, 2022
Search for terms(word / table / field name or any) under Snowflake schema names

snowflake-search-terms-in-ddl-views Search for terms(word / table / field name or any) under Snowflake schema names Version : 1.0v How to use ? Run th

Igal Emona 1 Dec 15, 2021
Returns unicode slugs

Python Slugify A Python slugify application that handles unicode. Overview Best attempt to create slugs from unicode strings while keeping it DRY. Not

Val Neekman 1.3k Jan 04, 2023
a python package that lets you add custom colors and text formatting to your scripts in a very easy way!

colormate Python script text formatting package What is colormate? colormate is a python library that lets you add text formatting to your scripts, it

Rodrigo 2 Dec 14, 2022
Extract price amount and currency symbol from a raw text string

price-parser is a small library for extracting price and currency from raw text strings.

Scrapinghub 252 Dec 31, 2022
An implementation of figlet written in Python

All of the documentation and the majority of the work done was by Christopher Jones ([emai

Peter Waller 1.1k Jan 02, 2023
Production First and Production Ready End-to-End Keyword Spotting Toolkit

WeKws Production First and Production Ready End-to-End Keyword Spotting Toolkit. The goal of this toolkit it to... Small footprint keyword spotting (K

222 Dec 30, 2022
Athens: a great tool for taking notes and organising knowldge

AthensSyncer Athens is a great tool for taking notes and organising knowldge. But it is a bummer that you cannot use it accross multiple devices. Well

6 Dec 14, 2022
Add your new words to a text file and get them randomly.

Memorize-New-Words In this very very very little project, I've wrote a code to memorize new english words. Therefore you can add the words and their m

Mostafa 2 Jul 04, 2022
Convert English text to IPA using the toPhonetic

Installation: Windows python -m pip install text2ipa macOS sudo pip3 install text2ipa Linux pip install text2ipa Features Convert English text to I

Joseph Quang 3 Jun 14, 2022
Python Lex-Yacc

PLY (Python Lex-Yacc) Copyright (C) 2001-2020 David M. Beazley (Dabeaz LLC) All rights reserved. Redistribution and use in source and binary forms, wi

David Beazley 2.4k Dec 31, 2022
A slugifier that works in unicode

Unicode Slugify Unicode Slugify is a slugifier that generates unicode slugs. It was originally used in the Firefox Add-ons web site to generate slugs

Mozilla 315 Nov 21, 2022
Translate .sbv subtitle files

deepl4subtitle Deeplを使って字幕ファイル(.sbv)を翻訳します。タイムスタンプも含めて出力しますが、翻訳時はタイムスタンプは文の一部とは切り離されるので、.sbvファイルをそのまま翻訳機に突っ込むよりも高精度な翻訳ができるはずです。 つかいかた 入力する.sbvファイルの前処理

Yasunori Toshimitsu 1 Oct 20, 2021
Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

Vinta Chen 194 Nov 19, 2022