Python Eacc is a minimalist but flexible Lexer/Parser tool in Python.

Related tags

Documentationeacc
Overview

Eacc

Python Eacc is a parsing tool it implements a flexible lexer and a straightforward approach to analyze documents. It uses Python code to specify both lexer and grammar for a given document. Eacc can handle succinctly most parsing cases that existing Python parsing tools propose to address.

Documents are split into tokens and a token has a type when a sequence of tokens is matched it evaluates to a specific type then rematcned again against the existing rules. The types can be function objects it means patterns can be evaluated based on extern conditions.

The fact of it being possible to have a grammar rule associated to a type and the type being variable in the context of the program it makes eacc useful for some text analysis problems.

A document grammar is written mostly in an ambiguous manner. The parser has a lookahead mechanism to express precedence when matching rules.

It is possible to extend the document grammar at the time it is being parsed. Such a feature is interesting to handle some edge cases.

The parser also accept some special operators like Except, Only, Times etc. These operators are used to match sequences of tokens based on their token types and length.

Features

  • Fast and flexible Lexer

    • Use class inheritance to extend/modify your existing lexers.
  • Handle broken documents.

    • Useful in some edge cases.
  • Short implementation

    • You can easily extend or modify functionalities.
  • Powerful but easy to learn

    • Learn a few classes workings to implement a parser.
  • Pythonic notation for grammars

    • No need to dig deep into grammar theory.

Note: For a real and more sophisticated example of eacc usage check out.

Crocs is capable of reading a regex string then generating possible matches for the inputed regex.

https://github.com/iogf/crocs

Basic Example

The code below specifies a lexer and a parsing approach for a simple expression calculator. When one of the mathematical operations +, -, * or / is executed then the result is a number

Based on such a simple assertion it is possible to implement our calculator.

from eacc.eacc import Rule, Grammar, Eacc
from eacc.lexer import Lexer, LexTok, XSpec
from eacc.token import Plus, Minus, LP, RP, Mul, Div, Num, Blank, Sof, Eof

class CalcTokens(XSpec):
    # Used to extract the tokens.
    t_plus   = LexTok(r'\+', Plus)
    t_minus  = LexTok(r'\-', Minus)

    t_lparen = LexTok(r'\(', LP)
    t_rparen = LexTok(r'\)', RP)
    t_mul    = LexTok(r'\*', Mul)
    t_div    = LexTok(r'\/', Div)

    t_num    = LexTok(r'[0-9]+', Num, float)
    t_blank  = LexTok(r' +', Blank, discard=True)

    root = [t_plus, t_minus, t_lparen, t_num, 
    t_blank, t_rparen, t_mul, t_div]

class CalcGrammar(Grammar):
    # The token patterns when matched them become
    # ParseTree objects which have a type.
    r_paren = Rule(LP, Num, RP, type=Num)
    r_div   = Rule(Num, Div, Num, type=Num)
    r_mul   = Rule(Num, Mul, Num, type=Num)
    o_div   = Rule(Div)
    o_mul   = Rule(Mul)

    r_plus  = Rule(Num, Plus, Num, type=Num, up=(o_mul, o_div))
    r_minus = Rule(Num, Minus, Num, type=Num, up=(o_mul, o_div))

    # The final structure that is consumed. Once it is
    # consumed then the process stops.
    r_done  = Rule(Sof, Num, Eof)

    root = [r_paren, r_plus, r_minus, r_mul, r_div, r_done]

# The handles mapped to the patterns to compute the expression result.
def plus(expr, sign, term):
    return expr.val() + term.val()

def minus(expr, sign, term):
    return expr.val() - term.val()

def div(term, sign, factor):
    return term.val()/factor.val()

def mul(term, sign, factor):
    return term.val() * factor.val()

def paren(left, expression, right):
    return expression.val()

def done(sof, num, eof):
    print('Result:', num.val())
    return num.val()

if __name__ == '__main__':
    data = '2 * 5 + 10 -(2 * 3 - 10 )+ 30/(1-3+ 4* 10 + (11/1))' 

    lexer  = Lexer(CalcTokens)
    tokens = lexer.feed(data)
    eacc   = Eacc(CalcGrammar)
    
    # Link the handles to the patterns.
    eacc.add_handle(CalcGrammar.r_plus, plus)
    eacc.add_handle(CalcGrammar.r_minus, minus)
    eacc.add_handle(CalcGrammar.r_div, div)
    eacc.add_handle(CalcGrammar.r_mul, mul)
    eacc.add_handle(CalcGrammar.r_paren, paren)
    eacc.add_handle(CalcGrammar.r_done, done)
    
    ptree = eacc.build(tokens)
    ptree = list(ptree)

The defined rule below fixes precedence in the above ambiguous grammar.

    r_plus  = Rule(Num, Plus, Num, type=Num, up=(o_mul, o_div))

The above rule will be matched only if the below rules aren't matched ahead.

    o_div   = Rule(Div)
    o_mul   = Rule(Mul)

In case the above rule is matched then the result has type Num it will be rematched against the existing rules and so on.

When a mathematical expression is well formed it will result to the following structure.

Sof Num Eof

Which is matched by the rule below.

    r_done  = Rule(Sof, Num, Eof)

That rule is mapped to the handle below. It will merely print the resulting value.

def done(sof, num, eof):
    print('Result:', num.val())
    return num.val()

The Sof and Eof are start of file and end of file tokens. These are automatically inserted by the parser.

In case it is not a valid mathematical expression then it raises an exception. When a given document is well formed, the defined rules will consume it entirely.

The lexer is really flexible it can handle some interesting cases in a short and simple manner.

from eacc.lexer import XSpec, Lexer, SeqTok, LexTok, LexSeq
from eacc.token import Keyword, Identifier, RP, LP, Colon, Blank

class KeywordTokens(XSpec):
    t_if = LexSeq(SeqTok(r'if', type=Keyword),
    SeqTok(r'\s+', type=Blank))

    t_blank  = LexTok(r' +', type=Blank)
    t_lparen = LexTok(r'\(', type=LP)
    t_rparen = LexTok(r'\)', type=RP)
    t_colon  = LexTok(r'\:', type=Colon)

    # Match identifier only if it is not an if.
    t_identifier = LexTok(r'[a-zA-Z0-9]+', type=Identifier)

    root = [t_if, t_blank, t_lparen, 
    t_rparen, t_colon, t_identifier]

lex = Lexer(KeywordTokens)
data = 'if ifnum: foobar()'
tokens = lex.feed(data)
print('Consumed:', list(tokens))

That would output:

Consumed: [Keyword('if'), Blank(' '), Identifier('ifnum'), Colon(':'),
Blank(' '), Identifier('foobar'), LP('('), RP(')')]

The above example handles the task of tokenizing keywords correctly. The SeqTok class works together with LexSeq to extract the tokens based on a given regex while LexNode works on its own to extract tokens that do not demand a lookahead step.

Install

Note: Work with python3 only.

pip install eacc

Documentation

You might also like...
Sms Bomber, Tool Encryptor
Sms Bomber, Tool Encryptor

ɴᴏʙɪᴛᴀシ︎ ғᴏʀ ᴀɴʏ ʜᴇʟᴘシ︎ Install pkg install git -y pkg install python -y pip install requests git clone https://github.com/AK27HVAU/akash Run cd Akash

JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.

JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with

Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

epub2sphinx is a tool to convert epub files to ReST for Sphinx
epub2sphinx is a tool to convert epub files to ReST for Sphinx

epub2sphinx epub2sphinx is a tool to convert epub files to ReST for Sphinx. It uses Pandoc for converting HTML data inside epub files into ReST. It cr

Sphinx-performance - CLI tool to measure the build time of different, free configurable Sphinx-Projects
Sphinx-performance - CLI tool to measure the build time of different, free configurable Sphinx-Projects

CLI tool to measure the build time of different, free configurable Sphinx-Projec

A collection of simple python mini projects to enhance your python skills

A collection of simple python mini projects to enhance your python skills

Repository for learning Python (Python Tutorial)

Repository for learning Python (Python Tutorial) Languages and Tools 🧰 Overview 📑 Repository for learning Python (Python Tutorial) Languages and Too

A python package to avoid writing and maintaining duplicated python docstrings.

docstring-inheritance is a python package to avoid writing and maintaining duplicated python docstrings.

advance python series: Data Classes, OOPs, python

Working With Pydantic - Built-in Data Process ========================== Normal way to process data (reading json file): the normal princiople, it's f

Releases(v3.1.6)
Owner
Iury de oliveira gomes figueiredo
Iury de oliveira gomes figueiredo
MkDocs Plugin allowing your visitors to *File > Print > Save as PDF* the entire site.

mkdocs-print-site-plugin MkDocs plugin that adds a page to your site combining all pages, allowing your site visitors to File Print Save as PDF th

Tim Vink 67 Jan 04, 2023
Markdown documentation generator from Google docstrings

mkgendocs A Python package for automatically generating documentation pages in markdown for Python source files by parsing Google style docstring. The

Davide Nunes 44 Dec 18, 2022
Bring RGB to life in Neovim

Bring RGB to life in Neovim Change your RGB devices' color depending on Neovim's mode. Fast and asynchronous plugin to live your vim-life to the fulle

Antoine 40 Oct 27, 2022
My Sublime Text theme

rsms sublime text theme Install: cd path/to/your/sublime/packages git clone https://github.com/rsms/sublime-theme.git rsms-theme You'll also need the

Rasmus 166 Jan 04, 2023
An open-source script written in python just for fun

Owersite Owersite is an open-source script written in python just for fun. It do

大きなペニスを持つ少年 7 Sep 21, 2022
Legacy python processor for AsciiDoc

AsciiDoc.py This branch is tracking the alpha, in-progress 10.x release. For the stable 9.x code, please go to the 9.x branch! AsciiDoc is a text docu

AsciiDoc.py 178 Dec 25, 2022
Fast syllable estimation library based on pattern matching.

Syllables: A fast syllable estimator for Python Syllables is a fast, simple syllable estimator for Python. It's intended for use in places where speed

ProseGrinder 26 Dec 14, 2022
30 days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days

30 days of Python programming challenge is a step-by-step guide to learn the Python programming language in 30 days. This challenge may take more than100 days, follow your own pace.

Asabeneh 17.7k Jan 07, 2023
swagger-codegen contains a template-driven engine to generate documentation, API clients and server stubs in different languages by parsing your OpenAPI / Swagger definition.

Master (2.4.25-SNAPSHOT): 3.0.31-SNAPSHOT: Maven Central ⭐ ⭐ ⭐ If you would like to contribute, please refer to guidelines and a list of open tasks. ⭐

Swagger 15.2k Dec 31, 2022
Collections of Beautiful Latex Snippets

HandyLatex Collections of Beautiful Latex Snippets Table 👉 Succinct table with bold separation line and gray text %################## Dependencies ##

Xintao 15 Apr 11, 2022
Reproducible Data Science at Scale!

Pachyderm: The Data Foundation for Machine Learning Pachyderm provides the data layer that allows machine learning teams to productionize and scale th

Pachyderm 5.7k Dec 29, 2022
Plugins for MkDocs.

Plugins for MkDocs and Python Markdown pip install neoteroi-mkdocs This package includes the following plugins and extensions: Name Description Type m

35 Dec 23, 2022
script to calculate total GPA out of 4, based on input gpa.csv

gpa_calculator script to calculate total GPA out of 4 based on input gpa.csv to use, create a total.csv file containing only one integer showing the t

Mohamad Bastin 1 Feb 07, 2022
Showing potential issues with merge strategies

Showing potential issues with merge strategies Context There are two branches in this repo: main and a feature branch feat/inverting-method (not the b

Rubén 2 Dec 20, 2021
Proyecto - Desgaste y rendimiento de empleados de IBM HR Analytics

Acceder al código desde Google Colab para poder ver de manera adecuada todas las visualizaciones y poder interactuar con ellas. Links de acceso: Noteb

1 Jan 31, 2022
The OpenAPI Specification Repository

The OpenAPI Specification The OpenAPI Specification is a community-driven open specification within the OpenAPI Initiative, a Linux Foundation Collabo

OpenAPI Initiative 25.5k Dec 29, 2022
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.

JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with

Curvenote 15 Dec 21, 2022
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, A

Donne Martin 24.5k Jan 09, 2023
CoderByte | Practice, Tutorials & Interview Preparation Solutions|

CoderByte | Practice, Tutorials & Interview Preparation Solutions This repository consists of solutions to CoderByte practice, tutorials, and intervie

Eda AYDIN 6 Aug 09, 2022
Code for our SIGIR 2022 accepted paper : P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based Learning and Pre-finetuning

P3 Ranker Implementation for our SIGIR2022 accepted paper: P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-bas

14 Jan 04, 2023