LightCSV - This CSV reader is implemented in just pure Python.

Last update: Mar 05, 2022

LightCSV

Simple light CSV reader

This CSV reader is implemented in pure Python. It lets you specify a separator, a quote character and column titles (or take the first row as titles). Nothing more, nothing less.

Usage

Usage is pretty straightforward:

from lightcsv import LightCSV

for row in LightCSV().read_file("myfile.csv"):
    print(row)

This will open a file named myfile.csv and iterate over its rows, returning each row as a key-value dictionary. Line endings can be either \n or \r\n. The file is opened in text mode with UTF-8 encoding.

You can also supply your own stream (e.g. an already open file instead of a filename). You can use this, for example, to open a file with a different encoding:

from lightcsv import LightCSV

with open("myfile.csv") as f:
    for row in LightCSV().read(f):
        print(row)

NOTE: Blank lines at any point in the file are ignored.
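As a rough illustration of the line handling described above, here is a standalone sketch (not LightCSV's code) that accepts either a filename or an open stream, tolerates both \n and \r\n endings, skips blank lines and keys each cell by its column position. The helper name iter_rows is hypothetical, and it splits naively on the separator, so quoting is deliberately ignored here.

```python
import io
from typing import Dict, Iterator, TextIO, Union


def iter_rows(source: Union[str, TextIO], separator: str = ",") -> Iterator[Dict[int, str]]:
    """Hypothetical helper (not LightCSV's API): yield {position: cell} per non-blank line."""
    if isinstance(source, str):
        # Treat a str as a filename and open it as UTF-8 text.
        with open(source, encoding="utf-8", newline="") as stream:
            yield from iter_rows(stream, separator)
        return
    for line in source:
        line = line.rstrip("\r\n")  # accept both \n and \r\n line endings
        if not line:
            continue  # blank lines are skipped wherever they appear
        # Naive split for illustration only: quoted separators are not handled.
        yield dict(enumerate(line.split(separator)))


for row in iter_rows(io.StringIO("a,b\r\n\r\n1,2\n")):
    print(row)
# {0: 'a', 1: 'b'}
# {0: '1', 1: '2'}
```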

Parameters

LightCSV can be parametrized during initialization to fine-tune its behaviour.

The following example shows initialization with default parameters:

from lightcsv import LightCSV

myCSV_reader = LightCSV(
    separator=",",
    quote_char='"',
    field_names=None,
    strict=True,
    has_headers=False
)

Available settings:

separator: character used to separate cells (defaults to ,).
quote_char: character used to quote strings (defaults to ").
A literal quote character inside a quoted string is escaped by doubling it.
field_names: can be any iterable or sequence of str (e.g. a list of strings).
If set, these are used as column titles (dictionary keys) and also set the expected number of columns.
strict: sets whether the parser runs in strict mode.
In strict mode, the parser raises a ValueError if a cell cannot be decoded or the number of columns doesn't match. In non-strict mode, unrecognized cells are returned as strings; extra columns are ignored, and if there are fewer columns than expected, the dictionary simply contains fewer values.
has_headers: whether the first row should be taken as column titles.
If set, field_names cannot be specified. If not set and no field names are given, the dictionary keys are just the column positions of the cells.
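The sketch below shows one plausible way separator, quote_char, field_names and strict could fit together; it is not taken from LightCSV. The hypothetical split_cells keeps each raw chunk with its quotes attached (so a later step can tell quoted strings from bare values), and the hypothetical to_dict applies the column-title and column-count rules listed above.

```python
from typing import Dict, List, Optional, Sequence, Union


def split_cells(line: str, separator: str = ",", quote_char: str = '"') -> List[str]:
    """Hypothetical helper: split a line into raw chunks, quotes kept.

    A separator inside a quoted section is treated as part of the cell.
    """
    chunks: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in line:
        if ch == quote_char:
            # Flipping state on every quote char also copes with a doubled
            # quote (""): it closes and immediately reopens the quoted part.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == separator and not in_quotes:
            chunks.append("".join(current))
            current = []
        else:
            current.append(ch)
    if in_quotes:
        raise ValueError(f"unterminated quoted cell in {line!r}")
    chunks.append("".join(current))
    return chunks


def to_dict(
    cells: Sequence[object],
    field_names: Optional[Sequence[str]] = None,
    strict: bool = True,
) -> Dict[Union[int, str], object]:
    """Hypothetical helper: key cells by field name, else by column position."""
    if field_names is None:
        return dict(enumerate(cells))
    if strict and len(cells) != len(field_names):
        raise ValueError(f"expected {len(field_names)} columns, got {len(cells)}")
    # zip() stops at the shorter input, so in non-strict mode extra cells
    # are dropped and missing cells just yield fewer keys.
    return dict(zip(field_names, cells))


# has_headers-style usage: the first row supplies the field names.
header, *body = ["name,comment", 'Ann,"Hi, ""you"""', "Bob"]
names = split_cells(header)
for line in body:
    print(to_dict(split_cells(line), names, strict=False))
# {'name': 'Ann', 'comment': '"Hi, ""you"""'}
# {'name': 'Bob'}
```

Keeping the quotes on the raw chunk means decoding (stripping quotes, undoing doubled quotes) can happen in the type-recognition step instead of in the splitter.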

Data types recognized

The parser tries to recognize the following types, in this order:

None (empty values): unlike the standard csv reader, it returns None (null) for empty cells.
An explicitly quoted empty string ("") is still decoded as an empty string.
str (strings): anything quoted with the quote char. The default quote char is ".
If the string contains a quote char, it must be escaped by doubling it, e.g. "HELLO ""WORLD""" decodes to the string HELLO "WORLD".
int (integers): an integer with an optional leading sign.
float: any float recognized by Python.
datetime: a datetime in ISO format (with 'T' or a space between the date and time parts), like 2022-02-02 22:02:02.
date: a date in ISO format, like 2022-02-02.
time: a time in ISO format, like 22:02:02.

If all these parsing attempts fail, the value is returned as a string, unless strict is set to True, in which case a ValueError is raised.
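The recognition order above can be sketched as a chain of attempts where the first match wins. This is a hypothetical parse_cell helper, not LightCSV's actual parse_obj(); it assumes it receives the raw chunk with its quotes still attached, and the integer regex is only an assumed reading of "an integer with an optional leading sign".

```python
import re
from datetime import date, datetime, time

# Assumed reading of "an integer with a preceding optional sign".
RE_INT = re.compile(r"[+-]?\d+")


def parse_cell(chunk: str, quote_char: str = '"', strict: bool = True) -> object:
    """Hypothetical helper: decode one raw chunk, first matching type wins."""
    if chunk == "":
        return None  # empty cell
    q = quote_char
    if len(chunk) >= 2 and chunk[0] == q and chunk[-1] == q:
        # Quoted string: drop the outer quotes and undo doubled quotes.
        return chunk[1:-1].replace(q + q, q)
    if RE_INT.fullmatch(chunk):
        return int(chunk)
    try:
        return float(chunk)
    except ValueError:
        pass
    # datetime.fromisoformat() also accepts a bare date (as midnight),
    # so only try it when there is a 'T' or space between date and time.
    candidates = [date.fromisoformat, time.fromisoformat]
    if "T" in chunk or " " in chunk:
        candidates.insert(0, datetime.fromisoformat)
    for parser in candidates:
        try:
            return parser(chunk)
        except ValueError:
            pass
    if strict:
        raise ValueError(f"cannot decode cell {chunk!r}")
    return chunk  # non-strict mode: fall back to the raw string


print(parse_cell('"HELLO ""WORLD"""'))  # HELLO "WORLD"
print(repr(parse_cell("2022-02-02")))  # datetime.date(2022, 2, 2)
```

The ordering matters: a quoted "42" stays a string because the quote check runs before the int check, and a bare date is kept as a date because the datetime attempt is skipped when there is no time part.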

Implementing your own type recognizer

You can implement your own deserialization by subclassing LightCSV and overriding its parse_obj() method.

For example, suppose we want to recognize hexadecimal integers in the format 0xNNN.... We can implement it this way:

import re
from lightcsv import LightCSV

RE_HEXA = re.compile(r'0[xX][0-9A-Fa-f]+$')  # matches 0xNNNN (hexadecimal digits only)


class CSVHexRecognizer(LightCSV):
    def parse_obj(self, lineno: int, chunk: str):
        if RE_HEXA.match(chunk):
            return int(chunk[2:], 16)
        
        return super().parse_obj(lineno, chunk)

As you can see, you override parse_obj(). If your match fails, call the parent class's parse_obj() through super() and return its result.
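The hexadecimal pattern can be checked on its own, without LightCSV. This standalone snippet limits the digits to [0-9A-Fa-f]; a looser class such as [A-Za-z0-9] would also accept something like 0xZZ, which int(..., 16) then rejects with a ValueError instead of letting the cell fall through to super().parse_obj(). The parse_hex function is a hypothetical stand-in for the body of an overridden parse_obj().

```python
import re

# Only genuine hex digits after the 0x / 0X prefix.
RE_HEXA = re.compile(r"0[xX][0-9A-Fa-f]+$")


def parse_hex(chunk: str):
    """Hypothetical helper: int value of a 0x... chunk, or None if not hex."""
    if RE_HEXA.match(chunk):
        return int(chunk[2:], 16)
    return None  # in a LightCSV subclass you would defer to super() here


print(parse_hex("0x1F"))  # 31
print(parse_hex("0xZZ"))  # None
```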

Why

Python's built-in csv module is a bit over-engineered for simple tasks, and one normally doesn't need all its bells and whistles. With LightCSV you just open a file by name and iterate over its rows.

Decoding empty cells as None is needed very often, and it can be cumbersome with the standard csv module, which tries hard to cover many corner cases (if you need those, this tool might not be suitable for you).
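For comparison, the standard library's csv module returns every cell as a string and empty cells as '', so getting None takes an extra step. A minimal sketch using only the standard library (read_with_none is a hypothetical name):

```python
import csv
import io


def read_with_none(text: str) -> list:
    """Hypothetical helper: csv.DictReader rows with '' replaced by None."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # csv yields '' for empty cells; convert those to None by hand.
        rows.append({key: (value if value != "" else None) for key, value in row.items()})
    return rows


for row in read_with_none("id,name\r\n1,\r\n2,Bob\r\n"):
    print(row)
# {'id': '1', 'name': None}
# {'id': '2', 'name': 'Bob'}
```

Note that numbers stay strings here, and csv also yields '' for an explicitly quoted empty cell (""), so this conversion cannot tell the two cases apart.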

Owner

Jose Rodriguez
