A simple XLSX/CSV reader - to dictionary converter

Overview

Test Python package codecov Upload Python Package to PyPI PythonVersions Sourcery Black Snyk Downloads Twitter Follow

sheet2dict

A simple XLSX/CSV reader - to dictionary converter

Installing

To install the package from pip, first run:

python3 -m pip install --no-cache-dir sheet2dict

Required pip packages for sheet2doc: csv, openpyxl

Usage

This library has 2 main features: reading a spreadsheet files and converting them to array of python dictionaries.

- XLSX

Use xlsx_to_dict() method when converting form spreadsheets.
Supported file formats for spreadsheets are: .xlsx,.xlsm,.xltx,.xltm
Spreadsheets with multiple worksheets are supported. If no sheet is specified, the active sheet is selected. If there is only one sheet, it is considered active.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Convert active sheet (without specifying sheet name)
ws.xlsx_to_dict(path='inventory.xlsx')

# Convert the 'Main Warehouse' sheet of the 'inventory.xslx' spreadsheet file.
ws.xlsx_to_dict(path='inventory.xlsx', select_sheet='Main Warehouse')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

You can parse data when worksheet is an object

# Import the library
from sheet2dict import Worksheet

# Example: read spreadsheet as object
path = 'inventory.xlsx'
xlsx_file = open(path, 'rb')
xlsx_file = BytesIO(xlsx_file.read())

# Parse spreadsheet from object
ws = Worksheet()
ws.xlsx_to_dict(path=xlsx_file)
print(ws.header)

- CSV

Use csv_to_dict() method when converting form csv.
CSV is a format with many variations, better handle encodings and delimiters on user side and not within module itself.

# Import the library
from sheet2dict import Worksheet

# Create an object
ws = Worksheet()

# Read CSV file
csv_file = open('inventory.csv', 'r', encoding='utf-8-sig')

# Convert 
ws.csv_to_dict(csv_file=csv_file, delimiter=';')

# object.header returns first row with the data in a spreadsheet 
print(ws.header)

# object.sheet_items returns converted rows as dictionaries in the array 
print(ws.sheet_items)

- Other functions

Worksheet object.header returns first row with the data in a spreadsheet

>> ws.header {'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}">
Python 3.9.1
[Clang 12.0.0 (clang-1200.0.32.28)] on darwin
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.header
{'country': 'SK', 'city': 'Bratislava', 'citizens': '400000', 'random_field': 'cc'}

Worksheet object.sanitize_sheet_items removes None or empty dictionary keys from sheet_items

>> ws.sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'} ] >>> ws.sanitize_sheet_items [ {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'}, {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'} ]">
>>> from sheet2dict import Worksheet
>>> ws = Worksheet()
>>> ws.xlsx_to_dict(path="inventory.xlsx")

>>> ws.sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', None: '22', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', None: '33', 'random_field': 'cc'}
]

>>> ws.sanitize_sheet_items
[
  {'country': 'CZ', 'city': 'Prague', 'citizens': '600000', 'random_field': 'cc'},
  {'country': 'UK', 'city': 'London', 'citizens': '2000000', 'random_field': 'cc'}
]

Contributing and Code of Conduct

Contributing to sheet2dict

As an open source project, sheet2dict welcomes contributions of many forms.
Please read and follow our Contributing to sheet2dict

Contributors:

  • Thanks to 白一百 (bái-yī-bǎi) for making sheet2dict work with multi-sheet Excel files.

Code of Conduct

As a contributor, you can help us keep the sheet2dict project open and inclusive.
Please read and follow our Code of Conduct

You might also like...
SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful.

Simple Codes That You Might Need SCTYMN is a GitHub repository that includes some simple scripts(currently only python scripts) that can be useful. In

Żmija is a simple universal code generation tool.

Żmija Żmija is a simple universal code generation tool. It is intended to be used as a means to generate code that is both efficient and easily mainta

 Simple yet powerful CAD (Computer Aided Design) library, written with Python.
Simple yet powerful CAD (Computer Aided Design) library, written with Python.

Py-MADCAD it's time to throw parametric softwares out ! Simple yet powerful CAD (Computer Aided Design) library, written with Python. Installation

A simple USI Shogi Engine written in python using python-shogi.

Revengeshogi My attempt at creating a USI Shogi Engine in python using python-shogi. Current State of Engine Currently only generating random moves us

A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

Django-Text-to-HTML-converter - The simple Text to HTML Converter using Django framework

Django-Text-to-HTML-converter This is the simple Text to HTML Converter using Dj

Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files

pyexcel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenue ge

A flask extension using pyexcel to read, manipulate and write data in different excel formats: csv, ods, xls, xlsx and xlsm.

Flask-Excel - Let you focus on data, instead of file formats Support the project If your company has embedded pyexcel and its components into a revenu

Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX)

mcc-mnc.com-webscraper Scrapes mcc-mnc.com and outputs 3 files with the data (JSON, CSV & XLSX) A Python script for web scraping mcc-mnc.com Link: mcc

LightCSV - This CSV reader is implemented in just pure Python.

LightCSV Simple light CSV reader This CSV reader is implemented in just pure Python. It allows to specify a separator, a quote char and column titles

JSON and CSV data for Swahili dictionary with over 16600+ words

kamusi JSON and CSV data for swahili dictionary with over 16600+ words. This repo consists of data from swahili dictionary with about 16683 words toge

Sheet Data Image/PDF-to-CSV Converter

Sheet Data Image/PDF-to-CSV Converter

DB-Drive-CSV - This is app is can be used to access CSV file as JSON from Google Drive.

DB Drive CSV This is app is can be used to access CSV file as JSON from Google Drive. How To Use Create file/ upload file to Google Drive There's 2 fi

Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.
Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel order of RGB and BGR. Simple Channel Converter for ONNX.

scc4onnx Very simple NCHW and NHWC conversion tool for ONNX. Change to the specified input order for each and every input OP. Also, change the channel

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

A Python module for creating Excel XLSX files.
A Python module for creating Excel XLSX files.

XlsxWriter XlsxWriter is a Python module for writing files in the Excel 2007+ XLSX file format. XlsxWriter can be used to write text, numbers, formula

É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.
É uma API feita em Python e Flask que pesquisa informações em uma tabela .xlsx e retorna o resultado.

API de rastreamento de pacotes É uma API feita em Python e Flask que pesquisa informações de rastreamento de pacotes em uma tabela .xlsx e retorna o r

Transfers a image file(.png) to an Excel file(.xlsx)
Transfers a image file(.png) to an Excel file(.xlsx)

Transfers a image file(.png) to an Excel file(.xlsx)

An XLSX spreadsheet renderer for Django REST Framework.

drf-renderer-xlsx provides an XLSX renderer for Django REST Framework. It uses OpenPyXL to create the spreadsheet and returns the data.

Comments
  • Empty cells in xlsx become the non-empty string

    Empty cells in xlsx become the non-empty string "None" in the dictionary

    Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

    One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

    From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

    https://github.com/Pytlicek/sheet2dict/blob/93bf731d327d620755ffd5585aedb0b23a17a6a8/sheet2dict/main.py#L30-L34

    I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

    I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

    enhancement 
    opened by JoshuaCapel 5
  • Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    Code Proposal: Allow User to Specify Sheet (Work with multi-sheet Excel Files)

    This library is awesome! I'm using it a lot since it makes working with Excel files very simple.

    I have a proposal: Allow the user to specify the sheet they want to modify.

    I write a lot of functional programming and I can't remember if it's possible to overwrite a class function at import, so instead I've modified the sheet2dict to return a specified sheet when openpyxl opens the file.

    in main.py:

    def xlsx_to_dict(self, path, select_sheet=None):
            """
            Read a Worksheet and return it as array of dictionaries
            :param self: Worksheet Object
            :param path: Path to XLSX file
            :return: Array of rows as dictionaries
            """
            book = load_workbook(path)
            if select_sheet == None:
                  sheet = book.active
            else:
                  sheet = book[select_sheet]
    

    The user would then read an Excel file like this:

    ws =Worksheet()
    ws.xlsx_to_dict(path=<insert path>, select_sheet='<sheetname>'
    

    I don't know if this is the most elegant way to select a sheet, but it works for me. I suppose to import all worksheets by default - I think the maximum is 32 - into a larger dictionary, <edit I didn't complete my thought>, that there could be a clever way to put dictionaries in dictionaries, but this might be too convoluted since I assume most users will just use one sheet. Another option is just to continue creating Worksheet() objects with different names and sheets selected.

    enhancement 
    opened by bai-yi-bai 3
  • Add an option to parse all worksheets in one function call

    Add an option to parse all worksheets in one function call

    Added an option to parse all sheets from a workbook at once while keeping the way existing features worked untouched.

    Had to modify sheet_items to a dict with sheet names as keys and sheet's objects as values.

    enhancement 
    opened by 68Kamil68 2
  • Update README.md

    Update README.md

    Changed "the first sheet" to "active sheet" to avoid confusion if someone saves a spreadsheet and the active worksheet is, fe. second in line. Also, what is considered an active worksheet when there is only one worksheet in the spreadsheet. Add an example with and without the "select_sheet" option.

    documentation 
    opened by Pytlicek 1
Releases(0.1.5)
Owner
Tomas Pytel
PyConSK Organizer, Python Developer & DevOps
Tomas Pytel
Visualizacao-dados-dell - Repositório com as atividades desenvolvidas no curso de Visualização de Dados

📚 Descrição Neste curso da Dell trabalhamos com a visualização de dados. 🖥️ Aulas 1.1 - Explorando conjuntos de dados 1.2 - Fundamentos de visualiza

Claudia dos Anjos 1 Dec 28, 2021
ReStructuredText and Sphinx bridge to Doxygen

Breathe Packagers: PGP signing key changes for Breathe = v4.23.0. https://github.com/michaeljones/breathe/issues/591 This is an extension to reStruct

Michael Jones 643 Dec 31, 2022
Gaphor is the simple modeling tool

Gaphor Gaphor is a UML and SysML modeling application written in Python. It is designed to be easy to use, while still being powerful. Gaphor implemen

Gaphor 1.3k Jan 03, 2023
script to calculate total GPA out of 4, based on input gpa.csv

gpa_calculator script to calculate total GPA out of 4 based on input gpa.csv to use, create a total.csv file containing only one integer showing the t

Mohamad Bastin 1 Feb 07, 2022
level2-data-annotation_cv-level2-cv-15 created by GitHub Classroom

[AI Tech 3기 Level2 P Stage] 글자 검출 대회 팀원 소개 김규리_T3016 박정현_T3094 석진혁_T3109 손정균_T3111 이현진_T3174 임종현_T3182 Overview OCR (Optimal Character Recognition) 기술

6 Jun 10, 2022
Types that make coding in Python quick and safe.

Type[T] Types that make coding in Python quick and safe. Type[T] works best with Python 3.6 or later. Prior to 3.6, object types must use comment type

Contains 17 Aug 01, 2022
This repo provides a package to automatically select a random seed based on ancient Chinese Xuanxue

🤞 Random Luck Deep learning is acturally the alchemy. This repo provides a package to automatically select a random seed based on ancient Chinese Xua

Tong Zhu(朱桐) 33 Jan 03, 2023
A simple document management REST based API for collaboratively interacting with documents

documan_api A simple document management REST based API for collaboratively interacting with documents.

Shahid Yousuf 1 Jan 22, 2022
A repository of links with advice related to grad school applications, research, phd etc

A repository of links with advice related to grad school applications, research, phd etc

Shaily Bhatt 946 Dec 30, 2022
Tips for Writing a Research Paper using LaTeX

Tips for Writing a Research Paper using LaTeX

Guanying Chen 727 Dec 26, 2022
The sarge package provides a wrapper for subprocess which provides command pipeline functionality.

Overview The sarge package provides a wrapper for subprocess which provides command pipeline functionality. This package leverages subprocess to provi

Vinay Sajip 14 Dec 18, 2022
A Python Package To Generate Strong Passwords For You in Your Projects.

shPassGenerator Version 1.0.6 Ready To Use Developed by Shervin Badanara (shervinbdndev) on Github Language and technologies used in This Project Work

Shervin 11 Dec 19, 2022
Mkdocs obsidian publish - Publish your obsidian vault through a python script

Mkdocs Obsidian Mkdocs Obsidian is an association between a python script and a

Mara 49 Jan 09, 2023
Bring RGB to life in Neovim

Bring RGB to life in Neovim Change your RGB devices' color depending on Neovim's mode. Fast and asynchronous plugin to live your vim-life to the fulle

Antoine 40 Oct 27, 2022
A tutorial for people to run synthetic data replica's from source healthcare datasets

Synthetic-Data-Replica-for-Healthcare Description What is this? A tailored hands-on tutorial showing how to use Python to create synthetic data replic

11 Mar 22, 2022
A collection of simple python mini projects to enhance your python skills

A collection of simple python mini projects to enhance your python skills

PYTHON WORLD 12.1k Jan 05, 2023
DataRisk Detection Learning Resources

DataRisk Detection Learning Resources Data security: Based on the "data-centric security system" position, it generally refers to the entire security

Liao Wenzhe 59 Dec 05, 2022
JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates.

JTEX JTEX is a command line tool (CLI) for rendering LaTeX documents from jinja-style templates. This package uses Jinja2 as the template engine with

Curvenote 15 Dec 21, 2022
🌱 Complete API wrapper of Seedr.cc

Python API Wrapper of Seedr.cc Table of Contents Installation How I got the API endpoints? Start Guide Getting Token Logging with Username and Passwor

Hemanta Pokharel 43 Dec 26, 2022
Comprehensive Python Cheatsheet

Comprehensive Python Cheatsheet Download text file, Buy PDF, Fork me on GitHub or Check out FAQ. Contents 1. Collections: List, Dictionary, Set, Tuple

Jefferson 1 Jan 23, 2022