An application that maps an image of a LaTeX math equation to LaTeX code.

Overview

Image to LaTeX

Code style: black pre-commit License

An application that maps an image of a LaTeX math equation to LaTeX code.

Image to Latex streamlit app

Introduction

The problem of image-to-markup generation has been attempted by Deng et al. (2016). They provide the raw and preprocessed versions of im2latex-100K, a dataset consisting of about 100K LaTeX math equation images. Using their dataset, I trained a model that uses ResNet-18 as encoder (up to layer3) and a Transformer as decoder with cross-entropy loss.

Initially, I used the preprocessed dataset to train my model, but the preprocessing turned out to be a huge limitation. Although the model can achieve a reasonable performance on the test set, it performs poorly if the image quality, padding, or font size is different from the images in the dataset. This phenomenon has also been observed by others who have attempted the same problem using the same dataset (e.g., this project, this issue and this issue). This is most likely due to the rigid preprocessing for the dataset (e.g. heavy downsampling).

To this end, I used the raw dataset and included image augmentation (e.g. random scaling, small rotation) in my data processing pipeline to increase the diversity of the samples. Moreover, unlike Deng et al. (2016), I did not group images by size. Rather, I sampled them uniformly and padded them to the size of the largest image in the batch, to increase the generalizability of the model.

Additional problems that I found in the dataset:

  • Some latex code produces visually identical outputs (e.g. \left( and \right) look the same as ( and )), so I normalized them.
  • Some latex code is used to add space (e.g. \vspace{2px} and \hspace{0.3mm}). However, the length of the space is diffcult to judge. Also, I don't want the model generates code on blank images, so I removed them.

The best run has a character error rate (CER) of 0.17 in test set. Most errors seem to come from unnecessary horizontal spacing, e.g., \;, \, and \qquad. (I only removed \vspace and \hspace during preprocessing. I did not know that LaTeX has so many horizontal spacing commands.)

Possible improvements include:

  • Do a better job cleaning the data (e.g., removing spacing commands)
  • Train the model for more epochs (for the sake of time, I only trained the model for 15 epochs, but the validation loss is still going down)
  • Use beam search (I only implemented greedy search)
  • Use a larger model (e.g., use ResNet-34 instead of ResNet-18)
  • Do some hyperparameter tuning

I didn't do any of these, because I had limited computational resources (I was using Google Colab).

How To Use

Setup

Clone the repository to your computer and position your command line inside the repository folder:

git clone https://github.com/kingyiusuen/image-to-latex.git
cd image-to-latex

Then, create a virtual environment named venv and install required packages:

make venv
make install-dev

Data Preprocessing

Run the following command to download the im2latex-100k dataset and do all the preprocessing. (The image cropping step may take over an hour.)

python scripts/prepare_data.py

Model Training and Experiment Tracking

Model Training

An example command to start a training session:

python scripts/run_experiment.py trainer.gpus=1 data.batch_size=32

Configurations can be modified in conf/config.yaml or in command line. See Hydra's documentation to learn more.

Experiment Tracking using Weights & Biases

The best model checkpoint will be uploaded to Weights & Biases (W&B) automatically (you will be asked to register or login to W&B before the training starts). Here is an example command to download a trained model checkpoint from W&B:

python scripts/download_checkpoint.py RUN_PATH

Replace RUN_PATH with the path of your run. The run path should be in the format of //. To find the run path for a particular experiment run, go to the Overview tab in the dashboard.

For example, you can use the following command to download my best run

python scripts/download_checkpoint.py kingyiusuen/image-to-latex/1w1abmg1

The checkpoint will be downloaded to a folder named artifacts under the project directory.

Testing and Continuous Integration

The following tools are used to lint the codebase:

isort: Sorts and formats import statements in Python scripts.

black: A code formatter that adheres to PEP8.

flake8: A code linter that reports stylistic problems in Python scripts.

mypy: Performs static type checking in Python scripts.

Use the following command to run all the checkers and formatters:

make lint

See pyproject.toml and setup.cfg at the root directory for their configurations.

Similar checks are done automatically by the pre-commit framework when a commit is made. Check out .pre-commit-config.yaml for the configurations.

Deployment

An API is created to make predictions using the trained model. Use the following command to get the server up and running:

make api

You can explore the API via the generated documentation at http://0.0.0.0:8000/docs.

To run the Streamlit app, create a new terminal window and use the following command:

make streamlit

The app should be opened in your browser automatically. You can also open it by visiting http://localhost:8501. For the app to work, you need to download the artifacts of an experiment run (see above) and have the API up and running.

To create a Docker image for the API:

make docker

Acknowledgement

Pixel art as well as various sets for hand crafting

Pixel art as well as various sets for hand crafting

1 Nov 09, 2021
A simple python script to reveal the contents of a proof of vaccination QR code.

vaxidecoder A simple python script to reveal the contents of a proof of vaccination QR code. It takes a QR code image as input, and returns JSon data.

Hafidh 2 Feb 28, 2022
Transfers a image file(.png) to an Excel file(.xlsx)

Transfers a image file(.png) to an Excel file(.xlsx)

Junu Kwon 7 Feb 11, 2022
Fill holes in binary 2D & 3D images fast.

Fill holes in binary 2D & 3D images fast.

11 Dec 09, 2022
Pyconvert is a python script that you can use to convert image files to another image format! (eg. PNG to ICO)

Pyconvert is a python script that you can use to convert image files to another image format! (eg. PNG to ICO)

1 Jan 16, 2022
A simple plugin to view APR images in napari

napari-apr-viewer A simple plugin to view APR images in napari This napari plugin was generated with Cookiecutter using @napari's cookiecutter-napari-

5 Jan 24, 2022
A small Python module for BMP image processing.

micropython-microbmp A small Python module for BMP image processing. It supports BMP image of 1/2/4/8/24-bit colour depth. Loading supports compressio

Quan Lin 4 Nov 02, 2022
A Python3 library to generate dynamic SVGs

The Python library for generating dynamic SVGs using Python3

1 Dec 23, 2021
Fast Image Retrieval is an open source image retrieval framework

Fast Image Retrieval is an open source image retrieval framework release by Center of Image and Signal Processing Lab (CISiP Lab), Universiti Malaya. This framework implements most of the major binar

CISiP Lab 39 Nov 25, 2022
A scalable implementation of WobblyStitcher for 3D microscopy images

WobblyStitcher Introduction A scalable implementation of WobblyStitcher Dependencies $ python -m pip install numpy scikit-image Visualization ImageJ

CSE Lab, ETH Zurich 7 Jul 25, 2022
A simple programming language for manipulating images.

f-stop A simple programming language for manipulating images. Examples OPEN "image.png" AS image RESIZE image (300, 300) SAVE image "out.jpg" CLOSE im

F-Stop 6 Oct 27, 2022
DrawBot is a powerful, free application for macOS that invites you to write Python scripts to generate two-dimensional graphics

DrawBot is a powerful, free application for macOS that invites you to write Python scripts to generate two-dimensional graphics.

Frederik Berlaen 344 Jan 06, 2023
Program designed to mass edit and watermark all photos in a directory

Photographer-All-In-One This is a program designed for photographers to mass edit or watermark photos (.jpg || .png) You can run this program from any

Brad Martin 2 Nov 23, 2021
Labelme is a graphical image annotation tool, It is written in Python and uses Qt for its graphical interface

Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).

Kentaro Wada 9.6k Jan 09, 2023
Python Image Morpher (PIM) is a program that can take two images and blend them to whatever extent or precision that you like

Python Image Morpher (PIM) is a program that can take two images and blend them to whatever extent or precision that you like! It is designed to emulate some of Python's OpenCV image processing from

David Dowd 108 Dec 19, 2022
Image enhancing model for making a blurred image to be somehow clearer than before

This is a very small prject which helps in enhancing the images by taking a Input images. This project has many features like detcting the faces and enhaning the faces itself and also a feature which

3 Dec 03, 2021
Sombra is simple Raytracer written in pure Python.

Sombra Sombra is simple Raytracer written in pure Python. It's main purpose is to help understand how raytracing works with a clean code. If you are l

Hernaldo Jesus Henriquez Nuñez 10 Jul 16, 2022
A warping based image translation model focusing on upper body synthesis.

Pose2Img Upper body image synthesis from skeleton(Keypoints). Sub module in the ICCV-2021 paper "Speech Drives Templates: Co-Speech Gesture Synthesis

zhiyh 15 Nov 10, 2022
A python based library to help you create unique generative images based on Rarity for your next NFT Project

Generative-NFT Generate Unique Images based on Rarity A python based library to help you create unique generative images based on Rarity for your next

Kartikay Bhutani 8 Sep 21, 2022
Tool made for the FWA Yearbook Team to resize multiple images quickly.

ImageResize Tool Tool made for the FWA Yearbook Team to resize multiple images quickly. Make sure to check this repo for future updates How to Use The

LGobin 1 Jan 07, 2022