SDL: Synthetic Document Layout dataset

Last update: Oct 07, 2021

Overview


SDL is a project that synthesizes document images. It supports multi-level labeling of the generated images and can generate documents in multiple languages.

Sample image

Structure of data

Quick start

python flexible_layout.py --config_file configs/page.yaml

Instructions for running data generation

Go to instruction

Visualization of the result

python data_manipulation/visualize.py
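
The two commands above can also be chained from a small Python wrapper. The following is only a sketch: the helper names `build_commands` and `run_pipeline` are hypothetical and not part of SDL, and it assumes both scripts are run from the repository root and that the visualization script needs no extra arguments, as in the command shown above.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(config_file="configs/page.yaml"):
    """Return the README's two commands as argument lists (hypothetical helper)."""
    # Use the interpreter running this script so both steps share one environment.
    python = sys.executable or "python"
    generate = [python, "flexible_layout.py", "--config_file", str(config_file)]
    visualize = [python, "data_manipulation/visualize.py"]
    return [generate, visualize]


def run_pipeline(repo_dir=".", config_file="configs/page.yaml"):
    """Run generation, then visualization, stopping at the first failure."""
    for command in build_commands(config_file):
        # check=True raises CalledProcessError if a step exits with a non-zero code.
        subprocess.run(command, cwd=Path(repo_dir), check=True)
```

Calling `run_pipeline()` from the repository root runs generation with `configs/page.yaml` and then the visualization step; pass a different `config_file` to try another configuration.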

Vietnamese dataset (300,000 images) link:

To be released soon

Paper

https://arxiv.org/abs/2106.15117

Owner

Sơn Nguyễn

Self-taught programmer. Completed courses: CS50, MIT 6.006. Preferred language: Python.

