This repository provides a set of functions to extract paragraphs from AWS Textract responses.

Overview

extract-paragraphs-with-aws-textract

Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository provides a set of Python 3.X functions built on top of the AWS Python SDK (boto3) to extract paragraphs from AWS Textract responses.
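To make the starting point concrete, here is a minimal sketch of how the LINE blocks can be pulled out of a Textract response before any paragraph logic runs. The function name extract_lines and the sample response are hypothetical and written by hand for illustration; they are not this repository's code or real Textract output. Only the response keys (Blocks, BlockType, Text, Page, Geometry, BoundingBox) follow the documented Textract response shape.

```python
def extract_lines(response):
    """Return the LINE blocks of a Textract response as plain dicts.

    Hypothetical helper for illustration; not part of this repository.
    """
    lines = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip PAGE and WORD blocks
        box = block["Geometry"]["BoundingBox"]
        lines.append({
            "page": block.get("Page", 1),  # fall back to 1 if the key is missing
            "top": box["Top"],             # ratio of the page height, 0 = top edge
            "height": box["Height"],
            "text": block["Text"],
        })
    # Sort by page, then from top to bottom
    lines.sort(key=lambda line: (line["page"], line["top"]))
    return lines


# Hand-written sample in the shape of a Textract response (not real output)
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Text": "Second line", "Page": 1,
         "Geometry": {"BoundingBox": {"Top": 0.13, "Left": 0.1,
                                      "Width": 0.5, "Height": 0.02}}},
        {"BlockType": "LINE", "Text": "First line", "Page": 1,
         "Geometry": {"BoundingBox": {"Top": 0.10, "Left": 0.1,
                                      "Width": 0.5, "Height": 0.02}}},
        {"BlockType": "WORD", "Text": "First", "Page": 1,
         "Geometry": {"BoundingBox": {"Top": 0.10, "Left": 0.1,
                                      "Width": 0.2, "Height": 0.02}}},
    ]
}

for line in extract_lines(sample_response):
    print(line["text"])
```

Textract reports each line with a bounding box but no paragraph membership, which is why the vertical positions are kept alongside the text.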

PLEASE NOTE THAT:

  1. It is assumed that your client has the necessary IAM permissions to access the different AWS resources required.
  2. Since AWS Textract analyzes PDF files by running asynchronous operations, the current version assumes that you've already created an S3 bucket and that the PDF files are already stored there. If not, please see the boto3 docs to learn how to create a bucket and upload files.
  3. The paragraph_constructor is an ad hoc function for my use case. You may have to adapt it based on the space between lines in your data.
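Notes 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's actual paragraph_constructor: the function names fetch_blocks and group_into_paragraphs, the gap_factor threshold, and the bucket and key in the usage comment are all hypothetical. The boto3 calls used (start_document_text_detection and get_document_text_detection) are the asynchronous Textract text-detection operations; the client is passed in so the functions can be tried with any object that exposes those two methods.

```python
import time


def fetch_blocks(client, bucket, key, poll_seconds=5):
    """Start an asynchronous text-detection job and collect all its blocks."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state
    while True:
        result = client.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if result["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job ended with status {result['JobStatus']}")

    # Results are paginated; follow NextToken until it is absent
    blocks = list(result.get("Blocks", []))
    while "NextToken" in result:
        result = client.get_document_text_detection(
            JobId=job_id, NextToken=result["NextToken"]
        )
        blocks.extend(result.get("Blocks", []))
    return blocks


def group_into_paragraphs(blocks, gap_factor=1.5):
    """Join LINE blocks into paragraphs using the vertical gap between lines.

    A new paragraph starts when the page changes or when the distance between
    the tops of two consecutive lines exceeds gap_factor * line height.
    gap_factor is a hypothetical tuning knob; adjust it to your documents.
    """
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    lines.sort(key=lambda b: (b.get("Page", 1), b["Geometry"]["BoundingBox"]["Top"]))

    paragraphs, current, previous = [], [], None
    for line in lines:
        box = line["Geometry"]["BoundingBox"]
        if previous is not None:
            prev_box = previous["Geometry"]["BoundingBox"]
            new_page = line.get("Page", 1) != previous.get("Page", 1)
            big_gap = box["Top"] - prev_box["Top"] > gap_factor * prev_box["Height"]
            if new_page or big_gap:
                paragraphs.append(" ".join(current))
                current = []
        current.append(line["Text"])
        previous = line
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


# Usage (needs boto3, valid credentials, and a PDF already in S3;
# the bucket and key below are placeholders):
#   import boto3
#   blocks = fetch_blocks(boto3.client("textract"), "my-bucket", "docs/sample.pdf")
#   for paragraph in group_into_paragraphs(blocks):
#       print(paragraph)
```

The gap threshold is the part most likely to need adapting: documents with loose line spacing may need a larger gap_factor, and multi-column layouts would need the horizontal position taken into account as well.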

UPCOMING FEATURES:

  • Address abstract cases with the paragraph_constructor function.
  • Export data in different formats.
  • AWS CloudFormation template for a serverless architecture that executes the functions when a new object is uploaded to your S3 bucket.

Please feel free to suggest new features or improvements to the current code. <3

Owner
Juan Anzola