Using Amazon Textract as OCR for Data Extraction into DynamoDB

Last update: Jan 19, 2022

Overview

dio-live-textract2

Code repository for the live coding session of 05/10/2021 on extracting structured data with Amazon Textract and writing it to a database.

Services used

Amazon Textract
AWS Lambda
Amazon S3
Amazon DynamoDB

Development

Creating a bucket in Amazon S3

S3 Console -> Create bucket -> Bucket name "dio-live-input-data" -> Keep the default settings -> Create bucket
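
For readers who prefer scripting over the console, the same step could look roughly like the sketch below. It assumes boto3 is installed and AWS credentials are configured. The helper names `bucket_request` and `create_input_bucket` are made up for illustration, and because bucket names are globally unique, "dio-live-input-data" may already be taken.

```python
def bucket_request(bucket_name, region="us-east-1"):
    """Build the keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and must not be sent as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_input_bucket(bucket_name="dio-live-input-data", region="us-east-1"):
    """Create the input bucket with default settings (hypothetical helper)."""
    import boto3  # imported lazily so bucket_request works without the AWS SDK

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**bucket_request(bucket_name, region))
```

Calling `create_input_bucket()` performs the actual request; `bucket_request` only prepares the arguments, which makes the region handling easy to inspect.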

Processing images in Amazon Textract

Textract Console -> Select Document -> Analyze Document -> Tables
Download results -> Save the .zip file
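
The console steps above have a programmatic counterpart in the Textract `analyze_document` API. The sketch below is one way to request table analysis, assuming the document has already been uploaded to an S3 bucket (the console flow uploads it from your machine instead). The function names are illustrative and not part of this repository.

```python
def analyze_request(bucket, key):
    """Build the arguments for Textract analyze_document with table detection."""
    return {
        "Document": {"S3Object": {"Bucket": bucket, "Name": key}},
        "FeatureTypes": ["TABLES"],
    }


def count_tables(response):
    """Count the TABLE blocks in an analyze_document response."""
    blocks = response.get("Blocks", [])
    return sum(1 for block in blocks if block.get("BlockType") == "TABLE")


def analyze_tables(bucket, key):
    """Run the analysis (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without the AWS SDK

    textract = boto3.client("textract")
    return textract.analyze_document(**analyze_request(bucket, key))
```

Unlike the console, the API returns JSON blocks rather than a .zip with CSV files, so turning the response into a CSV would need additional code that is not shown here.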

Creating a table in DynamoDB

DynamoDB Console -> Tables -> Create Table -> Partition key "cod" -> Create table
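
As a rough scripted equivalent of the console step, the table could be created with boto3 as sketched below. The partition key "cod" comes from the step above. The string key type and on-demand billing are assumptions, and the table name is whatever you choose, since this README does not fix one.

```python
def table_definition(table_name, partition_key="cod"):
    """Build the arguments for DynamoDB create_table with a single partition key."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": partition_key, "KeyType": "HASH"}],
        # Assumption: the key is stored as a string ("S").
        "AttributeDefinitions": [
            {"AttributeName": partition_key, "AttributeType": "S"}
        ],
        # Assumption: on-demand billing, so no capacity planning is needed.
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_table(table_name):
    """Create the table (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so table_definition works without the AWS SDK

    dynamodb = boto3.client("dynamodb")
    return dynamodb.create_table(**table_definition(table_name))
```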

Implementing the Lambda function

Lambda Console -> Functions -> Create function
Use a blueprint -> "s3-get-object-python"
Function name "dio-live-csv-to-db"
Execution role -> "Create a new role from AWS policy templates" -> Role name "S3ToDynamoDBRole"
S3 Trigger -> Bucket created earlier
Create function
Replace the generated code with the code from the /src folder of this repository (Note: the table name in the code must be replaced with the name of your own table)
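
The actual function code lives in the /src folder and is not reproduced here. To give an idea of what such a handler might do, the following is a minimal sketch, not the repository's implementation. It assumes the uploaded CSV has a header row containing a "cod" column, and `TABLE_NAME` is a placeholder you would replace.

```python
import csv
import io
import urllib.parse

TABLE_NAME = "your-table-name"  # placeholder; replace with your table's name


def rows_to_items(csv_text, key_column="cod"):
    """Turn CSV text into a list of dicts, skipping rows without a key value."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = {}
        for name, value in row.items():
            if name is None or not name.strip():
                continue  # extra cells or columns with an empty header
            item[name.strip()] = (value or "").strip()
        if item.get(key_column):
            items.append(item)
    return items


def lambda_handler(event, context):
    """Read the uploaded CSV from S3 and write each row to DynamoDB."""
    import boto3  # provided by the Lambda runtime or by a layer

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["object"]["key"])

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    items = rows_to_items(body.decode("utf-8-sig"))

    table = boto3.resource("dynamodb").Table(TABLE_NAME)
    for item in items:
        table.put_item(Item=item)  # one PutItem call per row
    return {"written": len(items)}
```

The sketch writes rows one at a time with `put_item` so that it only needs the PutItem permission. The CSV produced by Textract may need extra cleaning before it parses this neatly.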

Additional step: Creating a layer with the Python boto3 library

Lambda Console -> Additional Resources -> Layers
Name "boto3_layer" -> Upload a .zip file -> download the .zip file from the /src folder of this repository and upload it
Compatible architecture "x86_64"
Compatible runtimes "Python 3.7" (Python 3.7 is required for compatibility with the version of the blueprint used)
Create
In the Lambda function you created -> Select Layers in the diagram -> Add layer -> Custom layers "boto3_layer" -> Version 1 -> Add
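
If you would rather publish the layer from a script, the Lambda `publish_layer_version` API accepts the same settings. The sketch below assumes you have the .zip file locally. The helper names are illustrative. Lambda has since deprecated the Python 3.7 runtime, so a newer runtime identifier may be needed today.

```python
def layer_request(zip_bytes, name="boto3_layer"):
    """Build the arguments for Lambda publish_layer_version."""
    return {
        "LayerName": name,
        "Content": {"ZipFile": zip_bytes},
        "CompatibleRuntimes": ["python3.7"],  # as in the steps above
        "CompatibleArchitectures": ["x86_64"],
    }


def publish_layer(zip_path):
    """Publish the layer from a local .zip file (requires boto3 and credentials)."""
    import boto3  # imported lazily so layer_request works without the AWS SDK

    with open(zip_path, "rb") as handle:
        zip_bytes = handle.read()
    return boto3.client("lambda").publish_layer_version(**layer_request(zip_bytes))
```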

Configuring Lambda permissions for DynamoDB

Lambda Console -> Functions -> Select the function you created -> Configuration -> Permissions -> Execution role -> Open the role created in Amazon IAM
In IAM -> Permissions -> Add inline policy -> Choose a service "DynamoDB" -> Write "PutItem"
Resources -> Select the ARN of your table -> Select your region -> Add -> Review policy -> Name "LambdaDynamoDBPolicy" -> Create policy
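
The inline policy built in the console amounts to a small JSON document. The sketch below shows what it could look like and how it might be attached with boto3. The region, account ID and table name are values you supply, and the function names are illustrative.

```python
import json


def put_item_policy(region, account_id, table_name):
    """Build an IAM policy document that allows PutItem on a single table."""
    resource = f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "dynamodb:PutItem",
                "Resource": resource,
            }
        ],
    }


def attach_policy(region, account_id, table_name, role_name="S3ToDynamoDBRole"):
    """Attach the policy inline to the execution role (requires boto3)."""
    import boto3  # imported lazily so put_item_policy works without the AWS SDK

    document = json.dumps(put_item_policy(region, account_id, table_name))
    return boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName="LambdaDynamoDBPolicy",
        PolicyDocument=document,
    )
```

Restricting the statement to one action and one table ARN keeps the role close to least privilege.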

Using the application

In Amazon Textract

Amazon Textract Console -> Select Document -> Choose file -> Browse to the file to be analyzed
Download results

In Amazon S3

Extract the table_1.csv file from the archive downloaded from Amazon Textract
Open the bucket created earlier -> Upload -> Select the table_1.csv file -> Upload
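
These two manual steps can also be scripted. The sketch below pulls table_1.csv out of the downloaded archive and uploads it to the bucket, assuming boto3 and credentials are available. The function names are made up for illustration, and the file is matched by name because the folder layout inside the archive is not specified here.

```python
import io
import zipfile


def extract_member(zip_bytes, filename="table_1.csv"):
    """Return the bytes of the first archive member whose base name matches."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.rsplit("/", 1)[-1] == filename:
                return archive.read(name)
    raise FileNotFoundError(f"{filename} not found in archive")


def upload_csv(zip_path, bucket="dio-live-input-data", key="table_1.csv"):
    """Extract the CSV from the Textract download and upload it to S3."""
    import boto3  # imported lazily so extract_member works without the AWS SDK

    with open(zip_path, "rb") as handle:
        data = extract_member(handle.read())
    return boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=data)
```

Once the object lands in the bucket, the S3 trigger configured earlier invokes the Lambda function.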

In DynamoDB

Tables -> Open the table you created -> View items
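
To check the result without the console, the table can be read back with a scan. The sketch below follows the `LastEvaluatedKey` pagination that DynamoDB scans use. It would run with your own credentials, since the Lambda role above only allows PutItem. The function names are illustrative.

```python
def scan_all(table):
    """Collect every item from a table-like object, following scan pagination."""
    items = []
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def list_items(table_name):
    """Scan a real DynamoDB table (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so scan_all works without the AWS SDK

    return scan_all(boto3.resource("dynamodb").Table(table_name))
```

A full scan is fine for a small demo table but reads every item, so it is not a pattern for large tables.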

Owner

hugoportela
