Repositorio com arquivos processados da CPI da COVID para facilitar analise

Related tags

Miscellaneouscpi4all
Overview

cpi4all

Repositorio com arquivos processados da CPI da COVID para facilitar analise

Organização

No site do senado é possivel encontrar a lista de todos os documentos coletados pela CPI da COVID.

A tabela no site possui a seguinte estrutura:

No Arquivos Data de recebimento Remetente Origem Descrição Caixa Em Resposta
1 Link1 ... ... ... ... ... ...
2 Link2/link3 ... ... ... ... ... ...

Esses links levam ao download de arquivos PDF com os documentos em questão.

Nesse repositorio você podera encontrar a versão txt desses arquivos. O nome do arquivo nesse repositorio é formado por <No do documento>_<numero do link>. Por exemplo:

link1 = 1_1 porque ele é relativo ao arquivo No 1, e é o primeiro link.

link2 = 2_1 porque ele é relativo ao arquivo No 2, e é o primeiro link dessa linha.

link3 = 2_2 porque ele é relativo ao arquivo No 2, e é o segundo link da linha.

A versão texto de todos os documentos está na pasta database/txts/.

Exemplos:

Arquivo No 1, primeiro link: 1_1

Arquivo No 4, quarto link: 3_4

Nota 1: Nem todos os arquivos foram convertidos ainda

Nota 2: A conversão usa reconhecimento de imagem e pode ficar bem ruim as vezes, gerando erros ortograficos ou palavras sem nexo algum.

Para desenvolvedores

Os scripts funcionam na seguinte sequencia:

  1. extract_rows.py: Vai no site do senado e extrai as informações de cada linha da tabela. Todos os dados são salvos em database/rows.
  2. extract_headers.py: Para cada link em cada linha, esse script pega metadados do arquivo (tamanho, tipo) que vão ser uteis depois. Esses dados são salvos em database/headers.
  3. download_pdfs.py: Baixa todos os PDFs descritos em database/headers e salva em database/pdfs.
  4. convert_pdf_to_jpg.py: Converte todos os PDFs em database/pdfs para imagens em database/jpgs.
  5. convert_jpg_to_txt.py: Converte todos as imagens em database/jpgs para texto em database/txt.

Por motivos de performance, apenas as pastas database/rows, database/headers e database/txts sao salvas nesse repositorio.

TODO: 0. Melhorar esse readme :)

  1. Usar o githubpages para gerar um site estatico que permite pesquisar em todos os txt
  2. Terminar de converter todos os arquivos
  3. Investigar arquivos em que a conversão ficou pessima.
  4. Fazer extração automatica de datas e prover um json com a ordem cronologica dos arquivos.
Owner
Breno Rodrigues Guimarães
Breno Rodrigues Guimarães
Macros in Python: quasiquotes, case classes, LINQ and more!

MacroPy3 1.1.0b2 MacroPy is an implementation of Syntactic Macros in the Python Programming Language. MacroPy provides a mechanism for user-defined fu

Li Haoyi 3.2k Jan 06, 2023
Create rangebased on lists or values of the range itself. Range any type. Can you imagine?

funcao-allrange-for-python3 Create rangebased on lists or values of the range itself. Range any type. Can you imagine? WARNING!!! THIS MODULE DID NOT

farioso-fernando 1 Feb 09, 2022
token vesting escrow with cliff and clawback

Yearn Vesting Escrow A modified version of Curve Vesting Escrow contracts with added functionality: An escrow can have a start_date in the past.

62 Dec 08, 2022
Lightweight Scheduled Blocks Checker for Current Epoch. No cardano-node Required, data is taken from blockfrost.io

ReLeaderLogs For Cardano Stakepool Operators: Lightweight Scheduled Blocks Checker for Current Epoch. No cardano-node Required, data is taken from blo

SNAKE (Cardano Stakepool) 2 Oct 19, 2021
All exercises done during the Python 3 course in the Video Course (World 1, 2 and 3)

Python3-cursoemvideo-exercises - All exercises done during the Python 3 course in the Video Course (World 1, 2 and 3)

Renan Barbosa 3 Jan 17, 2022
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.

RDFLib RDFLib is a pure Python package for working with RDF. RDFLib contains most things you need to work with RDF, including: parsers and serializers

RDFLib 1.8k Jan 02, 2023
Assignment for python course, BUPT 2021.

pyFuujinrokuDestiny Assignment for python course, BUPT 2021. Notice username and password must be ASCII encoding. If username exists in database, syst

Ellias Kiri Stuart 3 Jun 18, 2021
Remove Sheet Protection from .xlsx files. Easily.

🔓 Excel Sheet Unlocker Remove sheet protection from .xlsx files. How to use Run Run the script/packaged executable from the command line. Universal u

Daniel 3 Nov 16, 2022
This is a simple SV calling package for diploid assemblies.

dipdiff This is a simple SV calling package for diploid assemblies. It uses a modified version of svim-asm. The package includes its own version minim

Mikhail Kolmogorov 11 Jan 05, 2023
The repository for my video "Playing MINECRAFT with a WEBCAM"

This is the official repo for my video "Playing MINECRAFT with a WEBCAM" on YouTube Original video can be found here: https://youtu.be/701TPxL0Skg Red

Rishabh 27 Jun 07, 2022
디텍션 유틸 모음

Object detection utils 유틸모음 설명 링크 convert convert 관련코드 https://github.com/AI-infinyx/ob_utils/tree/main/convert crawl 구글, 네이버, 빙 등 크롤링 관련 https://gith

codetest 41 Jan 22, 2021
Pampy: The Pattern Matching for Python you always dreamed of.

Pampy: Pattern Matching for Python Pampy is pretty small (150 lines), reasonably fast, and often makes your code more readable and hence easier to rea

Claudio Santini 3.5k Dec 30, 2022
Click2call for asterisk with python

Click2call para Asterisk com Python Este projeto disponibiliza uma API construíd

Benedito Marques 1 Jan 17, 2022
SimplePyBLE - Python bindings for SimpleBLE

The ultimate fully-fledged cross-platform Python BLE library, designed for simplicity and ease of use.

Open Bluetooth Toolbox 27 Aug 28, 2022
An almost fully customizable language made in python!

Whython is a project language, the idea of it is that anyone can download and edit the language to make it suitable to what they want.

Julian 47 Nov 05, 2022
contextlib2 is a backport of the standard library's contextlib module to earlier Python versions.

contextlib2 is a backport of the standard library's contextlib module to earlier Python versions. It also sometimes serves as a real world proving gro

Jazzband 35 Dec 23, 2022
Python Excuse Generator

Excuse Generator Python Excuse Generator This project is an excuse generator that provides the user with an excuse as to why they weren't paying atten

Collin Sanders 5 Jul 07, 2022
Eatlocal - This package helps users solve PyBites code challenges on their local machine

eatlocal This package helps the user solve Pybites code challenges locally. Inst

Russell 0 Jul 25, 2022
automate some stuff so I can be more noob

dota automate some stuff so I can be more noob This is a simple project, but one that I've wanted forever! I use pyautogui, time, smtplib and datetime

Aaron Allen 17 Oct 18, 2022
Sublime Text 2/3 style auto completion for ST4

Hippie Autocompletion Sublime Text 2/3 style auto completion for ST4: cycle through words, do not show popup. Simply hit Tab to insert completion, hit

Alexander Schepanovski 20 May 19, 2022