Repository with processed files from the COVID CPI (the Brazilian Senate's parliamentary inquiry commission) to make analysis easier

cpi4all

Organization

The Senate website lists every document collected by the COVID CPI.

The table on that site has the following structure:

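| No. | Files | Date received | Sender | Origin | Description | Box | In reply to |
|-----|-------|---------------|--------|--------|-------------|-----|-------------|
| 1 | Link1 | ... | ... | ... | ... | ... | ... |
| 2 | Link2/link3 | ... | ... | ... | ... | ... | ... |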

Each link downloads a PDF file containing the corresponding document.

This repository holds the txt version of those files. Each file name here has the form <document No.>_<link number>. For example:

link1 = 1_1 because it belongs to file No. 1 and is the first link.

link2 = 2_1 because it belongs to file No. 2 and is the first link in that row.

link3 = 2_2 because it belongs to file No. 2 and is the second link in that row.

The text version of every document is in the database/txts/ folder.

Examples:

File No. 1, first link: 1_1

File No. 3, fourth link: 3_4
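
The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper names (`txt_name`, `parse_txt_name`, `names_for_row`) are hypothetical and not part of the repository's scripts, and whether the stored files carry a `.txt` extension is an assumption, so the parser accepts both forms.

```python
from pathlib import Path

# Folder named in this README; the helpers below are hypothetical.
TXT_DIR = Path("database/txts")


def txt_name(doc_number: int, link_index: int) -> str:
    """Build '<document No.>_<link number>'; both values are 1-based."""
    if doc_number < 1 or link_index < 1:
        raise ValueError("document and link numbers start at 1")
    return f"{doc_number}_{link_index}"


def parse_txt_name(name: str) -> tuple:
    """Split '2_1' (or '2_1.txt') back into (document No., link number)."""
    doc, link = Path(name).stem.split("_")
    return int(doc), int(link)


def names_for_row(doc_number: int, links: list) -> dict:
    """Map every link of one table row to its file name, in link order."""
    return {link: txt_name(doc_number, i) for i, link in enumerate(links, start=1)}
```

For instance, `names_for_row(2, ["Link2", "link3"])` pairs the two links of row 2 with the names `2_1` and `2_2`, and `TXT_DIR / txt_name(1, 1)` points at the text of the first link of document No. 1 (add an extension if the files have one).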

Note 1: Not all files have been converted yet.

Note 2: The conversion relies on image recognition (OCR) and the result can be quite poor at times, producing spelling errors or meaningless words.

For developers

The scripts run in the following order:

  1. extract_rows.py: Visits the Senate website and extracts the information from each row of the table. All data is saved in database/rows.
  2. extract_headers.py: For each link in each row, fetches file metadata (size, type) that is needed later. This data is saved in database/headers.
  3. download_pdfs.py: Downloads every PDF described in database/headers and saves it in database/pdfs.
  4. convert_pdf_to_jpg.py: Converts every PDF in database/pdfs to images in database/jpgs.
  5. convert_jpg_to_txt.py: Converts every image in database/jpgs to text in database/txts.
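
As a rough sketch, the sequence above could be driven from a single Python runner. It assumes each script is run from the repository root and takes no command-line arguments, which this README does not confirm; `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

# Script order and output folders as listed in this README.
# The last folder is written as database/txts, the name the README
# uses when it says where the text files are stored.
PIPELINE = [
    ("extract_rows.py", "database/rows"),
    ("extract_headers.py", "database/headers"),
    ("download_pdfs.py", "database/pdfs"),
    ("convert_pdf_to_jpg.py", "database/jpgs"),
    ("convert_jpg_to_txt.py", "database/txts"),
]


def build_commands(python=sys.executable):
    """Return one command per step, in execution order (no arguments assumed)."""
    return [[python, script] for script, _ in PIPELINE]


def run_pipeline(dry_run=True):
    """Print each step; run it only when dry_run is False."""
    for (script, out_dir), cmd in zip(PIPELINE, build_commands()):
        print(f"{script} -> {out_dir}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` only prints the plan; `run_pipeline(dry_run=False)` would actually execute the scripts, each of which depends on the output of the previous one.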

For performance reasons, only the database/rows, database/headers and database/txts folders are stored in this repository.

TODO:

  0. Improve this README :)
  1. Use GitHub Pages to generate a static site that lets you search across all the txt files.
  2. Finish converting all the files.
  3. Investigate the files where the conversion came out very poorly.
  4. Automatically extract dates and provide a JSON with the chronological order of the files.
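
The date-extraction item is not implemented yet; the following is only one possible starting point, not the project's method. It assumes documents carry dates either as dd/mm/yyyy or written out in Portuguese ("12 de maio de 2021"), and it treats the first valid date found in a text as that document's date, which will be wrong for some files and can be thrown off by OCR errors. All names here are hypothetical.

```python
import json
import re
from datetime import date

# Portuguese month names; "marco" covers OCR output that drops the cedilla.
MONTHS = {
    "janeiro": 1, "fevereiro": 2, "março": 3, "marco": 3, "abril": 4,
    "maio": 5, "junho": 6, "julho": 7, "agosto": 8, "setembro": 9,
    "outubro": 10, "novembro": 11, "dezembro": 12,
}
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")
WRITTEN = re.compile(r"\b(\d{1,2})\s+de\s+([a-zç]+)\s+de\s+(\d{4})\b", re.IGNORECASE)


def first_date(text):
    """Return the earliest-positioned valid date in the text, or None."""
    found = []
    for m in NUMERIC.finditer(text):
        day, month, year = map(int, m.groups())
        found.append((m.start(), year, month, day))
    for m in WRITTEN.finditer(text):
        month = MONTHS.get(m.group(2).lower())
        if month:
            found.append((m.start(), int(m.group(3)), month, int(m.group(1))))
    for _, year, month, day in sorted(found):
        try:
            return date(year, month, day)
        except ValueError:
            continue  # impossible date such as 31/02; try the next candidate
    return None


def chronological_index(texts):
    """texts maps file name -> text; returns a JSON list sorted by date."""
    dated = [(first_date(text), name) for name, text in texts.items()]
    dated = sorted((d, n) for d, n in dated if d is not None)
    return json.dumps(
        [{"file": n, "date": d.isoformat()} for d, n in dated],
        ensure_ascii=False,
    )
```

Files with no recognizable date are left out of the index rather than guessed at, so they can be reviewed by hand.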
Owner
Breno Rodrigues Guimarães