Proyecto - Análisis de texto de eventos históricos

Overview

Acceder al código desde Google Colab para poder ver de manera adecuada todas las visualizaciones y poder interactuar con ellas.

Link de acceso: https://colab.research.google.com/drive/1XqDm6szrNG8ZdH37EVITPCSw7BDZFFQ5?usp=sharing

Corto video explicativo: https://youtu.be/ZDPXc56jOj4

Proyecto Big Data - Análisis de texto de eventos históricos

Declaración del conjunto de datos

Contamos con un dataset en formato JSON proveniente del repositorio 'awesome-json-datasets' en la sección 'Historical Events' sobre eventos históricos (disponible en: https://github.com/jdorfman/awesome-json-datasets). Este dataset cuenta con información desde el año 299 A.C. hasta el año 2013. Recopila sucesos importantes en el mundo a lo largo de este periodo señalado.

La estrucutra de cada recopilación es la siguiente:

{
    "date": "fecha del acontecimiento",
    "description": "descripción del evento en cuestión",
    "lang": "lenguaje de la descripción",
    "category1": "catergoría interna del dataset",
    "granularity": "granularidad"
}

Como se puede ver, no cuenta con una estructura compleja, y sus campos más importantes son 'date' que nos indica la fecha del suceso y 'description' donde se encuentran todos los detalles del evento. Este dataset cuenta con 20.330 registros diferentes.

Planteamiento de la problemática y diseño de la solución (tecnologías a implementar)

Se plantea realizar un análisis descriptivo de esta información a nivel de país, agrupando sus eventos históricos y ver qué palabras son recurrentes en estos eventos. Así nos podemos dar una rápida percepción de la historia de un país en concreto. También se plantea analizar palabras clave en los eventos históricos como lo son 'guerra', 'atentados', 'ataque', 'muertos', 'descubrimiento', 'invención' y ver que tan concurrentes son a lo largo de la historia.

Para esta labor, nos apoyaremos de la herramienta MongoDB en su entorno de Python Pymongo. Este sistema de base de datos NoSQL nos ayudará a manejar adecuadamente el formato de este dataset (JSON) y más importante aún con el tratamiento de textos. Para esto último nos apoyaremos en dos funcionalidades de MongoDB: En el uso de expresiones regulares para busqueda en campos de texto y en las operaciones Map-Reduce. Junto con MongoDB, nos apoyaremos en las librerías propias de analítica de datos de Python. Con esto se pretenderá alcanzar los objetivos de este proyecto.

Arcpy Tool developed for ArcMap 10.x that checks DVOF points against TDS data and creates an output feature class as well as a check database.

DVOF_check_tool Arcpy Tool developed for ArcMap 10.x that checks DVOF points against TDS data and creates an output feature class as well as a check d

3 Apr 18, 2022
Badge-Link-Creater 'For more beautiful profiles.'

Badge-Link-Creater 'For more beautiful profiles.' Ready Badges Prepares the codes of the previously prepared badges for you. Note Click here for more

Mücahit Gündüz 9 Oct 19, 2022
In this repo i inherit the pos module and added QR code to pos receipt

odoo-pos-inherit In this repo i inherit the pos module and added QR code to pos receipt 1- Create new Odoo Module using command line $ python odoo-bin

5 Apr 09, 2022
Create standalone, installable R Shiny apps using Electron

WARNING This is still very much a work in progress and nothing can be assumed stable in any way Temp notes: Two types of created installer, based on w

Chase Clark 5 Dec 24, 2021
Shopping-card - Shopping Card Project With Python

Shopping Card Project this application was built to handle problems with saving

moein98 1 May 06, 2022
Translation patch for Hololive ERROR

Translation patch for Hololive ERROR How do I install the patch? Grab the Translation.zip file for the latest version from the releases page, and unzi

18 Jul 20, 2022
An account generator for guilded.gg that I made a while back and decided to bring back up

An account generator for guilded.gg that I made a while back and decided to bring back up

8 Nov 17, 2022
A submodule of rmcrkd/ODE-Uniqueness

Heston-ODE This repo contains the Heston-related code that accompanies the article One-sided maximal uniqueness for a class of spatially irregular ord

0 Jan 05, 2022
Sudoku-Solver

Sudoku-Solver This is a personal project, that put all my today knowledges to the test, is a project that im developing alone with a lot of effort and

Carlos Ismael Gitto Bernales 5 Nov 08, 2021
Beancount: Double-Entry Accounting from Text Files.

beancount: Double-Entry Accounting from Text Files Contents Description Documentation Download & Installation Versions Filing Bugs Copyright and Licen

2.3k Dec 28, 2022
A performant state estimator for power system

A state estimator for power system. Turbocharged with sparse matrix support, JIT, SIMD and improved ordering.

9 Dec 12, 2022
An html wrapper for python

MessySoup What is it? MessySoup is a python wrapper for html elements. While still a ways away, the main goal is to be able to build a wesbite straigh

4 Jan 05, 2022
The fastest way to copy to (not from) high speed flash storage.

FastestCopy The fastest way to copy to (not from) high speed flash storage. This is about 3-6x faster than file copy on explorer.exe to usb flash driv

Derek Frombach 0 Nov 03, 2021
使用clash核心,对服务器进行Netflix解锁批量测试。

注意事项 测速及解锁测试仅供参考,不代表实际使用情况,由于网络情况变化、Netflix封锁及ip更换,测速具有时效性 本项目使用 Python 编写,使用前请完成环境安装 首次运行前请安装pip及相关依赖,也可使用 pip install -r requirements.txt 命令自行安装 Net

11 Dec 07, 2022
Really bad lisp implementation. Fun with pattern matching.

Lisp-py This is a horrible, ugly interpreter for a trivial lisp. Don't use it. It was written as an excuse to mess around with the new pattern matchin

Erik Derohanian 1 Nov 23, 2021
Check a discord message and give it a percentage of scamminess

scamChecker Check a discord message and give it a percentage of scamminess Run the bot, and run the command !scamCheck and it will return a percentage

3 Sep 22, 2022
APRS Track Direct is a collection of tools that can be used to run an APRS website

APRS Track Direct APRS Track Direct is a collection of tools that can be used to run an APRS website. You can use data from APRS-IS, CWOP-IS, OGN, HUB

Per Qvarforth 42 Dec 29, 2022
This repository is an archive of emails that are sent by the awesome Quincy Larson every week.

Awesome Quincy Larson Email Archive This repository is an archive of emails that are sent by the awesome Quincy Larson every week. If you fi

Sourabh Joshi 912 Jan 05, 2023
Zeus is an open source flight intellingence tool which supports more than 13,000+ airlines and 250+ countries.

Zeus Zeus is an open source flight intellingence tool which supports more than 13,000+ airlines and 250+ countries. Any flight worldwide, at your fing

DeVickey 1 Oct 22, 2021
An application for automation of the mining function in the game Alienworlds.IO

alienautomation A Python script made to automate the tidious job of mining on AlienWorlds This script: Automatically opens the browser Automatically l

anonieXdev 42 Dec 03, 2022