Eureka is a REST API project for web scraping, data cleaning, and data organization, built on FastAPI and following a hexagonal architecture. It was designed for the Eureka by Turing project of the National University of Colombia.

julianVelandia/Eureka

Eureka

Eureka is a REST API project for web scraping and data cleaning, built on FastAPI and following a hexagonal architecture. It was designed for the Eureka by Turing project of the National University of Colombia.

Disclaimer: this is a work-in-progress project; stay tuned for updates.

Installation and Usage

Setup environment

You should create a virtual environment and activate it:

python -m venv venv/
source venv/bin/activate

Clone repository

git clone https://github.com/julianVelandia/Eureka.git
cd Eureka

Then install the development dependencies:

pip install -r requirements.dev.txt

Run unit tests

You can run all the tests with:

make tests

Alternatively, you can run pytest directly:

pytest

Run

The project runs like any FastAPI application. By default, the configuration endpoint is available once the server starts:

uvicorn main:app --reload

Features

  • RenderEngine: renders a web page from its URL so you can select the text to scrape and save it to a JSON file
  • Templates to visualize the scraped information
  • Export data to JSON and CSV files
  • Make automated requests from a JSON configuration file
  • Unpack JSON configuration files
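
To make the export feature more concrete, here is a minimal sketch of writing scraped records to JSON and CSV using only the Python standard library. It is not Eureka's actual implementation: the function names `export_json` and `export_csv`, the record fields, and the output file names are all assumptions made for illustration.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


def export_json(records: list[dict], path: Path) -> None:
    """Write the records as a JSON array, keeping non-ASCII text readable."""
    with path.open("w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


def export_csv(records: list[dict], path: Path) -> None:
    """Write the records as CSV, one row per record."""
    # Collect column names in first-seen order so that records with
    # different keys still fit under a single header row.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline="" lets the csv module control line endings itself.
    with path.open("w", encoding="utf-8", newline="") as fh:
        # restval fills in an empty cell when a record lacks a column.
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)


if __name__ == "__main__":
    # Hypothetical scraped records; the field names are illustrative only.
    sample = [
        {"url": "https://example.com", "title": "Example Domain"},
        {"url": "https://example.org", "title": "Another page", "author": "n/a"},
    ]
    export_json(sample, Path("scraped.json"))
    export_csv(sample, Path("scraped.csv"))
```

Collecting the column names from every record, rather than only the first one, keeps the CSV header complete when scraped pages yield different fields.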

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

This project is licensed under the terms of the MIT license.
