Python programming language Test

Overview

Exercise

You are tasked with creating a data-processing app that pre-processes and enriches the data coming from crawlers, with the following requirements.

  • INPUT: csv-like data submitted by crawlers
  • OUTPUT: clean data saved into mongodb collections
  1. The app is an HTTP API server. Every year, crawlers will submit the data saved in a file, using the API endpoint designed by you.
  2. Examples of data the crawlers will submit every year: see data-2018.txt, data-2019.txt, data-2020.txt. You can't change the format of the data.
  3. As you can see, the data coming from crawlers is not 100% well-structured, the API should parse it correctly.
  4. a repeated submission with the data of the same year should perform an update on the existing yearly data.
  5. if there is any error in the submission or processing, the API should return a proper error message with proper HTTP response status
  6. For each university, enrich the data with a URL and a text description of it using Duckduckgo API e.g. https://api.duckduckgo.com/?q=harvard&format=json&pretty=1
  7. The app inserts or updates clean data in 2 mongodb tables/collections:
  • table 1 - the yearly data table
  • table 2 - the universities info
  1. table 1 contains data from every year, table 2 contains only the latest data.
  2. the data processing and transformations should be covered by tests.
  3. The solution should be in Python programming language, however you may use any 3rd party library you like.

Feel free to clarify the requirements further, if you have any doubts.

Bonus (Optional)

If you have indicated any DevOps skillsets in your resume, please create a Dockerfile, and using docker deploy the web app onto a free cloud platform, such as Heroku.

How the solution is assessed

The criteria are as follows (descending importance)

  1. Your code should perform the functionalities required
  2. Your code should be well-covered by tests
  3. Your code should be modular, readable and maintenable by other engineers.
  4. Your code should be robust, and can handle failure such as missing field, disconnection from DB or external server.
  5. Your code should be efficient and fast.
  6. Your code should be pretty.
Owner
Monirul Islam Khan
Database Engineer | DBA | Data Analyst | Data Scientist | Dev Ops
Monirul Islam Khan
A general-purpose wallet generator, for supported coins only

2gen A general-purpose generator for keys. Designed for all cryptocurrencies supporting the Bitcoin format of keys and addresses. Functions To enable

Vlad Usatii 1 Jan 12, 2022
This wishes a mentioned users on their birthdays

BirthdayWisher Requirements: "mysqlserver", "email id and password", "Mysqlconnector" In-Built Modules: "smtplib", "datetime","imghdr" In Mysql: A tab

vellalaharshith 1 Sep 13, 2022
Scrapper For Paste.pics

PrntScScrapper Scrapper for Paste.pics If you are bored you can find some random screenshots from prnt.sc Features Saving screenshots Open in Browser

Fareusz 1 Dec 29, 2021
This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols.

Regex-Reverser This tool helps you to reverse any regex and gives you the opposite/allowed Letters,numerics and symbols. Screenshots Usage/Examples py

x19 0 Jun 02, 2022
Sodium is a general purpose programming language which is instruction-oriented (a new programming concept that we are developing and devising) [Still developing...]

Sodium Programming Language Sodium is a general purpose programming language which is instruction-oriented (a new programming concept that we are deve

Instruction Oriented Programming 22 Jan 11, 2022
Tomador de ramos UC automatico para Windows, Linux y macOS

auto-ramos v2.0 Tomador de ramos UC automatico para Windows, Linux y macOS Funcion Este script de Python tiene como principal objetivo hacer que la to

Open Source eUC 13 Jun 29, 2022
This repository collects nice scripts ("plugins") for the SimpleBot bot for DeltaChat.

Having fun with DeltaChat This repository collects nice scripts ("plugins") for the SimpleBot bot for DeltaChat. DeltaChat is a nice e-mail based mess

Valentin Brandner 3 Dec 25, 2021
In this project, we'll be creating a virtual personal assistant for ourselves using our favorite programming language

In this project, we'll be creating a virtual personal assistant for ourselves using our favorite programming language, Python. We can perform several offline as well as online operations using the bo

Ashutosh Krishna 188 Jan 03, 2023
decorator

Decorators for Humans The goal of the decorator module is to make it easy to define signature-preserving function decorators and decorator factories.

Michele Simionato 734 Dec 30, 2022
CRC Reverse Engineering Tool in Python

CRC Beagle CRC Beagle is a tool for reverse engineering CRCs. It is designed for commnication protocols where you often have several messages of the s

Colin O'Flynn 51 Jan 05, 2023
A complete python calculator with 2 modes Float and Int numbers.

Python Calculator This program is made for learning purpose. Getting started This Program runs using python, install it via terminal or from thier ofi

Felix Sanchez 1 Jan 18, 2022
A collection of software that serve no purpose other than waste your time. Forking is encouraged!

the-useless-collection A collection of software that serve no purpose other than waste your time. Forking is encouraged! Requires Python 3.9. Usage Go

Imsad2 1 Mar 16, 2022
Demo code for "Logs in distributed systems" webinar

Hexlet Logs Demo Пререквизиты docker-compose python3 Учетка в DataDog Базовое понимание, что такое логи (можно почитать гайд

Anton Markelov 1 Dec 01, 2021
Comprehensive OpenAPI schema generator for Django based on pydantic

🗡️ Djagger Automated OpenAPI documentation generator for Django. Djagger helps you generate a complete and comprehensive API documentation of your Dj

13 Nov 26, 2022
A small Python library which gives you the IEEE-754 representation of a floating point number.

ieee754 ieee754 is small Python library which gives you the IEEE-754 representation of a floating point number. You can specify a precision given in t

Bora Canbula 5 Dec 20, 2022
Get a link to the web version of a git-tracked file or directory

githyperlink Get a link to the web version of a git-tracked file or directory. Applies to GitHub and GitLab remotes (and maybe others but those are no

Tomas Fiers 2 Nov 08, 2022
Abilian Core: an enterprise application development platform based on the Flask micro-framework, the SQLAlchemy ORM

About Abilian Core is an enterprise application development platform based on the Flask micro-framework, the SQLAlchemy ORM, good intentions and best

Abilian open source projects 47 Apr 14, 2022
Tie together `drf-spectacular` and `djangorestframework-dataclasses` for easy-to-use apis and openapi schemas.

Speccify Tie together drf-spectacular and djangorestframework-dataclasses for easy-to-use apis and openapi schemas. Usage @dataclass class MyQ

Lyst 4 Sep 26, 2022
A working roblox account generator it doesnt bypass the capcha stuff cuz these didnt showed up in my test runs

A working roblox account generator (state 11.5.2021) it doesnt bypass the capcha stuff cuz these didnt showed up in my test runs

TerrificTable 22 Jan 03, 2023
A project to explore and provide useful code for Mango Markets

🥭 Mango Explorer A project to explore and provide useful code for Mango Markets

Blockworks Foundation 160 Dec 19, 2022