Data pipelines, 2021.

Last update: May 19, 2022

Overview

This repo illustrates a simple process for automating data transformation and modeling with a pipeline built on Luigi.

Main stack

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

The full process is described in an interactive app, which you can find in the script app.py. For details on how to launch the app, see the section on running the scripts.

Setup

Create a virtual environment (I recommend using conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Run the scripts

Interactive app

To run the interactive app, run the Streamlit command with the virtual environment activated:

(data-pipes) streamlit run app.py

This starts a local server at http://localhost:8501.

Data pipeline

To run a specific task, say TareaX defined in the script tareas.py, run the command:

PYTHONPATH=. luigi --module tareas TareaX --local-scheduler

You can extend the code and add any tasks you like!

Owner

Rodolfo Ferro
