A lightweight, fast, and robust columnar dataframe for data analytics with online updates

Last update: May 19, 2022

streamdf

Streamdf is a lightweight dataframe library built on top of a dictionary of NumPy arrays, developed for Kaggle's time-series code competitions.

Key Features

Fast and robust insertion
- Rows can be inserted in amortized constant time (much faster than np.append)
- Automatically falls back to the default value when an abnormal value is inserted
Time-travel
- Get a past state of the data as a slice of the original dataframe, without copying
Null/empty-safe aggregations
- Provides a set of aggregation methods that can be called safely even when the data contains NaN or is empty
Columnar layout
- Internal data is stored in a simple columnar format, which is easier to use for analysis than NumPy's structured arrays
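
The amortized constant-time insertion and the default-value fallback listed above can be sketched as follows. This is a minimal illustration of the general technique (capacity doubling over a preallocated NumPy buffer), not streamdf's actual implementation; the `GrowableColumn` class and its methods are hypothetical names introduced only for this example.

```python
import numpy as np


class GrowableColumn:
    """Hypothetical growable column: a preallocated buffer plus a logical length."""

    def __init__(self, dtype, default, capacity=4):
        self._buf = np.empty(capacity, dtype=dtype)
        self._default = default
        self._len = 0

    def append(self, value):
        # Double the buffer only when it is full, so the copy cost is amortized O(1).
        if self._len == len(self._buf):
            grown = np.empty(len(self._buf) * 2, dtype=self._buf.dtype)
            grown[: self._len] = self._buf[: self._len]
            self._buf = grown
        try:
            self._buf[self._len] = value
        except (TypeError, ValueError, OverflowError):
            # Fall back to the default when the value cannot be stored in this dtype.
            self._buf[self._len] = self._default
        self._len += 1

    def view(self):
        # Basic slicing returns a view of the buffer, not a copy.
        return self._buf[: self._len]


col = GrowableColumn(np.int32, default=-1)
for v in [1, 2, "abc", 4, 5]:
    col.append(v)
```

By contrast, np.append allocates and copies the whole array on every call, so appending n rows one at a time costs O(n^2) in total.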

Example

import pandas as pd
from streamdf import StreamDf

df = pd.read_csv('test.csv')
sdf = StreamDf.from_pandas(df)

# extend: append one row
sdf.extend({
    'x': 1,
    'y': 2
})

assert len(sdf) == len(df) + 1

# access a column by name
print(sdf['x'])

# aggregate: last value of column 'x'
sdf.last_value('x')
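
As a rough idea of what a null/empty-safe aggregation involves, here is a minimal sketch in plain NumPy. It assumes that "safe" means skipping NaN entries and returning a default instead of raising on empty input; whether streamdf's `last_value` behaves exactly this way is not stated in this README, and the standalone `last_value` function below is a hypothetical helper, not the library's method.

```python
import numpy as np


def last_value(values, default=np.nan):
    """Hypothetical helper: last non-NaN element, or `default` if there is none."""
    arr = np.asarray(values, dtype=float)
    valid = arr[~np.isnan(arr)]
    # A plain arr[-1] would raise IndexError on empty input and could return NaN.
    return valid[-1] if valid.size else default


a = last_value([1.0, np.nan, 3.0, np.nan])
b = last_value([])
c = last_value([np.nan], default=0.0)
```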

# Second example: start from an empty StreamDf and slice by time
import numpy as np
from streamdf import StreamDf

sdf = StreamDf.empty({'x': np.int32, 'time': 'datetime64[D]'}, 'time')

sdf.extend({'x': 1, 'time': np.datetime64('2018-01-01')})
sdf.extend({'x': 5, 'time': np.datetime64('2018-02-01')})
sdf.extend({'x': 3, 'time': np.datetime64('2018-02-03')})

assert len(sdf) == 3

# Time travel (zero copy)
sliced = sdf.slice_until(np.datetime64('2018-02-02'))

assert len(sliced) == 2
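
The zero-copy time travel above can be approximated on a plain dictionary of NumPy arrays. The sketch below assumes the time column is sorted in ascending order and that the cutoff is exclusive; the README does not say how streamdf treats a row that falls exactly on the cutoff. The `slice_until` function here is a hypothetical standalone helper, not the library's method.

```python
import numpy as np

times = np.array(["2018-01-01", "2018-02-01", "2018-02-03"], dtype="datetime64[D]")
x = np.array([1, 5, 3], dtype=np.int32)


def slice_until(columns, time_key, until):
    """Hypothetical helper: rows with time strictly before `until`, as views."""
    # Binary search for the cutoff; requires an ascending time column.
    end = np.searchsorted(columns[time_key], until, side="left")
    # Basic slicing yields views that share memory with the original arrays.
    return {name: col[:end] for name, col in columns.items()}


sliced = slice_until({"x": x, "time": times}, "time", np.datetime64("2018-02-02"))
```

Because each sliced column is a view, writing to it also changes the original array; copy it first if you need an independent snapshot.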
