Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Overview

featureclass

Feature engineering library that helps you keep track of feature dependencies, documentation and schema

This library helps define a featureclass.
featureclass is inspired by dataclass, and is meant to provide alternative way to define features engineering classes.

I have noticed that the below code is pretty common when doing feature engineering:

from statistics import variance
from math import sqrt
class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint),
        out['stdev'] = self.calc_std(out['var'])
        return out
        
    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)

Some things were missing for me from this type of implementation:

  1. Implicit dependencies between features
  2. No simple schema
  3. No documentation for features
  4. Duplicate declaration of the same feature - once as a function and one as a dict key

This is why I created this library.
I turned the above code into this:

float: """Calc stdev""" return sqrt(self.var) print(feature_names(MyFeatures)) # ('var', 'stdev') print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float} print(asDict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898} print(asDataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5) ">
from featureclass import feature, featureclass, feature_names, feature_annotations, asDict, asDataclass
from statistics import variance
from math import sqrt

@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint
    
    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)

print(feature_names(MyFeatures)) # ('var', 'stdev')
print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float}
print(asDict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898}
print(asDataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5)

The feature decorator is using cached_property to cache the feature calculation,
making sure that each feature is calculated once per datapoint

You might also like...
ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.
ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.

ChainJacking is a tool to find which of your Go lang direct GitHub dependencies is susceptible to ChainJacking attack.

Bazel rules to install Python dependencies with Poetry

rules_python_poetry Bazel rules to install Python dependencies from a Poetry project. Works with native Python rules for Bazel. Getting started Add th

An assistant to guess your pip dependencies from your code, without using a requirements file.

Pip Sala Bim is an assistant to guess your pip dependencies from your code, without using a requirements file. Pip Sala Bim will tell you which packag

Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it.
Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it.

Repls goes to sleep due to inactivity, but to keep it awake, simply host a webserver and ping it. This repo will help you make a webserver with a bit of console controls.

A Puzzle A Day Keep the Work Away

A Puzzle A Day Keep the Work Away No moyu again!

Keep your company's passwords behind the firewall

TeamVault TeamVault is an open-source web-based shared password manager for behind-the-firewall installation. It requires Python 3.3+ and Postgres (wi

A 100% python file organizer. Keep your computer always organized!

PythonOrganizer A 100% python file organizer. Keep your computer always organized! To run the project, just clone the folder and run the installation

School helper, helps you at your pyllabus's.
School helper, helps you at your pyllabus's.

pyllabus, helps you at your syllabus's... WARNING: It won't run without config.py! You should add config.py yourself, it will include your APIKEY. e.g

Ssma is a tool that helps you collect your badges in a satr platform
Ssma is a tool that helps you collect your badges in a satr platform

satr-statistics-maker ssma is a tool that helps you collect your badges in a satr platform 🎖️ Requirements python = 3.7 Installation first clone the

Releases(0.3.0)
  • 0.3.0(Jan 19, 2022)

    What's Changed

      • rename asDict to as_dict and asDataclass to as_dataclass by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/4

    Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.1...0.3.0

    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Jan 11, 2022)

    What's Changed

    • docs, formatting and tests by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/1
    • fine-tune mypy type ignore by @Itayazolay in https://github.com/Itayazolay/featureclass/pull/3

    New Contributors

    • @Itayazolay made their first contribution in https://github.com/Itayazolay/featureclass/pull/1

    Full Changelog: https://github.com/Itayazolay/featureclass/compare/0.2.0...0.2.1

    Source code(tar.gz)
    Source code(zip)
  • 0.2.0(Jan 9, 2022)

An early stage integration of Hotwire Turbo with Django

Note: This is not ready for production. APIs likely to change dramatically. Please drop by our Slack channel to discuss!

Hotwire for Django 352 Jan 06, 2023
Simple dependency injection framework for Python

A simple, strictly typed dependency injection library.

BentoML 14 Jun 29, 2022
Port of the OpenCascade library to JavaScript / WebAssembly using Emscripten

OpenCascade.js A port of the OpenCascade CAD library to JavaScript and WebAssembly via Emscripten. Explore the docs » Examples · Issues · Discuss Proj

Sebastian Alff 347 Jan 08, 2023
Reference python implementation of Chia pool operations for pool operators

This repository provides a sample server written in python, which is meant to server as a basis for a Chia Pool. While this is a fully functional implementation, it requires some work in scalability

Chia Network 451 Dec 13, 2022
KeyBrowser: A program launches a browser and a keylogger at the same time, is used to retrieve a person's personal information

KeyBrowser: A program launches a browser and a keylogger at the same time, is used to retrieve a person's personal information

3 Oct 16, 2022
A python mathematics module

A python mathematics module

Fayas Noushad 4 Nov 28, 2021
RestMapper takes the pain out of integrating with RESTful APIs.

python-restmapper RestMapper takes the pain out of integrating with RESTful APIs. It removes all of the complexity with writing API-specific code, and

Lionheart Software 8 Oct 31, 2020
A collection of modern themes for Tkinter TTK

ttkbootstrap A collection of modern flat themes inspired by Bootstrap. Also includes TTK Creator which allows you to easily create and use your own th

Israel Dryer 827 Jan 04, 2023
Easy installer for running Amazon AVS Device SDK on Raspberry Pi

avs-device-sdk-pi Scripts to enable Alexa voice activation using Picovoice Porcupine If you like the work, find it useful and if you would like to get

4 Nov 14, 2022
Interactive class notebooks for ECE4076 Computer Vision, weeks 1 - 6

ECE4076 Interactive class notebooks for ECE4076 Computer Vision, weeks 1 - 6. ECE4076 is a computer vision unit at Monash University, covering both cl

Michael Burke 9 Jun 16, 2022
A code to clean and extract a bib file based on keywords.

These are two scripts I use to generate clean bib files. clean_bibfile.py: Removes superfluous fields (which are not included in fields_to_keep.json)

Antoine Allard 4 May 16, 2022
Get the stats of a (or more) Hypixel player(s)

Hypixel_Stats Get the statistics of a (or more) Hypixel player(s) Who needs this? Everyone who plays a lot of Minecraft and often plays on mc.hypixel.

Finnomator 1 Feb 12, 2022
Runtime fault injection platform by Daniele Rizzieri (2021)

GDBitflip [v1.04] Runtime fault injection platform by Daniele Rizzieri (2021) This platform executes N times a binary and during each execution it inj

Daniele Rizzieri 1 Dec 07, 2021
A python library with various gambling and gaming classes

gamble is a simple library that implements a collection of some common gambling-related classes Features die, dice, d-notation cards, decks, hands pok

Jacobi Petrucciani 16 May 24, 2022
Custom Weapons 3 attribute support for Custom Weapons X

CW3toX Allows use of Custom Weapons 3 attributes in Custom Weapons X. Requiremen

2 Mar 01, 2022
Minitel 5 somewhat reverse-engineered

Minitel 5 The Minitel was a french dumb terminal with an embedded modem which had its Golden Age before the rise of Internet. Typically cubic, with an

cLx 10 Dec 28, 2022
[x]it! support for working with todo and check list files in Sublime Text

[x]it! for Sublime Text This Sublime Package provides syntax-highlighting, shortcuts, and auto-completions for [x]it! files. Features Syntax highlight

Jan Heuermann 18 Sep 19, 2022
API moment - LussovAPI

LussovAPI TL;DR: py API container, pip install -r requirements.txt, example, main configuration Long version: Install Dependancies Download file requi

William Pedersen 1 Nov 30, 2021
Rename and categorize your DMOJ solutions

DMOJ Downloader What is this for? DMOJ lets you download the code for all your solutions, however the files are just named as numbers

Evan Wild 1 Dec 04, 2022
Flames Calculater App used to calculate flames status between two names created using python's Flask web framework.

Flames Finder Web App Flames Calculater App used to calculate flames status between two names created using python's Flask web framework. First, App g

Siva Prakash 4 Jan 02, 2022