An intuitive library to extract features from time series

Overview

Documentation Status license PyPI - Python Version PyPI Downloads Open In Colab

Time Series Feature Extraction Library

Intuitive time series feature extraction

This repository hosts the TSFEL - Time Series Feature Extraction Library python package. TSFEL assists researchers on exploratory feature extraction tasks on time series without requiring significant programming effort.

Users can interact with TSFEL using two methods:

Online

It does not requires installation as it relies on Google Colabs and a user interface provided by Google Sheets

Offline

Advanced users can take full potential of TSFEL by installing as a python package

pip install tsfel

Includes a comprehensive number of features

TSFEL is optimized for time series and automatically extracts over 60 different features on the statistical, temporal and spectral domains.

Functionalities

  • Intuitive, fast deployment and reproducible: interactive UI for feature selection and customization
  • Computational complexity evaluation: estimate the computational effort before extracting features
  • Comprehensive documentation: each feature extraction method has a detailed explanation
  • Unit tested: we provide unit tests for each feature
  • Easily extended: adding new features is easy and we encourage you to contribute with your custom features

Get started

The code below extracts all the available features on an example dataset file.

import tsfel
import pandas as pd

# load dataset
df = pd.read_csv('Dataset.txt')

# Retrieves a pre-defined feature configuration file to extract all available features
cfg = tsfel.get_features_by_domain()

# Extract features
X = tsfel.time_series_features_extractor(cfg, df)

Available features

Statistical domain

Features Computational Cost
ECDF 1
ECDF Percentile 1
ECDF Percentile Count 1
Histogram 1
Interquartile range 1
Kurtosis 1
Max 1
Mean 1
Mean absolute deviation 1
Median 1
Median absolute deviation 1
Min 1
Root mean square 1
Skewness 1
Standard deviation 1
Variance 1

Temporal domain

Features Computational Cost
Absolute energy 1
Area under the curve 1
Autocorrelation 1
Centroid 1
Entropy 1
Mean absolute diff 1
Mean diff 1
Median absolute diff 1
Median diff 1
Negative turning points 1
Peak to peak distance 1
Positive turning points 1
Signal distance 1
Slope 1
Sum absolute diff 1
Total energy 1
Zero crossing rate 1
Neighbourhood peaks 1

Spectral domain

Features Computational Cost
FFT mean coefficient 1
Fundamental frequency 1
Human range energy 2
LPCC 1
MFCC 1
Max power spectrum 1
Maximum frequency 1
Median frequency 1
Power bandwidth 1
Spectral centroid 2
Spectral decrease 1
Spectral distance 1
Spectral entropy 1
Spectral kurtosis 2
Spectral positive turning points 1
Spectral roll-off 1
Spectral roll-on 1
Spectral skewness 2
Spectral slope 1
Spectral spread 2
Spectral variation 1
Wavelet absolute mean 2
Wavelet energy 2
Wavelet standard deviation 2
Wavelet entropy 2
Wavelet variance 2

Citing

When using TSFEL please cite the following publication:

Barandas, Marília and Folgado, Duarte, et al. "TSFEL: Time Series Feature Extraction Library." SoftwareX 11 (2020). https://doi.org/10.1016/j.softx.2020.100456

Acknowledgements

We would like to acknowledge the financial support obtained from the project Total Integrated and Predictive Manufacturing System Platform for Industry 4.0, co-funded by Portugal 2020, framed under the COMPETE 2020 (Operational Programme Competitiveness and Internationalization) and European Regional Development Fund (ERDF) from European Union (EU), with operation code POCI-01-0247-FEDER-038436.

Comments
  • rolling over timeseries

    rolling over timeseries

    Hi folks! Thanks for this fantastic contribution. I'm excited to test the capabilities of this package.

    I have a hard time to extract features constructed by tsfel for a univariate time series rolled by date. For example, I have a pandas dataframe with m dates and n features, and I want to estimate the tsfel feature set given fixed window size. As a result, I should get a dataframe of shape m dates and n times y (number of variables derived from tsfel). Any comments are welcome.

    Thanks in advance!

    good first issue 
    opened by ciberger 12
  • Problem with the number of extracted features samples

    Problem with the number of extracted features samples

    Dear authors, First of all, congratulations on this great project very helpful fo all the community. I have a issue related to the number of extracted features samples: I execute this call X = ts.time_series_features_extractor(cfg, tmp_data, fs = 32, window_size=32, overlap=0, verbose = 0)

    On my accelerometer data frame of dimension 160 x 3, 160 samples and three columns ['X','Y','Z']. From this call, X has a dimension of 1 x 789. It returns a single sample of features for all the 160 x 3 accelerometer samples. However, this does not seem right. Since window _size = 32 (1 second of time frame), it has to return to me an X whit dimension 5 x 789. How is this possible.

    bug 
    opened by FlorencDemrozi 9
  • Feature extractor doesn't run

    Feature extractor doesn't run

    Hi everyone,

    I'm looking to extract features from 3 IMUs each containing a 3-axis Accelerometer and Gyroscopes. I have created a dataframe to combine the data from all of them 3 IMUs x 2 sensors (Acc, Gry) x 3 axis (xyz) = 18 columns + Timestamp

    I started out calling the tsfel.get_features_by_domain on the entire dataframe, but that never progressed from 0% Complete. Then, I much reduce the problem:

    • features.json: I copied the original file in the older 'feature_extraction' changed the 'use' from 'yes' to 'no's so it will only calculate [Min, Max, Mean, Median]
    • data = selected only one columns and 100 rows with float numbers

    ` cfg = tsfel.get_features_by_domain(domain = 'statistical', json_path = 'features.json')

    data = df.loc[s_times[0]:f_times[0], 'Neck.Acc.X'][:101].to_list()

    df_tsfel = tsfel.time_series_features_extractor( # configuration file with features to be extracted dict_features = cfg,
    # dataframe window to calculate features window on signal_windows = data,
    # sampling frequency of original signal fs = 100, # sliding window size window_size = 100 ) `

    It surely can't get any simpler than this and still it doesn't leave the 0% image

    There must be something wrong with the way I set up stuff. Can someone help please? I'm currentely writting a paper and will need to give up using this package if I dont' manage to sort this out...

    Some extra info:

    • Using tsfel==0.1.4
    • Windows 10
    • Sample data attached. ** sample.zip **
    help wanted 
    opened by mmarcato 8
  • Updates to the pip package

    Updates to the pip package

    Hello! thank you for the tsfel package.

    Do you plan to push recently made updates to the PyPI? I see that there have been a number of changes to tsfel since 14th Feb.

    If not, would you recommend that we use the development branch or stick with v0.1.4

    question 
    opened by happypanda5 6
  • X_train reduced to one row after tsfel.time_series_features_extractor(cfg, X_train, fs=fs)

    X_train reduced to one row after tsfel.time_series_features_extractor(cfg, X_train, fs=fs)

    The X_train data reduced from 208 rows to just 1 row, resulting error for further execution of the code. What can go wrong?

    Here is the code:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    print(X_train)
    
    cfg = tsfel.get_features_by_domain()
    # Get features
    X_train = tsfel.time_series_features_extractor(cfg, X_train, fs=fs)
    X_test = tsfel.time_series_features_extractor(cfg, X_test, fs=fs)
    print(X_train)
    print(X_test)
    
    corr_features = tsfel.correlated_features(X_train)
    X_train.drop(corr_features, axis=1, inplace=True)
    X_test.drop(corr_features, axis=1, inplace=True)
    
    help wanted 
    opened by renzha-miun 6
  • UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 12: character maps to <undefined>

    UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 12: character maps to

    I'm using TSFEL on a Windows 10 machine and end up with the following error message whenever I enable the progress bar for feature extraction:

    UnicodeEncodeError: 'charmap' codec can't encode character '\u2588' in position 12: character maps to <undefined>

    I'm not really sure why this problem occurs but this Stackoverflow thread provides some suggestions on how to mitigate the issue.

    If I set the verbose parameter to 0, everything works as expected.

    opened by Huii 6
  • A priori  'feature vector size'

    A priori 'feature vector size'

    Dear all,

    Playing with your tool I wanted to obtain the spectral features for a given signal

    cfg = tsfel.get_features_by_domain(domain='spectral')
    len(cfg['spectral'].keys())
    26
    

    26 spectral features, nice!

    But, when I calculated those features

    #Fs previously obtained from data
    X = tsfel.time_series_features_extractor(cfg, data,fs=Fs)
    X.size
    335
    

    335 elements!.

    I would love to iterate among several time-series, and obtain a feature matrix. I would like to know a priori the size of the X features, as I may create a specific variable array to store the values. I know that certain features of the signal are computed in time slots (such as FFT_mean_coeff), but is really time consuming to annotate how many results per feature I should expect.

    Thus, is there any option to know a priori how many elements will be in the X series?

    enhancement 
    opened by jpalma-espinosa 5
  • [Feature]Verbose option

    [Feature]Verbose option

    Dear all, Thank you for your incredible work.

    I am giving a try with your software, analyzing some neural recordings that I have. Because of computational power, I am running my code through Google's Colaboratory. However, since this tool has a fixed time for running, it should really help a verbose option when running

    tsfel.time_series_features_extractor(cfg,data)
    

    as it may help to calculate the amount of time that certain feature calculation should take

    opened by jpalma-espinosa 5
  • How to extract certain feats from a domain

    How to extract certain feats from a domain

    Hi, thanks for this tool! It's a huge help. I'm struggling with how I go about extracting some but not all features from the spectral domain.

    To extract all we use something like this: cfg_file = tsfel.get_features_by_domain('spectral') data = tsfel.time_series_features_extractor(cfg_file, data, fs=fs)

    Which function can we use to extract a list of chosen features? Thanks!

    opened by saydeking 4
  • multiple timeseries feature extraction

    multiple timeseries feature extraction

    Hi,

    I am using tsfel for 564 timeseries analysis, I want to extract the features of each time course and get a dataframe containing the features for all time courses (they should have the same column and each row represents a specific time course)

    So I used a loop for this, my dataset has nan in some time course.

    My code looks like this, but it only shows the feature extraction started, never finished and could not return the features. image Any suggestions on this?

    Many thanks!

    opened by IrenXu 4
  • Detailed handbook for users or GUI?

    Detailed handbook for users or GUI?

    Hi,

    I am wondering if there is a gui version of tsfel. Also, what should the input data look like? Is this able to compute multiple timeseries at the same time and extract their features and cluster them based on the features?

    question 
    opened by IrenXu 4
  • Some questions for the module

    Some questions for the module

    • The module will generate a different number of features depending on the length of the time series. But I got the different lengths of time series, how to make them all have the same number of the feature so I can merge them?
    • Does it have some group by id feature? so I can merge all the ts data into one file and process them together?
    • I saw in the example notebook, we can share a google link to [email protected] then we can extract features from that shared file name, but it gives an error.
    opened by b-y-f 0
  • AttributeError: module 'scipy.stats' has no attribute 'median_absolute_deviation'

    AttributeError: module 'scipy.stats' has no attribute 'median_absolute_deviation'

    Hi,

    I have installed tsfel in an empty virtual environment inside Dockerfile:

    # install python and packages
    RUN apt-get -y install python3 python3-venv python3-pip
    RUN python3 -m venv /home/rstudio/venv
    RUN /home/rstudio/venv/bin/pip3 install --upgrade pip setuptools wheel
    RUN /home/rstudio/venv/bin/pip3 install tsfel
    

    When I want to extract all features, I get an error:

    ! AttributeError: module 'scipy.stats' has no attribute 'median_absolute_deviation'
    
    opened by MislavSag 4
  • How to determine if the extracted features are correct

    How to determine if the extracted features are correct

    When I extract the MFCC and LPCC feature matrix, how can I tell that there is no problem with my extraction? Is it possible to reverse the reconstruction to calculate their relative errors?

    question 
    opened by Akai-ai 1
  • Request for Optional Batch ID Grouping Support

    Request for Optional Batch ID Grouping Support

    First of all, great package - many thanks for your contributions!

    One request is for the ability to specify a "batch" identifier column within a data frame and group by that column value as well as persisting those column values on the output feature matrix (similar to tsfresh which allows for an optional column_id parameter to be provided by the user).

    As an example, after loading the UCIHAR dataset to a flattened/stacked dataframe:

    batchID time x_acc y_acc z_acc 000001 1 X Y Z 000001 2 X Y Z 000001 3 X Y Z ... 000001 128 X Y Z
    000002 1 X Y Z
    ... 000003 1 X Y Z
    ...

    Currently what is returned from the following is a dataframe without any reference to the input grouping:

    import tsfel
    cfg_file = tsfel.get_features_by_domain()
    df_train_tsfel = df_train.set_index(['batchid', 'time'])
    headers = df_train_tsfel.columns
    tsfel_features = tsfel.time_series_features_extractor(cfg_file, df_train_tsfel, fs=50, window_size=250, header_names=headers)
    

    Alternatively, could also just implicitly pass and utilize a pre-existing df index, if present. Apologies if this is already possible and I just missed it within the documentation. Ultimately the desired output would look like the following:

    batchID x_acc_feature1 x_acc_feature2 ... z_acc_featureN 000001 ... 000002 ... 000003 ... ...

    enhancement 
    opened by nadolsw 4
  • Fix df.append deprecation warning

    Fix df.append deprecation warning

    opened by atick-faisal 1
Releases(v0.2.0-ops)
  • v0.2.0-ops(Jun 2, 2022)

  • v0.1.4(Feb 14, 2021)

    Version 0.1.4

    • Bugfixes

      • Fixed a bug on the progress bar not being displayed if the signal is passed already divided into windows #49
      • Fixed a bug on the distance feature #54
      • Fixed a bug raising zero division in the ECDF slope feature #57
      • Fixed a bug when adding customised features using the JSON
      • Fixed a bug on LPC was returning inconsistent values #58
      • Fixed a bug on normalised autocorrelation #64
    • Improvements

      • Refactoring of some code sections and overall improved stability
      • The documentation has been improved and a FAQ section was created
      • The window_splitter parameter is now deprecated. If the user selected a window_size it is assumed that the signal must be divided into windows.
      • Unit tests improvements
    • New features

      • Added to return the size of the feature vector from the configuration dictionary #50
    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Jan 21, 2020)

  • v0.1.1(Jan 21, 2020)

    • Added new features

      • Empirical cumulative distribution function
      • Empirical cumulative distribution function percentile
      • Empirical cumulative distribution function slope
      • Empirical cumulative distribution function percentile count
      • Spectral entropy
      • Wavelet entropy
      • Wavelet absolute mean
      • Wavelet standard deviation
      • Wavelet variance
      • Wavelet energy
    • Minor fixes for Google Colab

    Source code(tar.gz)
    Source code(zip)
  • v0.1.1-dev(Jan 21, 2020)

  • 0.1.0(Dec 3, 2019)

Owner
Associação Fraunhofer Portugal Research
Associação Fraunhofer Portugal Research
Associação Fraunhofer Portugal Research
Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity

[ICLR 2022] Deep Ensembling with No Overhead for either Training or Testing: The All-Round Blessings of Dynamic Sparsity by Shiwei Liu, Tianlong Chen, Zahra Atashgahi, Xiaohan Chen, Ghada Sokar, Elen

VITA 18 Dec 31, 2022
Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB)

Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB) This repository provides evaluation codes of PLNLP for OGB link property prediction t

Zhitao WANG 31 Oct 10, 2022
PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Masked Autoencoders: A PyTorch Implementation This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners: @

Meta Research 4.8k Jan 04, 2023
Receptive Field Block Net for Accurate and Fast Object Detection, ECCV 2018

Receptive Field Block Net for Accurate and Fast Object Detection By Songtao Liu, Di Huang, Yunhong Wang Updatas (2021/07/23): YOLOX is here!, stronger

Liu Songtao 1.4k Dec 21, 2022
Contains source code for the winning solution of the xView3 challenge

Winning Solution for xView3 Challenge This repository contains source code and pretrained models for my (Eugene Khvedchenya) solution to xView 3 Chall

Eugene Khvedchenya 51 Dec 30, 2022
Wider-Yolo Kütüphanesi ile Yüz Tespit Uygulamanı Yap

WIDER-YOLO : Yüz Tespit Uygulaması Yap Wider-Yolo Kütüphanesinin Kullanımı 1. Wider Face Veri Setini İndir Train Dataset Val Dataset Test Dataset Not:

Kadir Nar 6 Aug 22, 2022
[CVPR2021] Look before you leap: learning landmark features for one-stage visual grounding.

LBYL-Net This repo implements paper Look Before You Leap: Learning Landmark Features For One-Stage Visual Grounding CVPR 2021. Getting Started Prerequ

SVIP Lab 45 Dec 12, 2022
[CVPR2021 Oral] FFB6D: A Full Flow Bidirectional Fusion Network for 6D Pose Estimation.

FFB6D This is the official source code for the CVPR2021 Oral work, FFB6D: A Full Flow Biderectional Fusion Network for 6D Pose Estimation. (Arxiv) Tab

Yisheng (Ethan) He 201 Dec 28, 2022
Unsupervised Pre-training for Person Re-identification (LUPerson)

LUPerson Unsupervised Pre-training for Person Re-identification (LUPerson). The repository is for our CVPR2021 paper Unsupervised Pre-training for Per

143 Dec 24, 2022
A tool to analyze leveraged liquidity mining and find optimal option combination for hedging.

LP-Option-Hedging Description A Python program to analyze leveraged liquidity farming/mining and find the optimal option combination for hedging imper

Aureliano 18 Dec 19, 2022
Distributing reference energies for SMIRNOFF implementations

Warning: This code is currently experimental and under active development. Is it not yet suitable for distribution or use as reference implementation.

Open Force Field Initiative 1 Dec 07, 2021
ExCon: Explanation-driven Supervised Contrastive Learning

ExCon: Explanation-driven Supervised Contrastive Learning Link to the paper: https://arxiv.org/pdf/2111.14271.pdf Contributors of this repo: Zhibo Zha

Zhibo (Darren) Zhang 18 Nov 01, 2022
Denoising Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models Jonathan Ho, Ajay Jain, Pieter Abbeel Paper: https://arxiv.org/abs/2006.11239 Website: https://hojonathanho.g

Jonathan Ho 1.5k Jan 08, 2023
This repository contains Prior-RObust Bayesian Optimization (PROBO) as introduced in our paper "Accounting for Gaussian Process Imprecision in Bayesian Optimization"

Prior-RObust Bayesian Optimization (PROBO) Introduction, TOC This repository contains Prior-RObust Bayesian Optimization (PROBO) as introduced in our

Julian Rodemann 2 Mar 19, 2022
A novel framework to automatically learn high-quality scanning of non-planar, complex anisotropic appearance.

appearance-scanner About This repository is an implementation of the neural network proposed in Free-form Scanning of Non-planar Appearance with Neura

Xiaohe Ma 14 Oct 18, 2022
Optimized code based on M2 for faster image captioning training

Transformer Captioning This repository contains the code for Transformer-based image captioning. Based on meshed-memory-transformer, we further optimi

lyricpoem 16 Dec 16, 2022
Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection

Structure Information is the Key: Self-Attention RoI Feature Extractor in 3D Object Detection abstract:Unlike 2D object detection where all RoI featur

DK. Zhang 2 Oct 07, 2022
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

WebDataset WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives and us

1.1k Jan 08, 2023
Binary classification for arrythmia detection with ECG datasets.

HEART DISEASE AI DATATHON 2021 [Eng] / [Kor] #English This is an AI diagnosis modeling contest that uses the heart disease echocardiography and electr

HY_Kim 3 Jul 14, 2022
An Inverse Kinematics library aiming performance and modularity

IKPy Demo Live demos of what IKPy can do (click on the image below to see the video): Also, a presentation of IKPy: Presentation. Features With IKPy,

Pierre Manceron 481 Jan 02, 2023