An open source Python library for automated feature engineering

Last update: Jan 05, 2023

Overview

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to Know about Machine Learning

Featuretools is a Python library for automated feature engineering. See the documentation for more information.

Installation

Install with pip:

python -m pip install featuretools

or from the conda-forge channel with conda:

conda install -c conda-forge featuretools

Add-ons

You can install add-ons individually or all at once by running

python -m pip install featuretools[complete]

Update checker - Receive automatic notifications of new Featuretools releases

python -m pip install featuretools[update_checker]

TSFresh Primitives - Use 60+ primitives from tsfresh within Featuretools

python -m pip install featuretools[tsfresh]

Example

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

>>> import featuretools as ft
>>> es = ft.demo.load_mock_customer(return_entityset=True)
>>> es.plot()

Featuretools can automatically create a single table of features for any "target dataframe":

>>> feature_matrix, features_defs = ft.dfs(entityset=es, target_dataframe_name="customers")
>>> feature_matrix.head(5)

            zip_code  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount) MODE(sessions.device)  MIN(transactions.amount)  MAX(transactions.amount)  YEAR(join_date)  SKEW(transactions.amount)  DAY(join_date)                   ...                     SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                                                                                                                                                                                  ...
1              60091                  131               10                  10236.77               desktop                      5.60                    149.95             2008                   0.070041               1                   ...                                                     169.77                                 0.610052                                   41.95                               791.976505                              175.939423                                 9.299023                                 -0.377150                                5.857976                                        1                                -0.395358
2              02139                  122                8                   9118.81                mobile                      5.81                    149.15             2008                   0.028647              20                   ...                                                     114.85                                 0.492531                                   42.96                               596.243506                              230.333502                                10.925037                                  0.962350                                7.420480                                        1                                -0.470007
3              02139                   78                5                   5758.24               desktop                      6.78                    147.73             2008                   0.070814              10                   ...                                                      64.98                                 0.645728                                   21.77                               369.770121                              471.048551                                 9.819148                                 -0.244976                               12.537259                                        1                                -0.630425
4              60091                  111                8                   8205.28               desktop                      5.73                    149.56             2008                   0.087986              30                   ...                                                      83.53                                 0.516262                                   17.27                               584.673126                              322.883448                                13.065436                                 -0.548969                               12.738488                                        1                                -0.497169
5              02139                   58                4                   4571.37                tablet                      5.91                    148.17             2008                   0.085883              19                   ...                                                      73.09                                 0.830112                                   27.46                               313.448942                              198.522508                                 8.950528                                  0.098885                                5.599228                                        1                                -0.396571

[5 rows x 69 columns]

We now have a feature vector for each customer that can be used for machine learning. See the documentation on Deep Feature Synthesis for more examples.

Featuretools contains many different types of built-in primitives for creating features. If the primitive you need is not included, Featuretools also allows you to define your own custom primitives.
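
To make feature names such as COUNT(transactions) or MAX(sessions.SUM(transactions.amount)) more concrete, the sketch below computes the same kind of stacked aggregations by hand in plain Python. It only illustrates the idea of applying one aggregation on top of another; it is not how Featuretools is implemented, and the rows and the helper name group_values are made up for the example.

```python
from collections import defaultdict

# Made-up rows for illustration: (transaction_id, session_id, customer_id, amount)
transactions = [
    (1, "s1", "c1", 10.0),
    (2, "s1", "c1", 30.0),
    (3, "s2", "c1", 5.0),
    (4, "s3", "c2", 20.0),
]


def group_values(rows, key_index, value_index):
    """Collect the values at value_index, grouped by the key at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[value_index])
    return dict(groups)


# Depth 1: aggregate transactions directly per customer.
amounts_by_customer = group_values(transactions, 2, 3)
count_transactions = {c: len(v) for c, v in amounts_by_customer.items()}
sum_amount = {c: sum(v) for c, v in amounts_by_customer.items()}

# Depth 2: SUM(transactions.amount) per session first,
# then MAX over each customer's sessions.
session_sums = {s: sum(v) for s, v in group_values(transactions, 1, 3).items()}
session_owner = {row[1]: row[2] for row in transactions}
sums_by_customer = defaultdict(list)
for session, total in session_sums.items():
    sums_by_customer[session_owner[session]].append(total)
max_session_sum = {c: max(v) for c, v in sums_by_customer.items()}

print(count_transactions)
print(sum_amount)
print(max_session_sum)
```

Each dictionary corresponds to one column of a feature matrix indexed by customer.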

Demos

Predict Next Purchase

Repository | Notebook

In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.

For more examples of how to use Featuretools, check out our demos page.

Testing & Development

The Featuretools community welcomes pull requests. Instructions for testing and development are available here.

Support

The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places depending on the type of question:

For usage questions, use Stack Overflow with the featuretools tag.
For bugs, issues, or feature requests, start a GitHub issue.
For discussion regarding development on the core library, use Slack.
For everything else, the core developers can be reached by email at [email protected].

Citing Featuretools

If you use Featuretools, please consider citing the following paper:

James Max Kanter, Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. IEEE DSAA 2015.

BibTeX entry:

@inproceedings{kanter2015deep,
  author    = {James Max Kanter and Kalyan Veeramachaneni},
  title     = {Deep feature synthesis: Towards automating data science endeavors},
  booktitle = {2015 {IEEE} International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015},
  pages     = {1--10},
  year      = {2015},
  organization={IEEE}
}

Built at Alteryx Innovation Labs

Comments

Spark Example for Featuretools

Bug/Feature Request Description

In notebooks such as here: https://github.com/Featuretools/predict-next-purchase/blob/master/Tutorial.ipynb and documentation: https://docs.featuretools.com/usage_tips/scaling.html

It mentions the ability to scale to Spark. Could an example be provided like it was for dask here: https://github.com/Featuretools/predict-next-purchase?

Issues created here on Github are for bugs or feature requests. For usage questions and questions about errors, please ask on Stack Overflow with the featuretools tag. Check the documentation for further guidance on where to ask your question.

opened by charliec443 26
Refactor LatLong and Datetime Primitives into Separate Files
Pull Request Description

Fixes #1855

Changes: I decided to split all classes containing Lat/Long functions into their own file as well as classes containing date/time into their own file. In each file I also organized classes in alphabetical order. I don't believe there are any conflicts with the new files as I was able to run the tests.

Comments: Whenever someone is able to review my changes I would also appreciate some input/advice on the testing. I am running them as described on Ubuntu. They run to the end but I do have some failed tests, not sure if this is due to my changes or if it is just part of the process.

As an aside I apologize for all of the unnecessary commits. I'm still getting the hang of it and understand now I may have gone overboard. Also, I accidentally deleted my original branch which is why I am submitting a second pull request.
opened by jacobboney 21
“IndexError: Too many levels” when running Featuretools dfs after upgrade

Featuretools' dfs() method fails to run on my entity set after upgrading from v0.1.21 to v0.2.x and v0.3.0.

The error is raised when the Pandas backend tries to calculate the aggregate features _calculate_agg_features(). In particular:

--> 442 to_merge.reset_index(1, drop=True, inplace=True) ... IndexError: Too many levels: Index has only 1 level, not 2

This is working fine in v0.1.x and the entity set hasn't changed after the upgrade. The entity set is composed of 7 entities and 6 relationships. Each entity (dataframe) is added via entity_from_dataframe.

opened by jrkinley-zz 20

Memory crashing when using featuretools/dask

I'm not sure what I'm doing wrong, but basically I'm taking a fairly large dataframe(11GB) and converting it to dask before running featuretools on it. During DFS my system is running out of memory, which is strange to me because I thought it should be writing to disk?

from dask.distributed import Client, progress

client = Client(n_workers=2, threads_per_worker=2, memory_limit='2GB')
client

import featuretools as ft
import dask.dataframe as dd
dt = {}
dt.update(dict.fromkeys(catgoricalValues, ft.variable_types.Categorical))
dt.update(dict.fromkeys(NumericColumns, ft.variable_types.Numeric))
dask_df = dd.from_pandas(Main[NumericColumns + catgoricalValues], npartitions=50000)
dask_df  # this works

# Make an entityset and add the entity
es = ft.EntitySet(id = 'Test')
es = es.entity_from_dataframe(entity_id="dask_entity", dataframe=dask_df, make_index = True, index="index", variable_types=dt)

# primitives to use
default_agg_primitives =  ["sum", "std", "max", "min", "mean", "count", "percent_true", "num_unique"]
default_trans_primitives =  ["add_numeric", 'multiply_numeric']

feature_matrix, feature_defs = ft.dfs(entityset = es, target_entity = 'dask_entity',
                                       trans_primitives = default_trans_primitives,
                                       agg_primitives=default_agg_primitives, 
                                        max_depth = 2, features_only=False, verbose = True)

My session crashes at this point from using all the memory. I followed various tutorials but I'm not sure what I'm doing wrong? My goal is after DFS is done, I would save the results to a file that I can then pass on to TF/Keras.

opened by gautambak 16

How is `DIFF` calculated?

I read the docs but can't understand how DIFF calculates its value.

This part of my example:

I generated this dataframe using dfs(..., time_window=None)

(the time in the index is the cutoff_time)

What I can't understand is this: DIFF(MAX(sales.amount)) is calculated by applying DIFF to MAX(sales.amount), but MAX(sales.amount) is an aggregated value, so it should be a single value (the max value before the cutoff time). How does DIFF calculate its value? I think DIFF requires at least 2 values.

If I missed something, please let me know how the first value of DIFF(MAX(sales.amount)), 25714.287, is calculated.

Thanks

opened by rightx2 16
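
For readers with the same question, the sketch below shows the general idea of a difference primitive: each value minus the value a given number of rows earlier, with no result for the first rows. It is a plain-Python illustration under that assumption, not the Featuretools implementation, and it does not explain the specific number quoted in the issue. The function name diff and the sample values are hypothetical.

```python
def diff(values, periods=1):
    """Return values[i] - values[i - periods], or None where no earlier row exists."""
    result = []
    for i, value in enumerate(values):
        if i < periods:
            result.append(None)
        else:
            result.append(value - values[i - periods])
    return result


# One MAX(sales.amount)-style value per cutoff time (hypothetical numbers).
max_per_cutoff = [100.0, 120.0, 120.0, 90.0]
print(diff(max_per_cutoff))
```

Seen this way, the difference is taken across rows of the feature matrix (one row per cutoff time), not within a single aggregated value.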
Calculating direct features use default value if parent missing

opened by seriallazer 15
Support/approach for sliding window/multiple snapshots in time

Hi there! (first of all huge thx for dfs, vision & tools, superb work)

My question: the predict_next_purchase sample uses a single cutoff time, right? But doesn't that remove a lot of data that could help with the purchase prediction? And we're only using a single day for reference, right?

Only this data/users: "Using users who had activity during training_window days before the cutoff_time, we look to see if they purchase the product in the prediction_window."

I would like to use all the data in a single final ML table for the models. Is there support for making the cutoff a sliding window (e.g. for each customer) of features from the last x days, predicting purchase (yes/no) up to x days in the future? Each customer would then appear multiple times, depending on the chosen sliding window.

I think it's a typical pattern in predicting future events (predictive maintenance, churn, healthcare), and it usually applies to any kind of event prediction (e.g. for every user or machine, predict the probability of event E in the next x days at a specific point in time; obviously the training dataset needs proper timestamps so that we can "recalculate" feature values for the user/machine at any point in time).

The dataset obviously becomes non-IID, so some cautions apply.

Does that make sense? What's the approach to using DFS in these scenarios? Thanks!

opened by rquintino 15
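
One way to picture the setup described in this question is a table with several cutoff times per customer, so that each customer appears once per snapshot. The sketch below only builds such (customer, cutoff time) rows in plain Python. The function name sliding_cutoffs, the dates, and the step size are hypothetical, and passing the rows on to DFS is left out.

```python
from datetime import date, timedelta


def sliding_cutoffs(customer_ids, start, end, step_days):
    """Return (customer_id, cutoff_date) rows, one per customer per snapshot."""
    if step_days <= 0:
        raise ValueError("step_days must be positive")
    rows = []
    cutoff = start
    while cutoff <= end:
        for customer_id in customer_ids:
            rows.append((customer_id, cutoff))
        cutoff += timedelta(days=step_days)
    return rows


# Weekly snapshots for two customers over a two-week span (made-up values).
rows = sliding_cutoffs([1, 2], date(2015, 1, 1), date(2015, 1, 15), step_days=7)
for row in rows:
    print(row)
```

Because the same customer contributes several rows, the resulting training table is not IID, which is the caution raised in the question.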
LatLong type
The issue in testing comes from mock_ds.py where the mock retail entityset is made with es.entity_from_csv(entity, (line 292). This makes the latlong type in that entityset a string rather than a tuple. The options as I understand them are:

1. Modify Latitude and Longitude to check if the latlong is a string

2. Modify entity_from_csv to convert certain strings to tuples

3. Change the test to do the pandas _from_csv, modify the dataframe and then make entity_from_dataframe.

4. Leave Latitude and Longitude with no real tests for now.

My gut is to go with 3. Do you have a preference @kmax12?
opened by Seth-Rothschild 15
Bug with parallel feature matrix calculation within sklearn cross-validation

Bug Description

Hello, guys! Thank you for the quick release of featuretools 1.1.0!

During my research I ran into the following bug. I have an estimator which is actually an imblearn Pipeline. The estimator consists of several steps, including a custom transformer which calculates a feature matrix with featuretools. I want to check the quality of the model with the sklearn cross_validate function. If I set n_jobs > 1 both in featuretools.calculate_feature_matrix and in sklearn.cross_validate, I get an unexpected error: ValueError: cannot find context for 'loky'. When either n_jobs is set to 1, everything works fine.

I googled for some time and learned that such an error might happen when parallelization is used without if __name__ == '__main__', but that's the best information I've found. So to me it looks like there is some conflict between parallelization in sklearn and in featuretools. Since both libraries are essential, as is parallelization when working with big data, I really hope you will be able to find a way to fix it :)

P.S. This problem also existed before the 1.0.0 release: I previously used 0.24.0 and still faced it.

Output of featuretools.show_info()

Featuretools version: 1.1.0

SYSTEM INFO

python: 3.7.5.final.0
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: ru_RU.UTF-8
LOCALE: ru_RU.UTF-8

INSTALLED VERSIONS

numpy: 1.21.1
pandas: 1.3.2
tqdm: 4.62.2
PyYAML: 5.4.1
cloudpickle: 1.6.0
dask: 2021.10.0
distributed: 2021.10.0
psutil: 5.8.0
pip: 19.2.3
setuptools: 41.2.0

opened by VGODIE 14
Add include_cutoff_time arg to control whether data at cutoff times are included in feature calculations and prevent training_window overlapping

Pull Request Description

There was a data overlapping problem when calculating the feature matrix: the data at the cutoff time might be used both in calculating features and in calculating target values (#918). This could leak target information into the features and affect the results. There was an attempt to solve the issue (#930), but it did not fully solve the problem. So we decided to add a parameter that controls whether data at cutoff times is included in feature calculations (#942), and this PR implements it.

opened by rightx2 14
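
The effect of such a flag can be sketched as a choice between an inclusive and an exclusive time comparison. The helper below is a hypothetical plain-Python illustration with made-up events; it is not the code from this PR.

```python
from datetime import datetime


def rows_for_features(rows, cutoff_time, include_cutoff_time=True):
    """Keep the (timestamp, value) rows that may be used for features at cutoff_time."""
    if include_cutoff_time:
        return [r for r in rows if r[0] <= cutoff_time]
    return [r for r in rows if r[0] < cutoff_time]


events = [
    (datetime(2020, 1, 1), 10),
    (datetime(2020, 1, 2), 20),
    (datetime(2020, 1, 3), 30),
]
cutoff = datetime(2020, 1, 2)

inclusive = rows_for_features(events, cutoff, include_cutoff_time=True)
exclusive = rows_for_features(events, cutoff, include_cutoff_time=False)
print(len(inclusive), len(exclusive))
```

With the exclusive comparison, a row stamped exactly at the cutoff time is left for the target calculation only, so it cannot appear on both sides.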
Fixed #297 update tests to check error strings
On the Windows platform, there is currently an open issue in pandas where it raises an error when reading a file with accents in the file path (e.g. régions.csv). I resolved it with the following:

# featuretools\tests\testing_utils\mock_ds.py:334
df = pd.read_csv(open(filenames[entity], 'r', encoding='utf8'), encoding='utf-8')

The snippet np.dtype((np.integer, np.floating)).type was causing this issue, so I resolved it by changing it to the following:

np.issubdtype(time, np.integer) or np.issubdtype(time, np.floating)

Not sure how to get the error text for test_not_enough_memory
opened by jeff-hernandez 14
Consider adding new scalar comparison primitives for Datetime and Ordinal column types

In PR #2434 Datetime and Ordinal inputs were removed from the valid input types for four comparison primitives. This was done due to errors that could be encountered during feature value calculation. We should consider adding new primitives to replace the lost functionality, if these types of comparisons are needed.

The Ordinal comparison primitives might be too specific and may not be necessary, but the datetime comparison primitives could be useful.

Some primitives to add could include (naming could be improved): GreaterThanDate GreaterThanOrEqualToDate LessThanDate LessThanOrEqualToDate

opened by thehomebrewnerd 0
Add global test for NaturalLanguage primitives
As per discussion in #2413, this adds a test that checks strings that have caused errors in the past, ~~as well as randomized test inputs~~ to make sure none of our NaturalLanguage primitives hang or fail on generated input.

Adds get_natural_language_primitives and refactors get_transform_primitives and get_aggregation_primitives utility functions
opened by sbadithe 3

Fix warnings encountered during unit tests

Currently when running the full test suite over 17,000 warnings are generated:

2417 passed, 23 skipped, 87 xfailed, 17879 warnings

Some of these warnings may be expected, but some should be addressed. This may require multiple PRs, but we should investigate these warnings and implement fixes where possible. Of immediate concern are the deprecation warnings that might cause tests to break when new versions of dependencies are released:

featuretools/tests/primitive_tests/test_num_consecutive.py::TestNumConsecutiveLessMean::test_inf
  /dev/featuretools/featuretools/tests/primitive_tests/test_num_consecutive.py:259: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
    x = x.append(pd.Series([np.inf]))

This Dask warning also appears to be quite common and perhaps of concern.

featuretools/tests/primitive_tests/test_transform_features.py::test_comparisons_with_ordinal_invalid_inputs[dask_es]
  /dev/featuretools/env/lib/python3.9/site-packages/dask/dataframe/core.py:4134: UserWarning:
  You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
  To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
    Before: .apply(func)
    After:  .apply(func, meta=('priority_level <= ordinal_invalid', 'object'))

    warnings.warn(meta_warning(meta))

opened by thehomebrewnerd 0

Remove numpy restriction in featuretools[spark] install when possible

In PR #2414, numpy was restricted to <1.24.0 when installing the featuretools[spark] requirements due to an incompatibility between pyspark and numpy==1.24.0 (likely due to pyspark's use of numpy aliases, which were removed in 1.24.0). Once pyspark is updated to work with numpy 1.24.0, we should remove the upper bound on numpy in the Featuretools requirements as well.

opened by thehomebrewnerd 0
Refactor computation of primitive lists in `DeepFeatureSynthesis` `__init__`
When building the following lists, there is a lot of code duplication:

self.groupby_trans_primitives

self.agg_primitives

self.where_primitives

self.trans_primitives

Furthermore, refactoring this logic outside of the __init__ would help make the code more expressive and testable.
enhancement refactor tech debt
opened by sbadithe 0
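
A shared helper is one way such duplication could be removed. The sketch below is hypothetical: the function name resolve_primitives, the lookup table, and the placeholder classes are invented for illustration and do not reflect the actual DeepFeatureSynthesis code.

```python
class Primitive:
    """Placeholder base class standing in for a real primitive."""

    name = None


class Sum(Primitive):
    name = "sum"


class Mean(Primitive):
    name = "mean"


# Hypothetical lookup table from primitive name to class.
AVAILABLE = {cls.name: cls for cls in (Sum, Mean)}


def resolve_primitives(requested, available=AVAILABLE):
    """Turn a mix of names, classes, and instances into primitive instances."""
    resolved = []
    for item in requested:
        if isinstance(item, str):
            key = item.lower()
            if key not in available:
                raise ValueError(f"Unknown primitive: {item}")
            item = available[key]
        if isinstance(item, type) and issubclass(item, Primitive):
            item = item()
        if not isinstance(item, Primitive):
            raise ValueError(f"Not a primitive: {item!r}")
        resolved.append(item)
    return resolved


# The same helper can serve every list built in __init__.
agg_primitives = resolve_primitives(["sum", Mean])
where_primitives = resolve_primitives([Sum()])
print([type(p).__name__ for p in agg_primitives])
```

Keeping the validation in one function also makes it testable without constructing a full DeepFeatureSynthesis object.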

Releases(v1.20.0)

v1.20.0(Jan 5, 2023)

Enhancements

Add TimeSinceLastFalse, TimeSinceLastMax, TimeSinceLastMin, and TimeSinceLastTrue primitives (#2418)

Add MaxConsecutiveFalse, MaxConsecutiveNegatives, MaxConsecutivePositives, MaxConsecutiveTrue, MaxConsecutiveZeros, NumConsecutiveGreaterMean, NumConsecutiveLessMean (#2420)

Fixes

Fix typo in _handle_binary_comparison function name and update set_feature_names docstring (#2388)

Only allow Datetime time index as input to RateOfChange primitive (#2408)

Prevent catastrophic backtracking in regex for NumberOfWordsInQuotes (#2413)

Fix to eliminate fragmentation PerformanceWarning in feature_set_calculator.py (#2424)

Fix serialization of NumberOfCommonWords feature with custom word_set (#2432)

Improve edge case handling in NaturalLanguage primitives by standardizing delimiter regex (#2423)

Remove support for Datetime and Ordinal inputs in several primitives to prevent creation of Features that cannot be calculated (#2434)

Changes

Refactor _all_direct_and_same_path by deleting call to _features_have_same_path (#2400)

Refactor _build_transform_features by iterating over input_features once (#2400)

Iterate only once over ignore_columns in DeepFeatureSynthesis init (#2397)

Resolve empty Pandas series warnings (#2403)

Initialize Woodwork with init_with_partial_schema instead of init in EntitySet.add_last_time_indexes (#2409)

Updates for compatibility with numpy 1.24.0 (#2414)

The delimiter_regex parameter for TotalWordLength has been renamed to do_not_count (#2423)

Documentation Changes

Remove unused sections from 1.19.0 notes (#2396)

Thanks to the following people for contributing to this release: @gsheni, @rwedge, @sbadithe, @thehomebrewnerd

Breaking Changes

The delimiter_regex parameter for TotalWordLength has been renamed to do_not_count. Old saved features that had a non-default value for the parameter will no longer load.

Support for Datetime and Ordinal inputs has been removed from the LessThanScalar, GreaterThanScalar, LessThanEqualToScalar and GreaterThanEqualToScalar primitives.

v1.19.0(Dec 9, 2022)

Enhancements

Add OneDigitPostalCode and TwoDigitPostalCode primitives (#2365)

Add ExpandingCount, ExpandingMin, ExpandingMean, ExpandingMax, ExpandingSTD and ExpandingTrend primitives (#2343)

Fixes

Fix DeepFeatureSynthesis to consider the base_of_exclude family of attributes when creating transform features (#2380)

Fix bug with negative version numbers in test_version (#2389)

Fix bug in MultiplyNumericBoolean primitive that can cause an error with certain input dtype combinations (#2393)

Testing Changes

Fix version comparison in test_holiday_out_of_range (#2382)

Thanks to the following people for contributing to this release: @sbadithe, @thehomebrewnerd
v1.18.0(Nov 15, 2022)

Enhancements

Add RollingOutlierCount primitive (#2129)

Add RateOfChange primitive (#2359)

Fixes

Sets uses_full_dataframe for Rolling* and Exponential* primitives (#2354)

Updates for compatibility with upcoming Woodwork release 0.21.0 (#2363)

Updates demo dataset location to use new links (#2366)

Fix test_holiday_out_of_range after holidays release 0.17 (#2373)

Changes

Remove click and CLI functions (list-primitives, info) (#2353, #2358)

Documentation Changes

Build docs in parallel with Sphinx (#2351)

Use non-editable install to allow local docs build (#2367)

Remove primitives.featurelabs.com website from documentation (#2369)

Testing Changes

Replace use of pytest's tmpdir fixture with tmp_path (#2344)

Thanks to the following people for contributing to this release: @gsheni, @rwedge, @sbadithe, @tamargrey, @thehomebrewnerd
v1.17.0(Oct 31, 2022)

Enhancements

Add featuretools-sklearn-transformer as an extra installation option (#2335)

Add CountAboveMean, CountBelowMean, CountGreaterThan, CountInsideNthSTD, CountInsideRange, CountLessThan, CountOutsideNthSTD, CountOutsideRange (#2336)

Changes

Restructure primitives directory to use individual primitives files (#2331)

Restrict 2022.10.1 for dask and distributed (#2347)

Documentation Changes

Add Featuretools-SQL to Install page on documentation (#2337)

Fixes broken link in Featuretools documentation (#2339)

Thanks to the following people for contributing to this release: @gsheni, @rwedge, @sbadithe, @thehomebrewnerd

v1.16.0(Oct 24, 2022)
Enhancements

Add ExponentialWeighted primitives and DateToTimeZone primitive (#2318)

Add 14 natural language primitives from nlp_primitives library (#2328)

Documentation Changes

Fix typos in aggregation_primitive_base.py and features_deserializer.py (#2317) (#2324)

Update SQL integration documentation to reflect Snowflake compatibility (#2313)

Testing Changes

Add Windows install test (#2330)

Thanks to the following people for contributing to this release: @gsheni, @sbadithe, @thehomebrewnerd
v1.15.0(Oct 6, 2022)

Enhancements

Add series_library attribute to EntitySet dictionary (#2257)

Leverage Library Enum inheriting from str (#2275)

Changes

Change default gap for Rolling* primitives from 0 to 1 to prevent accidental leakage (#2282)

Updates for pandas 1.5.0 compatibility (#2290, #2291, #2308)

Exclude documentation files from release workflow (#2295)

Bump requirements for optional pyspark dependency (#2299)

Bump scipy and woodwork[spark] dependencies (#2306)

Documentation Changes

Add documentation describing how to use featuretools_sql with featuretools (#2262)

Remove featuretools_sql as a docs requirement (#2302)

Fix typo in DiffDatetime doctest (#2314)

Fix typo in EntitySet documentation (#2315)

Testing Changes

Remove graphviz version restrictions in Windows CI tests (#2285)

Run CI tests with pytest -n auto (#2298, #2310)

Thanks to the following people for contributing to this release: @gsheni, @rwedge, @sbadithe, @thehomebrewnerd

Breaking Changes

The EntitySet schema has been updated to include a series_library attribute

The default behavior of the Rolling* primitives has changed in this release. If this primitive was used without defining the gap value, the feature values returned with this release will be different than feature values from prior releases.
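
The leakage concern behind this change can be illustrated with a simplified, row-based rolling mean. The function below is a hypothetical sketch that assumes gap counts how many rows before the current row the window ends; it is not the Featuretools implementation.

```python
def rolling_mean(values, window_length, gap=1):
    """Mean of up to window_length values ending gap rows before each row."""
    result = []
    for i in range(len(values)):
        end = i - gap + 1  # exclusive end index of the window
        start = max(end - window_length, 0)
        window = values[start:end] if end > 0 else []
        result.append(sum(window) / len(window) if window else None)
    return result


values = [1.0, 2.0, 3.0, 4.0]
with_current_row = rolling_mean(values, window_length=2, gap=0)
without_current_row = rolling_mean(values, window_length=2, gap=1)
print(with_current_row)
print(without_current_row)
```

With gap=0 each result includes the current row's own value, while with gap=1 it is computed only from earlier rows, which is why results differ between releases when gap is left at its default.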

v1.15.0.dev0(Oct 5, 2022)

Developmental release for testing purposes
v1.14.0(Sep 1, 2022)

Enhancements

Replace NumericLag with Lag primitive (#2252)

Refactor build_features to speed up long running DFS calls by 50% (#2224)

Fixes

Fix compatibility issues with holidays 0.15 (#2254)

Changes

Update release notes to make clear conda release portion (#2249)

Use pyproject.toml only (move away from setup.cfg) (#2260, #2263, #2265)

Add entry point instructions for pyproject.toml project (#2272)

Documentation Changes

Fix to remove warning from Using Spark EntitySets Guide (#2258)

Testing Changes

Add tests/profiling/dfs_profile.py (#2224)

Add workflow to test featuretools without test dependencies (#2274)

Thanks to the following people for contributing to this release: @cp2boston, @gsheni, @ozzieD, @stefaniesmith, @thehomebrewnerd
v1.13.0(Aug 18, 2022)

Fixes

Allow boolean columns to be included in remove_highly_correlated_features (#2231)

Changes

Refactor schema version checking to use packaging method (#2230)

Extract duplicated logic for Rolling primitives into a general utility function (#2218)

Set pandas version to >=1.4.0 (#2246)

Remove workaround in roll_series_with_gap caused by pandas version < 1.4.0 (#2246)

Documentation Changes

Add line breaks between sections of IsFederalHoliday primitive docstring (#2235)

Testing Changes

Update create feedstock PR forked repo to use (#2223, #2237)

Update development requirements and use latest for documentation (#2225)

Thanks to the following people for contributing to this release: @gsheni, @ozzieD, @sbadithe, @tamargrey
v1.12.1(Aug 4, 2022)

Fixes

Update Trend and RollingTrend primitives to work with IntegerNullable inputs (#2204)

camel_and_title_to_snake handles snake case strings with numbers (#2220)

Change _get_description to split on blank lines to avoid truncating primitive descriptions (#2219)

Documentation Changes

Add instructions to add new users to featuretools feedstock (#2215)

Testing Changes

Add create feedstock PR workflow (#2181)

Add performance tests for python 3.9 and 3.10 (#2198, #2208)

Add test to ensure primitive docstrings use standardized verbs (#2200)

Configure codecov to avoid premature PR comments (#2209)

Thanks to the following people for contributing to this release: @gsheni, @rwedge, @sbadithe, @tamargrey, @thehomebrewnerd
v1.12.0(Jul 19, 2022)

warning: This release of Featuretools will not support Python 3.7

Enhancements

Add IsWorkingHours and IsLunchTime transform primitives (#2130)

Add periods parameter to Diff and add DiffDatetime primitive (#2155)

Add RollingTrend primitive (#2170)

Fixes

Resolves Woodwork integration test failure and removes Python version check for codecov (#2182)

Changes

Drop Python 3.7 support (#2169, #2186)

Add pre-commit hooks for linting (#2177)

Documentation Changes

Augment single table entry in DFS to include information about passing in a dictionary for dataframes argument (#2160)

Testing Changes

Standardize imports across test files to simplify accessing featuretools functions (#2166)

Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @ozzieD, @rwedge, @sbadithe
v1.11.1(Jul 5, 2022)

Fixes

Remove 24th hour from PartOfDay primitive and add 0th hour (#2167)

Thanks to the following people for contributing to this release: @tamargrey
v1.11.0(Jun 30, 2022)

Enhancements

Add datetime and string types as valid arguments to dfs cutoff_time (#2147)

Add PartOfDay transform primitive (#2128)

Add IsYearEnd, IsYearStart transform primitives (#2124)

Add Feature.set_feature_names method to directly set output column names for multi-output features (#2142)

Include np.nan testing for DayOfYear and DaysInMonth primitives (#2146)

Allow dfs kwargs to be passed into get_valid_primitives (#2157)
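Accepting both datetime and string values for a cutoff time usually comes down to normalizing the input to a single type before it is used. The sketch below shows one way to do that with the standard library only. The helper name normalize_cutoff_time is hypothetical, and restricting strings to ISO 8601 format is an assumption of this sketch; Featuretools may parse strings differently.

```python
from datetime import datetime
from typing import Union


def normalize_cutoff_time(cutoff: Union[str, datetime]) -> datetime:
    """Return a datetime for either a datetime or an ISO 8601 string."""
    if isinstance(cutoff, datetime):
        return cutoff
    if isinstance(cutoff, str):
        # Assumption: strings are ISO 8601, e.g. "2022-06-30" or
        # "2022-06-30 12:00:00".
        return datetime.fromisoformat(cutoff)
    raise TypeError(f"unsupported cutoff time type: {type(cutoff).__name__}")


print(normalize_cutoff_time("2022-06-30"))
print(normalize_cutoff_time(datetime(2022, 6, 30, 12, 0)))
```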

Changes

Improve serialization and deserialization to reduce storage of duplicate primitive information (#2136, #2127, #2144)

Sort core requirements and test requirements in setup cfg (#2152)

Testing Changes

Fix pandas warning and reduce dask .apply warnings (#2145)

Pin graphviz version used in windows tests (#2159)

Thanks to the following people for contributing to this release: @gsheni, @ozzieD, @rwedge, @sbadithe, @tamargrey, @thehomebrewnerd
v1.10.0(Jun 23, 2022)

Enhancements

Add DayOfYear, DaysInMonth, Quarter, IsLeapYear, IsQuarterEnd, IsQuarterStart transform primitives (#2110, #2117)

Add IsMonthEnd, IsMonthStart transform primitives (#2121)

Move Quarter test cases (#2123)

Add summarize_primitives function for getting metrics about available primitives (#2099)
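The calendar primitives listed above each derive a single value from a date. As a rough illustration of what those values are, the sketch below computes them for one date with the standard library. The function name and the dictionary keys are hypothetical, and this is not how Featuretools implements the primitives.

```python
import calendar
from datetime import date


def calendar_features(d: date) -> dict:
    """Compute simple calendar values for one date (illustrative only)."""
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return {
        "day_of_year": d.timetuple().tm_yday,
        "days_in_month": days_in_month,
        "quarter": (d.month - 1) // 3 + 1,
        "is_leap_year": calendar.isleap(d.year),
        "is_month_start": d.day == 1,
        "is_month_end": d.day == days_in_month,
        "is_quarter_start": d.day == 1 and d.month in (1, 4, 7, 10),
        "is_quarter_end": d.day == days_in_month and d.month in (3, 6, 9, 12),
    }


print(calendar_features(date(2022, 6, 30)))
```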

Changes

Changes for compatibility with numpy 1.23.0 (#2135, #2137)

Documentation Changes

Update contributing.md to add pandoc (#2103, #2104)

Update NLP primitives section of API reference (#2109)

Fix release notes formatting (#2139)

Testing Changes

Latest dependency checker installs spark dependencies (#2112)

Fix test failures with pyspark v3.3.0 (#2114, #2120)

Thanks to the following people for contributing to this release: @gsheni, @ozzieD, @rwedge, @sbadithe, @thehomebrewnerd
v1.9.2(Jun 10, 2022)

Fixes

Add feature origin information to all multi-output feature columns (#2102)

Documentation Changes

Update contributing.md to add pandoc (#2103)

Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd
v1.9.1(May 27, 2022)

Enhancements

Update DateToHoliday and DistanceToHoliday primitives to work with timezone-aware inputs (#2056)

Changes

Delete setup.py, MANIFEST.in and move configuration to pyproject.toml (#2046)

Documentation Changes

Update Slack invite link to the new link (#2044)

Add Slack and Stack Overflow icons to footer (#2087)

Update dead links in docs and docstrings (#2092)

Testing Changes

Skip test for normalize_dataframe due to different error coming from Woodwork in 0.16.3 (#2052)

Fix Woodwork install in test with Woodwork main branch (#2055)

Use codecov action v3 (#2039)

Add workflow to kickoff EvalML unit tests with Featuretools main (#2072)

Rename yml to yaml for GitHub Actions workflows (#2073, #2077)

Update Dask test fixtures to prevent flaky behavior (#2079)

Update Makefile with better pkg command (#2081)

Add scheduled workflow that checks for broken links in documentation (#2084)

Thanks to the following people for contributing to this release: @gsheni, @rwedge, @thehomebrewnerd
v1.9.0(Apr 27, 2022)

Enhancements

Improve UnusedPrimitiveWarning with additional information (#2003)

Update DFS primitive matching to use all inputs defined in primitive input_types (#2019)

Add MultiplyNumericBoolean primitive (#2035)
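Multiplying a numeric column by a boolean column keeps the number where the flag is true and yields zero where it is false. The sketch below shows that element-wise idea in plain Python. The function name is hypothetical, and propagating missing values as None is an assumption of this sketch, not a statement about the MultiplyNumericBoolean primitive.

```python
from typing import List, Optional


def multiply_numeric_boolean(
    nums: List[Optional[float]], flags: List[Optional[bool]]
) -> List[Optional[float]]:
    """Element-wise product of numbers and booleans (True -> 1, False -> 0)."""
    out = []
    for n, f in zip(nums, flags):
        if n is None or f is None:
            # Assumption: a missing input produces a missing output.
            out.append(None)
        else:
            out.append(n * int(f))
    return out


print(multiply_numeric_boolean([2.5, 4, None], [True, False, True]))
```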

Fixes

Fix issue with Ordinal inputs to binary comparison primitives (#2024, #2025)

Changes

Updated autonormalize version requirement (#2002)

Remove extra NaN checking in LatLong primitives (#1924)

Normalize LatLong NaN values during EntitySet creation (#1924)

Pass primitive dictionaries into check_primitive to avoid repetitive calls (#2016)

Remove Boolean and BooleanNullable from MultiplyNumeric primitive inputs (#2022)

Update serialization for compatibility with Woodwork version 0.16.1 (#2030)

Documentation Changes

Update README text to Alteryx (#2010, #2015)

Testing Changes

Update unit tests with Woodwork main branch workflow name (#2033)

Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @rwedge, @thehomebrewnerd
v1.8.0(Mar 31, 2022)
Changes

Removed make_trans_primitive and make_agg_primitive utility functions (#1970)

Documentation Changes

Update project urls in setup cfg to include Twitter and Slack (#1981)

Update nbconvert to version 6.4.5 to fix docs build issue (#1984)

Update ReadMe to have centered badges and add docs badge (#1993)

Add M1 installation instructions to docs and contributing (#1997)

Testing Changes

Updated scheduled workflows to only run on Alteryx owned repos (#1973)

Updated minimum dependency checker to use new version with write file support (#1975, #1976)

Add black linting package and remove autopep8 (#1978)

Update tests for compatibility with Woodwork version 0.15.0 (#1984)

Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd
v1.7.0(Mar 16, 2022)

Enhancements

Add support for Python 3.10 (#1940)

Added the SquareRoot, NaturalLogarithm, Sine, Cosine and Tangent primitives (#1948)

Fixes

Updated the conda install commands to specify the channel (#1917)

Changes

Update error message when DFS returns an empty list of features (#1919)

Remove list_variable_types and related directories (#1929)

Transition to use pyproject.toml and setup.cfg (moving away from setup.py) (#1941, #1950, #1952, #1954, #1957, #1964)

Replace Koalas with pandas API on Spark (#1949)

Documentation Changes

Add time series guide (#1896)

Update minimum nlp_primitives requirement for docs (#1925)

Add GitHub URL for PyPi (#1928)

Add backport release support (#1932)

Update instructions in release.md (#1963)

Testing Changes

Update test cases to cover main.py file (#1927)

Upgrade moto requirement (#1929, #1938)

Add Python 3.9 linting, install complete, and docs build CI tests (#1934)

Add CI workflow to test with latest woodwork main branch (#1936)

Add lower bound for wheel for minimum dependency checker and limit lint CI tests to Python 3.10 (#1945)

Fix non-deterministic test in test_es.py (#1961)

Thanks to the following people for contributing to this release: @andriyor, @gsheni, @jeff-hernandez, @kushal-gopal, @mingdavidqi, @rwedge, @tamargrey, @thehomebrewnerd, @tvdboom
v1.7.0.dev2(Mar 16, 2022)

Development release for testing purposes only

Testing updated upload process
v1.7.0.dev1(Mar 15, 2022)

Development release for testing purposes
v1.7.0.dev0(Mar 15, 2022)

Development release for testing purposes
v1.6.0(Feb 17, 2022)

Enhancements

Add IsFederalHoliday transform primitive (#1912)

Fixes

Fix to catch new NotImplementedError raised by holidays library for unknown country (#1907)

Changes

Remove outdated pandas workaround code (#1906)

Documentation Changes

Add in-line tabs and copy-paste functionality to docs (#1905)

Testing Changes

Fix URL deserialization file (#1909)

Thanks to the following people for contributing to this release: @jeff-hernandez, @rwedge, @thehomebrewnerd
v1.5.0(Feb 14, 2022)

warning: Featuretools may not support Python 3.7 in the next non-bugfix release.

Enhancements

Add ability to use offset alias strings as inputs to rolling primitives (#1809)

Update to add support for pandas version 1.4.0 (#1881, #1895)
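An offset alias string such as "7D" describes a window by duration rather than by row count. The sketch below shows how such a string could be turned into a duration with the standard library. The helper name parse_offset and the small set of unit spellings it accepts are assumptions made for illustration; Featuretools relies on its own handling of offset strings, which this sketch does not reproduce.

```python
import re
from datetime import timedelta

# Hypothetical subset of fixed-width units supported by this sketch.
_UNITS = {"D": "days", "h": "hours", "min": "minutes", "s": "seconds"}


def parse_offset(alias: str) -> timedelta:
    """Parse strings like "7D", "2h", "min" or "90s" into a timedelta."""
    match = re.fullmatch(r"(\d*)(D|h|min|s)", alias)
    if match is None:
        raise ValueError(f"unsupported offset alias: {alias!r}")
    # A missing count means one unit, e.g. "min" is one minute.
    count = int(match.group(1) or 1)
    return timedelta(**{_UNITS[match.group(2)]: count})


print(parse_offset("7D"))
print(parse_offset("90s"))
```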

Fixes

Fix featuretools_primitives entry point (#1891)

Changes

Allow only snake, camel, and title case for primitives (#1854)

Add autonormalize as an add-on library (#1840)

Add DateToHoliday Transform Primitive (#1848)

Add DistanceToHoliday Transform Primitive (#1853)

Temporarily restrict pandas and koalas max versions (#1863)

Add __setitem__ method to overload add_dataframe method on EntitySet (#1862)

Add support for woodwork 0.12.0 (#1872, #1897)

Split Datetime and LatLong primitives into separate files (#1861)

Null values will not be included in index of normalized dataframe (#1897)
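The __setitem__ change above lets item assignment stand in for a method call. The toy class below illustrates that pattern. MiniEntitySet is a hypothetical stand-in, not the Featuretools EntitySet, and its add_dataframe signature is simplified for the example.

```python
class MiniEntitySet:
    """Toy container used only to illustrate the __setitem__ pattern."""

    def __init__(self):
        self.dataframes = {}

    def add_dataframe(self, dataframe, dataframe_name):
        # Store the table under its name and return self for chaining.
        self.dataframes[dataframe_name] = dataframe
        return self

    def __setitem__(self, dataframe_name, dataframe):
        # es["name"] = df is forwarded to add_dataframe.
        self.add_dataframe(dataframe=dataframe, dataframe_name=dataframe_name)


es = MiniEntitySet()
es["customers"] = [{"id": 1}, {"id": 2}]
print(sorted(es.dataframes))
```

Forwarding to the existing method keeps a single code path for validation and storage, so both spellings behave the same way.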

Documentation Changes

Bump ipython version (#1857)

Update README.md with Alteryx link (#1886)

Testing Changes

Add check for package conflicts with install workflow (#1843)

Change auto approve workflow to use assignee (#1843)

Update auto approve workflow to delete branch and change on trigger (#1852)

Upgrade tests to use compose version 0.8.0 (#1856)

Updated deep feature synthesis and feature serialization tests to use new primitive files (#1861)

Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jacobboney, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
v1.4.1(Jan 28, 2022)

Changes

Set upper bound for compatible Woodwork version (#1872)

Restrict pandas and koalas max versions (#1863)

Testing Changes

Upgrade tests to use compose version 0.8.0 (#1856)

Thanks to the following people for contributing to this release: @dvreed77, @thehomebrewnerd
v1.4.0(Jan 11, 2022)
Enhancements

Add LatLong transform primitives - GeoMidpoint, IsInGeoBox, CityblockDistance (#1814)

Add issue templates for bugs, feature requests and documentation improvements (#1834)
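As an illustration of the kind of check a LatLong primitive such as IsInGeoBox performs, the sketch below tests whether a (latitude, longitude) point lies inside a box given by two opposite corners. The function name and argument layout are hypothetical, and the sketch assumes the box does not cross the antimeridian; it is not the Featuretools implementation.

```python
from typing import Tuple

LatLong = Tuple[float, float]


def is_in_geo_box(point: LatLong, corner_a: LatLong, corner_b: LatLong) -> bool:
    """Return True when point lies within the box spanned by two corners."""
    lat, lon = point
    # Sort each axis so the corners can be given in either order.
    lat_lo, lat_hi = sorted((corner_a[0], corner_b[0]))
    lon_lo, lon_hi = sorted((corner_a[1], corner_b[1]))
    return lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi


print(is_in_geo_box((40.7, -74.0), (40.0, -75.0), (41.0, -73.0)))
print(is_in_geo_box((42.0, -74.0), (40.0, -75.0), (41.0, -73.0)))
```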

Fixes

Fix bug where Woodwork initialization could fail on feature matrix if cutoff times caused null values to be introduced (#1810)

Changes

Skip code coverage for specific dask usage lines (#1829)

Increase minimum required numpy version to 1.21.0, scipy to 1.3.3, koalas to 1.8.1 (#1833)

Remove pyyaml as a requirement (#1833)

Documentation Changes

Remove testing on conda forge in release.md (#1811)

Testing Changes

Enable auto-merge for minimum and latest dependency merge requests (#1818, #1821, #1822)

Change auto approve workflow to use PR number and run every 30 minutes (#1827)

Add auto approve workflow to run when unit tests complete (#1837)

Test deserializing from S3 with mocked S3 fixtures only (#1825)

Remove fastparquet as a test requirement (#1833)

Thanks to the following people for contributing to this release: @davesque, @gsheni, @rwedge, @thehomebrewnerd
v1.3.0(Dec 2, 2021)
Enhancements

Add NumericLag transform primitive (#1797)
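A lag feature shifts a series so that each row sees the value from a fixed number of rows earlier. The sketch below shows that operation on a plain list. The function name and its periods and fill_value parameters are chosen for illustration and are not taken from the NumericLag primitive, which works on a time-indexed column.

```python
from typing import List, Optional


def lag(
    values: List[float], periods: int = 1, fill_value: Optional[float] = None
) -> List[Optional[float]]:
    """Shift values forward by `periods` rows, padding the start."""
    if periods < 0:
        raise ValueError("periods must be non-negative in this sketch")
    n = len(values)
    # The first `periods` rows have no earlier value to look at.
    padding = [fill_value] * min(periods, n)
    return padding + list(values[: max(n - periods, 0)])


print(lag([1, 2, 3, 4]))
print(lag([1, 2, 3, 4], periods=2, fill_value=0))
```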

Changes

Update pip to 21.3.1 for test requirements (#1789)

Documentation Changes

Add Docker install instructions and documentation on the install page (#1785)

Update install page on documentation with correct python version (#1784)

Fix formatting in Improving Computational Performance guide (#1786)

Thanks to the following people for contributing to this release: @gsheni, @HenryRocha, @tamargrey, @thehomebrewnerd
v1.3.0.dev0(Dec 2, 2021)

Development release for testing purposes
v1.2.0(Nov 15, 2021)
Enhancements

Add Rolling Transform primitives with integer parameters (#1770)
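A rolling primitive with an integer parameter looks back over a fixed number of rows. The sketch below computes a rolling mean that way on a plain list. The function name and the window_length and min_periods parameters are assumptions made for illustration, and any gap between the window and the current row is left out; this is not the Featuretools implementation.

```python
from typing import List, Optional


def rolling_mean(
    values: List[float], window_length: int, min_periods: int = 1
) -> List[Optional[float]]:
    """Mean of the last `window_length` rows, including the current row."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - window_length + 1): i + 1]
        if len(window) >= min_periods:
            out.append(sum(window) / len(window))
        else:
            # Not enough rows yet to fill the minimum window.
            out.append(None)
    return out


print(rolling_mean([1, 2, 3, 4], window_length=2))
print(rolling_mean([1, 2, 3, 4], window_length=2, min_periods=2))
```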

Fixes

Handle new graphviz FORMATS import (#1770)

Changes

Add new version of featuretools_tsfresh_primitives as an add-on library (#1772)

Add load_weather as demo dataset for time series (#1777)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey
v1.2.0.dev0(Nov 15, 2021)

Development release for testing purposes


Owner

alteryx
