An open source Python library for automated feature engineering

Overview

Featuretools

"One of the holy grails of machine learning is to automate more and more of the feature engineering process." ― Pedro Domingos, A Few Useful Things to Know about Machine Learning


Featuretools is a Python library for automated feature engineering. See the documentation for more information.

Installation

Install with pip

python -m pip install featuretools

or from the conda-forge channel:

conda install -c conda-forge featuretools

Add-ons

You can install add-ons individually or all at once by running

python -m pip install featuretools[complete]

Update checker - Receive automatic notifications of new Featuretools releases

python -m pip install featuretools[update_checker]

TSFresh Primitives - Use 60+ primitives from tsfresh within Featuretools

python -m pip install featuretools[tsfresh]

Example

Below is an example of using Deep Feature Synthesis (DFS) to perform automated feature engineering. In this example, we apply DFS to a multi-table dataset consisting of timestamped customer transactions.

>> import featuretools as ft
>> es = ft.demo.load_mock_customer(return_entityset=True)
>> es.plot()

Featuretools can automatically create a single table of features for any "target entity":

>> feature_matrix, features_defs = ft.dfs(entityset=es, target_entity="customers")
>> feature_matrix.head(5)
            zip_code  COUNT(transactions)  COUNT(sessions)  SUM(transactions.amount) MODE(sessions.device)  MIN(transactions.amount)  MAX(transactions.amount)  YEAR(join_date)  SKEW(transactions.amount)  DAY(join_date)                   ...                     SUM(sessions.MIN(transactions.amount))  MAX(sessions.SKEW(transactions.amount))  MAX(sessions.MIN(transactions.amount))  SUM(sessions.MEAN(transactions.amount))  STD(sessions.SUM(transactions.amount))  STD(sessions.MEAN(transactions.amount))  SKEW(sessions.MEAN(transactions.amount))  STD(sessions.MAX(transactions.amount))  NUM_UNIQUE(sessions.DAY(session_start))  MIN(sessions.SKEW(transactions.amount))
customer_id                                                                                                                                                                                                                                  ...
1              60091                  131               10                  10236.77               desktop                      5.60                    149.95             2008                   0.070041               1                   ...                                                     169.77                                 0.610052                                   41.95                               791.976505                              175.939423                                 9.299023                                 -0.377150                                5.857976                                        1                                -0.395358
2              02139                  122                8                   9118.81                mobile                      5.81                    149.15             2008                   0.028647              20                   ...                                                     114.85                                 0.492531                                   42.96                               596.243506                              230.333502                                10.925037                                  0.962350                                7.420480                                        1                                -0.470007
3              02139                   78                5                   5758.24               desktop                      6.78                    147.73             2008                   0.070814              10                   ...                                                      64.98                                 0.645728                                   21.77                               369.770121                              471.048551                                 9.819148                                 -0.244976                               12.537259                                        1                                -0.630425
4              60091                  111                8                   8205.28               desktop                      5.73                    149.56             2008                   0.087986              30                   ...                                                      83.53                                 0.516262                                   17.27                               584.673126                              322.883448                                13.065436                                 -0.548969                               12.738488                                        1                                -0.497169
5              02139                   58                4                   4571.37                tablet                      5.91                    148.17             2008                   0.085883              19                   ...                                                      73.09                                 0.830112                                   27.46                               313.448942                              198.522508                                 8.950528                                  0.098885                                5.599228                                        1                                -0.396571

[5 rows x 69 columns]

We now have a feature vector for each customer that can be used for machine learning. See the documentation on Deep Feature Synthesis for more examples.
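
For instance, the feature matrix can be fed into a scikit-learn estimator once categorical features are one-hot encoded with ft.encode_features. Below is a minimal sketch (assuming scikit-learn is installed; the labels are random placeholders, since the demo data has no prediction target):

>> import numpy as np
>> from sklearn.ensemble import RandomForestClassifier
>> fm_encoded, features_encoded = ft.encode_features(feature_matrix, features_defs)
>> X = fm_encoded.fillna(0)  # some aggregations (e.g. SKEW) can produce NaNs
>> y = np.random.randint(0, 2, len(X))  # placeholder labels for illustration
>> model = RandomForestClassifier().fit(X, y)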

Featuretools contains many different types of built-in primitives for creating features. If the primitive you need is not included, Featuretools also allows you to define your own custom primitives.
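
As a minimal sketch (assuming Featuretools 1.x, where primitive input and return types are declared with Woodwork's ColumnSchema), a custom transform primitive might look like this; the primitive itself is hypothetical:

from featuretools.primitives import TransformPrimitive
from woodwork.column_schema import ColumnSchema

class AbsoluteDifferenceFromMean(TransformPrimitive):
    """Distance of each value from the column mean (hypothetical example)."""
    name = "absolute_difference_from_mean"
    input_types = [ColumnSchema(semantic_tags={"numeric"})]
    return_type = ColumnSchema(semantic_tags={"numeric"})

    def get_function(self):
        def abs_diff_from_mean(values):
            # values is a pandas Series; return one output value per input row
            return (values - values.mean()).abs()
        return abs_diff_from_mean

The class can then be passed to ft.dfs through the trans_primitives list alongside built-in primitives.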

Demos

Predict Next Purchase

Repository | Notebook

In this demonstration, we use a multi-table dataset of 3 million online grocery orders from Instacart to predict what a customer will buy next. We show how to generate features with automated feature engineering and build an accurate machine learning pipeline using Featuretools, which can be reused for multiple prediction problems. For more advanced users, we show how to scale that pipeline to a large dataset using Dask.

For more examples of how to use Featuretools, check out our demos page.

Testing & Development

The Featuretools community welcomes pull requests. Instructions for testing and development are available here.

Support

The Featuretools community is happy to provide support to users of Featuretools. Project support can be found in four places depending on the type of question:

  1. For usage questions, use Stack Overflow with the featuretools tag.
  2. For bugs, issues, or feature requests, start a GitHub issue.
  3. For discussion regarding development on the core library, use Slack.
  4. For everything else, the core developers can be reached by email at [email protected].

Citing Featuretools

If you use Featuretools, please consider citing the following paper:

James Max Kanter, Kalyan Veeramachaneni. Deep feature synthesis: Towards automating data science endeavors. IEEE DSAA 2015.

BibTeX entry:

@inproceedings{kanter2015deep,
  author    = {James Max Kanter and Kalyan Veeramachaneni},
  title     = {Deep feature synthesis: Towards automating data science endeavors},
  booktitle = {2015 {IEEE} International Conference on Data Science and Advanced Analytics, DSAA 2015, Paris, France, October 19-21, 2015},
  pages     = {1--10},
  year      = {2015},
  organization={IEEE}
}

Built at Alteryx Innovation Labs

Issues
  • Spark Example for Featuretools

    Bug/Feature Request Description

    In notebooks such as https://github.com/Featuretools/predict-next-purchase/blob/master/Tutorial.ipynb and in the documentation (https://docs.featuretools.com/usage_tips/scaling.html), the ability to scale to Spark is mentioned. Could an example be provided, like the Dask one at https://github.com/Featuretools/predict-next-purchase?


    opened by charliec443 26
  • Refactor LatLong and Datetime Primitives into Separate Files

    Pull Request Description

    • Fixes #1855

    Changes: I split all classes containing LatLong functions into their own file, and likewise for classes containing Datetime functions. In each file I also organized the classes in alphabetical order. I don't believe there are any conflicts with the new files, as I was able to run the tests.

    Comments: When someone is able to review my changes, I would also appreciate some input/advice on the testing. I am running the tests as described on Ubuntu. They run to the end, but I do have some failures; I'm not sure if this is due to my changes or just part of the process.

    As an aside, I apologize for all of the unnecessary commits. I'm still getting the hang of it and understand now that I may have gone overboard. Also, I accidentally deleted my original branch, which is why I am submitting a second pull request.

    opened by jacobboney 21
  • “IndexError: Too many levels” when running Featuretools dfs after upgrade

    Featuretools' dfs() method fails to run on my entity set after upgrading from v0.1.21 to v0.2.x and v0.3.0.

    The error is raised when the Pandas backend tries to calculate the aggregate features in _calculate_agg_features(). In particular:

    --> 442 to_merge.reset_index(1, drop=True, inplace=True)
    ...
    IndexError: Too many levels: Index has only 1 level, not 2

    This worked fine in v0.1.x, and the entity set hasn't changed since the upgrade. The entity set is composed of 7 entities and 6 relationships. Each entity (dataframe) is added via entity_from_dataframe.

    opened by jrkinley-zz 20
  • Memory crashing when using featuretools/dask

    I'm not sure what I'm doing wrong, but basically I'm taking a fairly large dataframe (11 GB) and converting it to Dask before running Featuretools on it. During DFS my system runs out of memory, which is strange to me because I thought it would be spilling to disk?

    from dask.distributed import Client, progress
    import dask.dataframe as dd
    import featuretools as ft
    
    client = Client(n_workers=2, threads_per_worker=2, memory_limit='2GB')
    client
    
    # map column names to Featuretools variable types
    # (Main, NumericColumns, categoricalValues are defined earlier in the reporter's session)
    dt = {}
    dt.update(dict.fromkeys(categoricalValues, ft.variable_types.Categorical))
    dt.update(dict.fromkeys(NumericColumns, ft.variable_types.Numeric))
    dask_df = dd.from_pandas(Main[NumericColumns + categoricalValues], npartitions=50000)
    dask_df  # this works
    
    # Make an entityset and add the entity
    es = ft.EntitySet(id='Test')
    es = es.entity_from_dataframe(entity_id="dask_entity", dataframe=dask_df,
                                  make_index=True, index="index", variable_types=dt)
    
    # primitives to use
    default_agg_primitives = ["sum", "std", "max", "min", "mean", "count", "percent_true", "num_unique"]
    default_trans_primitives = ["add_numeric", "multiply_numeric"]
    
    feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity='dask_entity',
                                          trans_primitives=default_trans_primitives,
                                          agg_primitives=default_agg_primitives,
                                          max_depth=2, features_only=False, verbose=True)
    

    My session crashes at this point from using all the memory. I followed various tutorials, but I'm not sure what I'm doing wrong. My goal is, after DFS is done, to save the results to a file that I can then pass on to TF/Keras.

    opened by gautambak 16
  • How is `DIFF` calculated?

    I read the docs but can't understand how DIFF calculates its value.

    This part of my example:

    [screenshot of the generated feature matrix]

    I generated this dataframe using dfs(..., time_window=None)

    (the time in the index is the cutoff_time)

    What I can't understand is this: DIFF(MAX(sales.amount)) should be calculated by applying DIFF to MAX(sales.amount), but since MAX(sales.amount) is an aggregated value (a single value, the max before the cutoff time), how does DIFF compute anything? I thought DIFF requires at least 2 values.

    If I missed something, please let me know how the first value of DIFF(MAX(sales.amount)), 25714.287, is calculated.

    Thanks

    opened by rightx2 16
  • Calculating direct features use default value if parent missing


    opened by seriallazer 15
  • Support/approach for sliding window/multiple snapshots in time

    Hi there! (First of all, huge thanks for DFS, the vision & tools - superb work.)

    My question: the predict_next_purchase sample uses a single cutoff time, right? But doesn't that discard a lot of data that could help with the purchase prediction? And we're only using a single day for reference, right?

    only this data/users -> "Using users who had activity during training_window days before the cutoff_time, we look to see if they purchase the product in the prediction_window."

    I would like to use all the data in a single final ML table for the models. Is there support for having the cutoff be a sliding window (e.g. per customer) of features from the last x days, predicting purchase (yes/no) up to x days in the future? Each customer would then appear multiple times, depending on the chosen sliding window.

    I think this is a typical pattern in predicting future events (predictive maintenance, churn, healthcare) and usually applies to any kind of event prediction (e.g. for every user or machine, predict the probability of event E over the next x days at a specific point in time; obviously the training dataset has proper timestamps so that we can "recalculate" feature values for each user/machine at any point in time).

    The dataset becomes non-IID, obviously, so some cautions apply.

    Does that make sense? What's the approach to using DFS in these scenarios? Thanks!
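
    A hedged sketch of this pattern using the cutoff_time dataframe support in ft.dfs: build one row per (customer, snapshot time) pair, so each customer appears once per window. The ids and dates are illustrative, and training_window as a string offset assumes a Featuretools version that accepts it (older versions use ft.Timedelta):

    import pandas as pd
    import featuretools as ft

    es = ft.demo.load_mock_customer(return_entityset=True)

    # one snapshot per month for each customer
    snapshots = pd.date_range("2014-01-01", "2014-04-01", freq="MS")
    cutoff_times = pd.DataFrame(
        [(cid, t) for cid in [1, 2, 3, 4, 5] for t in snapshots],
        columns=["customer_id", "time"],
    )

    # training_window limits each row's features to the window before its cutoff
    fm, features = ft.dfs(
        entityset=es,
        target_entity="customers",
        cutoff_time=cutoff_times,
        training_window="30 days",
    )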

    opened by rquintino 15
  • LatLong type

    The issue in testing comes from mock_ds.py, where the mock retail entityset is made with es.entity_from_csv (line 292). This makes the latlong type in that entityset a string rather than a tuple. The options as I understand them are:

    1. Modify Latitude and Longitude to check if the latlong is a string
    2. Modify entity_from_csv to convert certain strings to tuples
    3. Change the test to read the CSV with pandas, modify the dataframe, and then use entity_from_dataframe.
    4. Leave Latitude and Longitude with no real tests for now.

    My gut is to go with 3. Do you have a preference @kmax12?
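
    For reference, a rough sketch of option 3 under the pre-1.0 API (the file name and column names are illustrative):

    import ast
    import pandas as pd
    import featuretools as ft

    # read with pandas, then convert the stringified "(lat, lon)" values to tuples
    df = pd.read_csv("retail.csv")
    df["latlong"] = df["latlong"].apply(ast.literal_eval)

    es = ft.EntitySet(id="retail")
    es = es.entity_from_dataframe(
        entity_id="orders",
        dataframe=df,
        index="order_id",
        variable_types={"latlong": ft.variable_types.LatLong},
    )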

    opened by Seth-Rothschild 15
  • Bug with parallel feature matrix calculation within sklearn cross-validation

    Bug Description

    Hello, guys! Thank you for the quick release of featuretools 1.1.0!

    During my research I have faced the following bug: I have an estimator which is actually an imblearn Pipeline. The estimator consists of several steps, including my custom transformer which calculates a feature matrix with featuretools. I want to check the quality of the model with the sklearn cross_validate function. If I set n_jobs > 1 both in featuretools.calculate_feature_matrix and in sklearn.cross_validate, I get an unexpected error: ValueError: cannot find context for 'loky'. When either n_jobs is set to 1, everything works fine.

    I googled for some time and learned that this error can happen when parallelization is used without an if __name__ == '__main__' guard - but that's the best information I've found. So it looks to me like there is some conflict between the parallelization in sklearn and in featuretools. Since both libraries are essential, as is parallelization when working with big data, I really hope you will be able to find a way to fix it :)

    P.S. This problem existed before the 1.0.0 release - I previously used 0.24.0 and faced it there too.
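
    For reference, a minimal, self-contained illustration of the if __name__ == '__main__' guard mentioned above (this is the general loky requirement, not a confirmed fix for this bug):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_validate

    if __name__ == "__main__":
        # parallel work (n_jobs > 1) must start from the main module so that
        # loky can safely spawn worker processes
        X = np.random.rand(100, 5)
        y = np.random.randint(0, 2, 100)
        scores = cross_validate(RandomForestClassifier(), X, y, cv=3, n_jobs=2)
        print(scores["test_score"])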

    Output of featuretools.show_info()

    Featuretools version: 1.1.0

    SYSTEM INFO

    python: 3.7.5.final.0
    python-bits: 64
    OS: Darwin
    OS-release: 19.6.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: ru_RU.UTF-8
    LOCALE: ru_RU.UTF-8

    INSTALLED VERSIONS

    numpy: 1.21.1
    pandas: 1.3.2
    tqdm: 4.62.2
    PyYAML: 5.4.1
    cloudpickle: 1.6.0
    dask: 2021.10.0
    distributed: 2021.10.0
    psutil: 5.8.0
    pip: 19.2.3
    setuptools: 41.2.0

    opened by VGODIE 14
  • Add include_cutoff_time arg to control whether data at cutoff times are included in feature calculations and prevent training_window overlapping

    Pull Request Description

    There was a data overlap problem when calculating the feature matrix: the data at the cutoff time might be used both in calculating features and in calculating target values (#918). This could cause data leakage and affect the results. There was an attempt to solve the issue (#930), but it still didn't solve the leakage problem. So we decided to parameterize the behavior to control whether data at cutoff times is included in feature calculations (#942), and this PR implements that.

    opened by rightx2 14
  • Fixed #297 update tests to check error strings

    • On the Windows platform, there is currently an open issue in pandas where an error is raised when reading a file with accents in the file path (e.g. régions.csv). I resolved it with the following:
    # featuretools\tests\testing_utils\mock_ds.py:334
    df = pd.read_csv(open(filenames[entity], 'r', encoding='utf8'), encoding='utf-8')
    
    • This snippet np.dtype((np.integer, np.floating)).type was causing this issue. So, I resolved it by changing it to the following:
    np.issubdtype(time, np.integer) or np.issubdtype(time, np.floating)
    
    • Not sure how to get the error text for test_not_enough_memory
    opened by jeff-hernandez 14
  • EPIC: Refactor `DeepFeatureSynthesis._build_features`

    DeepFeatureSynthesis._build_features is in need of a refactor to improve speed, maintainability, and scalability.

    There are many optimizations that can be made underneath this function to improve performance while maintaining the API signature. As a rough benchmark, the get_valid_primitives function takes 2 hours to run on the retail entityset to produce a little over 5 million feature definitions. This can be optimized to take a much shorter time.

    Functions should be more granular and testable:

    For example, one of the most granular functions should take as one argument a data structure that is a hashmap of features keyed by their ColumnSchema, and as another argument an input set (e.g. Numeric, Boolean), and return a list of lists of all feature combinations that match this input-type signature. This function should be pure, which would improve maintainability by being very readable and testable.

    Optimizations:

    Caching

    Using the example above, this function could be wrapped with an LRU cache decorator so that primitives whose input signatures match those of other primitives return immediately. Memory issues should be of little concern, since these calculations can be performed using very small data structures containing only logical types and no data, but this should be measured and tested.
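
    A toy sketch of this cached, pure matching function. The keys are simplified to plain strings standing in for ColumnSchema signatures, since lru_cache requires hashable arguments:

    from functools import lru_cache
    from itertools import product

    @lru_cache(maxsize=None)
    def match_features(features_by_type, input_signature):
        # features_by_type: tuple of (type_name, feature_names_tuple) pairs,
        # a hashable stand-in for the hashmap keyed by ColumnSchema
        # input_signature: e.g. ("Numeric", "Boolean")
        lookup = dict(features_by_type)
        pools = [lookup.get(t, ()) for t in input_signature]
        # cartesian product = every feature combination matching the signature
        return [list(combo) for combo in product(*pools)]

    features = (("Numeric", ("age", "amount")), ("Boolean", ("is_active",)))
    print(match_features(features, ("Numeric", "Boolean")))
    # [['age', 'is_active'], ['amount', 'is_active']]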

    Data Structures

    Features and primitives should be hashed by their associated logical types for faster lookup.

    opened by dvreed77 0
  • add utils function for grabbing primitive metrics

    • fixes: https://github.com/alteryx/featuretools/issues/2051
    • Adds summarize_primitives function which provides a pandas DataFrame of metrics about currently available primitives.
    opened by ozzieD 1
  • Featuretools builds features using index column with Equal and NotEqual primitives

    Code Sample, a copy-pastable example to reproduce your bug.

    import pandas as pd
    import featuretools as ft
    
    df = pd.DataFrame({
        "id": [0, 1, 2, 3],
        "vals": [100, 1, 2, 3],
    })
    
    es = ft.EntitySet()
    es.add_dataframe(dataframe_name="df", dataframe=df, index="id", make_index=False)
    
    fm, _ = ft.dfs(entityset=es, target_dataframe_name="df", trans_primitives=["equal", "not_equal"])
    fm
    
        vals  id = vals  id != vals
    id                             
    0    100      False        True
    1      1       True       False
    2      2       True       False
    3      3       True       False
    

    In the example above, I would not expect any features using the id index column to be present in the output feature matrix.

    This is likely related to #1851 and happens because we allow comparison between columns with schemas of <ColumnSchema (Logical Type = Integer) (Semantic Tags = ['index'])> and <ColumnSchema (Logical Type = Integer) (Semantic Tags = ['numeric'])>.
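
    Until this is fixed, a possible workaround (assuming the primitive_options support in Featuretools 1.x) is to tell the comparison primitives to ignore the index column:

    fm, _ = ft.dfs(
        entityset=es,
        target_dataframe_name="df",
        trans_primitives=["equal", "not_equal"],
        primitive_options={
            ("equal", "not_equal"): {"ignore_columns": {"df": ["id"]}},
        },
    )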

    bug 
    opened by thehomebrewnerd 0
  • Add tests that confirm primitive input_types are the expected shapes

    There are a number of assumptions we make about the shape of Primitive input_types lists:

    • It's either a list of ColumnSchema objects or a list of lists of ColumnSchema objects (and not a combination)
    • All sub-lists are the same length
    • No input_types list or sublist is empty

    As we may need to rely on these assumptions at some point, we should add tests that confirm these assumptions for all primitives, so that if we add a Primitive that breaks any of these assumptions in the future, we are notified.
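
    A pytest sketch of these checks. It assumes the internal helpers get_transform_primitives/get_aggregation_primitives in featuretools.primitives.utils (used by list_primitives); adjust the imports if the layout differs:

    import pytest
    from woodwork.column_schema import ColumnSchema
    from featuretools.primitives.utils import (
        get_aggregation_primitives,
        get_transform_primitives,
    )

    all_primitives = {**get_transform_primitives(), **get_aggregation_primitives()}

    @pytest.mark.parametrize("name,primitive", sorted(all_primitives.items()))
    def test_input_types_shape(name, primitive):
        input_types = primitive.input_types
        assert input_types, "input_types must not be empty"
        if isinstance(input_types[0], list):
            # list of lists: no mixing, no empty sub-lists, equal lengths
            assert all(isinstance(el, list) and len(el) > 0 for el in input_types)
            assert len({len(el) for el in input_types}) == 1
            assert all(isinstance(s, ColumnSchema) for el in input_types for s in el)
        else:
            # flat list: every element is a ColumnSchema (no mixing)
            assert all(isinstance(s, ColumnSchema) for s in input_types)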

    opened by tamargrey 0
  • DFS fails using Dask EntitySet with categorical index

    DFS fails on a Dask EntitySet that contains a categorical index

    Code Sample - use the attached dataset

    import dask.dataframe as dd
    import featuretools as ft
    
    orders = dd.read_csv("orders.csv")
    
    es = ft.EntitySet()
    
    order_ltypes = {
        "order_id": "categorical",
    }
    
    es.add_dataframe(dataframe_name="orders",
                     dataframe=orders,
                     index="order_id",
                     logical_types=order_ltypes)
    
    fm, features = ft.dfs(entityset=es, target_dataframe_name="orders")
    
    
    NotImplementedError: `df.column.cat.categories` with unknown categories is not supported.  Please use `column.cat.as_known()` or `df.categorize()` beforehand to ensure known categories
    

    orders.csv
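
    A possible workaround suggested by the error text itself (an untested sketch): make the categories known before adding the dataframe:

    import dask.dataframe as dd

    orders = dd.read_csv("orders.csv")
    # convert the index column to a categorical with known categories
    orders["order_id"] = orders["order_id"].astype("category").cat.as_known()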

    bug 
    opened by thehomebrewnerd 0
Releases(v1.9.2)
  • v1.9.2(Jun 10, 2022)

    v1.9.2 June 10, 2022

    • Fixes
      • Add feature origin information to all multi-output feature columns (#2102)
    • Documentation Changes
      • Update contributing.md to add pandoc (#2103)

    Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd

  • v1.9.1(May 27, 2022)

    v1.9.1 May 27, 2022

    • Enhancements
      • Update DateToHoliday and DistanceToHoliday primitives to work with timezone-aware inputs (#2056)
    • Changes
      • Delete setup.py, MANIFEST.in and move configuration to pyproject.toml (#2046)
    • Documentation Changes
      • Update slack invite link to new (#2044)
      • Add slack and stackoverflow icon to footer (#2087)
      • Update dead links in docs and docstrings (#2092)
    • Testing Changes
      • Skip test for normalize_dataframe due to different error coming from Woodwork in 0.16.3 (#2052)
      • Fix Woodwork install in test with Woodwork main branch (#2055)
      • Use codecov action v3 (#2039)
      • Add workflow to kickoff EvalML unit tests with Featuretools main (#2072)
      • Rename yml to yaml for GitHub Actions workflows (#2073, #2077)
      • Update Dask test fixtures to prevent flaky behavior (#2079)
      • Update Makefile with better pkg command (#2081)
      • Add scheduled workflow that checks for broken links in documentation (#2084)

    Thanks to the following people for contributing to this release: @gsheni, @rwedge, @thehomebrewnerd

  • v1.9.0(Apr 27, 2022)

    v1.9.0 Apr 27, 2022

    • Enhancements
      • Improve UnusedPrimitiveWarning with additional information (#2003)
      • Update DFS primitive matching to use all inputs defined in primitive input_types (#2019)
      • Add MultiplyNumericBoolean primitive (#2035)
    • Fixes
      • Fix issue with Ordinal inputs to binary comparison primitives (#2024, #2025)
    • Changes
      • Updated autonormalize version requirement (#2002)
      • Remove extra NaN checking in LatLong primitives (#1924)
      • Normalize LatLong NaN values during EntitySet creation (#1924)
      • Pass primitive dictionaries into check_primitive to avoid repetitive calls (#2016)
      • Remove Boolean and BooleanNullable from MultiplyNumeric primitive inputs (#2022)
      • Update serialization for compatibility with Woodwork version 0.16.1 (#2030)
    • Documentation Changes
      • Update README text to Alteryx (#2010, #2015)
    • Testing Changes
      • Update unit tests with Woodwork main branch workflow name (#2033)

    Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @rwedge, @thehomebrewnerd

  • v1.8.0(Mar 31, 2022)

    • Changes
      • Removed make_trans_primitive and make_agg_primitive utility functions (#1970)
    • Documentation Changes
      • Update project urls in setup cfg to include Twitter and Slack (#1981)
      • Update nbconvert to version 6.4.5 to fix docs build issue (#1984)
      • Update ReadMe to have centered badges and add docs badge (#1993)
      • Add M1 installation instructions to docs and contributing (#1997)
    • Testing Changes
      • Updated scheduled workflows to only run on Alteryx owned repos (#1973)
      • Updated minimum dependency checker to use new version with write file support (#1975, #1976)
      • Add black linting package and remove autopep8 (#1978)
      • Update tests for compatibility with Woodwork version 0.15.0 (#1984)

    Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd

  • v1.7.0(Mar 16, 2022)

    v1.7.0 Mar 16, 2022

    • Enhancements
      • Add support for Python 3.10 (#1940)
      • Added the SquareRoot, NaturalLogarithm, Sine, Cosine and Tangent primitives (#1948)
    • Fixes
      • Updated the conda install commands to specify the channel (#1917)
    • Changes
      • Update error message when DFS returns an empty list of features (#1919)
      • Remove list_variable_types and related directories (#1929)
      • Transition to use pyproject.toml and setup.cfg (moving away from setup.py) (#1941, #1950, #1952, #1954, #1957, #1964 )
      • Replace Koalas with pandas API on Spark (#1949)
    • Documentation Changes
      • Add time series guide (#1896)
      • Update minimum nlp_primitives requirement for docs (#1925)
      • Add GitHub URL for PyPi (#1928)
      • Add backport release support (#1932)
      • Update instructions in release.md (#1963)
    • Testing Changes
      • Update test cases to cover main.py file (#1927)
      • Upgrade moto requirement (#1929, #1938)
      • Add Python 3.9 linting, install complete, and docs build CI tests (#1934)
      • Add CI workflow to test with latest woodwork main branch (#1936)
      • Add lower bound for wheel for minimum dependency checker and limit lint CI tests to Python 3.10 (#1945)
      • Fix non-deterministic test in test_es.py (#1961)

    Thanks to the following people for contributing to this release: @andriyor, @gsheni, @jeff-hernandez, @kushal-gopal, @mingdavidqi, @rwedge, @tamargrey, @thehomebrewnerd, @tvdboom

  • v1.7.0.dev2(Mar 16, 2022)

  • v1.7.0.dev1(Mar 15, 2022)

  • v1.7.0.dev0(Mar 15, 2022)

  • v1.6.0(Feb 17, 2022)

    v1.6.0 Feb 17, 2022

    • Enhancements
      • Add IsFederalHoliday transform primitive (#1912)
    • Fixes
      • Fix to catch new NotImplementedError raised by holidays library for unknown country (#1907)
    • Changes
      • Remove outdated pandas workaround code (#1906)
    • Documentation Changes
      • Add in-line tabs and copy-paste functionality to docs (#1905)
    • Testing Changes
      • Fix URL deserialization file (#1909)

    Thanks to the following people for contributing to this release: @jeff-hernandez, @rwedge, @thehomebrewnerd

  • v1.5.0(Feb 14, 2022)

    v1.5.0 Feb 14, 2022

    Warning: Featuretools may not support Python 3.7 in the next non-bugfix release.

    • Enhancements
      • Add ability to use offset alias strings as inputs to rolling primitives (#1809)
      • Update to add support for pandas version 1.4.0 (#1881, #1895)
    • Fixes
      • Fix featuretools_primitives entry point (#1891)
    • Changes
      • Allow only snake, camel, and title case for primitives (#1854)
      • Add autonormalize as an add-on library (#1840)
      • Add DateToHoliday Transform Primitive (#1848)
      • Add DistanceToHoliday Transform Primitive (#1853)
      • Temporarily restrict pandas and koalas max versions (#1863)
      • Add __setitem__ method to overload add_dataframe method on EntitySet (#1862)
      • Add support for woodwork 0.12.0 (#1872, #1897)
      • Split Datetime and LatLong primitives into separate files (#1861)
      • Null values will not be included in index of normalized dataframe (#1897)
    • Documentation Changes
      • Bump ipython version (#1857)
      • Update README.md with Alteryx link (#1886)
    • Testing Changes
      • Add check for package conflicts with install workflow (#1843)
      • Change auto approve workflow to use assignee (#1843)
      • Update auto approve workflow to delete branch and change on trigger (#1852)
      • Upgrade tests to use compose version 0.8.0 (#1856)
      • Updated deep feature synthesis and feature serialization tests to use new primitive files (#1861)

    Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jacobboney, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999

  • v1.4.1(Jan 28, 2022)

    v1.4.1 Jan 28, 2022

    • Changes
      • Set upper bound for compatible Woodwork version (#1872)
      • Restrict pandas and koalas max versions (#1863)
    • Testing Changes
      • Upgrade tests to use compose version 0.8.0 (#1856)

    Thanks to the following people for contributing to this release: @dvreed77, @thehomebrewnerd

  • v1.4.0(Jan 11, 2022)

    • Enhancements
      • Add LatLong transform primitives - GeoMidpoint, IsInGeoBox, CityblockDistance (#1814)
      • Add issue templates for bugs, feature requests and documentation improvements (#1834)
    • Fixes
      • Fix bug where Woodwork initialization could fail on feature matrix if cutoff times caused null values to be introduced (#1810)
    • Changes
      • Skip code coverage for specific dask usage lines (#1829)
      • Increase minimum required numpy version to 1.21.0, scipy to 1.3.3, koalas to 1.8.1 (#1833)
      • Remove pyyaml as a requirement (#1833)
    • Documentation Changes
      • Remove testing on conda forge in release.md (#1811)
    • Testing Changes
      • Enable auto-merge for minimum and latest dependency merge requests (#1818, #1821, #1822)
      • Change auto approve workflow to use PR number and run every 30 minutes (#1827)
      • Add auto approve workflow to run when unit tests complete (#1837)
      • Test deserializing from S3 with mocked S3 fixtures only (#1825)
      • Remove fastparquet as a test requirement (#1833)

    Thanks to the following people for contributing to this release: @davesque, @gsheni, @rwedge, @thehomebrewnerd

  • v1.3.0(Dec 2, 2021)

    • Enhancements
      • Add NumericLag transform primitive (#1797)
    • Changes
      • Update pip to 21.3.1 for test requirements (#1789)
    • Documentation Changes
      • Add Docker install instructions and documentation on the install page (#1785)
      • Update install page on documentation with correct python version (#1784)
      • Fix formatting in Improving Computational Performance guide (#1786)

    Thanks to the following people for contributing to this release: @gsheni, @HenryRocha, @tamargrey, @thehomebrewnerd

  • v1.3.0.dev0(Dec 2, 2021)

  • v1.2.0(Nov 15, 2021)

    • Enhancements
      • Add Rolling Transform primitives with integer parameters (#1770)
    • Fixes
      • Handle new graphviz FORMATS import (#1770)
    • Changes
      • Add new version of featuretools_tsfresh_primitives as an add-on library (#1772)
      • Add load_weather as demo dataset for time series (#1777)

    Thanks to the following people for contributing to this release: @gsheni, @tamargrey

  • v1.2.0.dev0(Nov 15, 2021)

  • v1.1.0(Nov 2, 2021)

    v1.1.0 Nov 2, 2021

    • Fixes
      • Check base_of_exclude attribute on primitive instead of feature class (#1749)
      • Pin upper bound for pyspark (#1748)
      • Fix issue where get_unused_primitives only recognizes lowercase primitive strings (#1733)
      • Require newer versions of dask and distributed (#1762)
      • Fix bug with pass-through columns of cutoff_time df when n_jobs > 1 (#1765)
    • Changes
      • Add new version of nlp_primitives as an add-on library (#1743)
      • Change name of date_of_birth (column name) to birthday in mock dataset (#1754)
    • Documentation Changes
      • Upgrade Sphinx and fix docs configuration error (#1760)
    • Testing Changes
      • Modify CI to run unit test with latest dependencies on python 3.9 (#1738)
      • Added Python version standardizer to Jupyter notebook linting (#1741)

    Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @HenryRocha, @jeff-hernandez, @ridicolos, @rwedge

  • v1.1.0.dev0(Nov 1, 2021)

  • v1.0.0(Oct 12, 2021)

    v1.0.0 Oct 12, 2021

    • Enhancements
      • Add support for creating EntitySets from Woodwork DataTables (#1277)
      • Add EntitySet.__deepcopy__ that retains Woodwork typing information (#1465)
      • Add EntitySet.__getstate__ and EntitySet.__setstate__ to preserve typing when pickling (#1581)
      • Returned feature matrix has woodwork typing information (#1664)
    • Fixes
      • Fix DFSTransformer Documentation for Featuretools 1.0 (#1605)
      • Fix calculate_feature_matrix time type check and encode_features for synthesis tests (#1580)
      • Revert reordering of categories in Equal and NotEqual primitives (#1640)
      • Fix bug in EntitySet.add_relationship that caused foreign_key tag to be lost (#1675)
      • Update DFS to not build features on last time index columns in dataframes (#1695)
    • Changes
      • Remove add_interesting_values from Entity (#1269)
      • Move set_secondary_time_index method from Entity to EntitySet (#1280)
      • Refactor Relationship creation process (#1370)
      • Replaced Entity.update_data with EntitySet.update_dataframe (#1398)
      • Move validation check for uniform time index to EntitySet (#1400)
      • Replace Entity objects in EntitySet with Woodwork dataframes (#1405)
      • Refactor EntitySet.plot to work with Woodwork dataframes (#1468)
      • Move last_time_index to be a column on the DataFrame (#1456)
      • Update serialization/deserialization to work with Woodwork (#1452)
      • Refactor EntitySet.query_by_values to work with Woodwork dataframes (#1467)
      • Replace list_variable_types with list_logical_types (#1477)
      • Allow deep EntitySet equality check (#1480)
      • Update EntitySet.concat to work with Woodwork DataFrames (#1490)
      • Add function to list semantic tags (#1486)
      • Initialize Woodwork on feature matrix in remove_highly_correlated_features if necessary (#1618)
      • Remove categorical-encoding as an add-on library (will be added back later) (#1632)
      • Remove autonormalize as an add-on library (will be added back later) (#1636)
      • Remove tsfresh, nlp_primitives, sklearn_transformer as an add-on library (will be added back later) (#1638)
      • Update input and return types for CumCount primitive (#1651)
      • Standardize imports of Woodwork (#1526)
      • Rename target entity to target dataframe (#1506)
      • Replace entity_from_dataframe with add_dataframe (#1504)
      • Create features from Woodwork columns (#1582)
      • Move default variable description logic to generate_description (#1403)
      • Update Woodwork to version 0.4.0 with LogicalType.transform and LogicalType instances (#1451)
      • Update Woodwork to version 0.4.1 with Ordinal order values and whitespace serialization fix (#1478)
      • Use ColumnSchema for primitive input and return types (#1411)
      • Update features to use Woodwork and remove Entity and Variable classes (#1501)
      • Re-add make_index functionality to EntitySet (#1507)
      • Use ColumnSchema in DFS primitive matching (#1523)
      • Updates from Featuretools v0.26.0 (#1539)
      • Leverage Woodwork better in add_interesting_values (#1550)
      • Update calculate_feature_matrix to use Woodwork (#1533)
      • Update Woodwork to version 0.6.0 with changed categorical inference (#1597)
      • Update nlp-primitives requirement for Featuretools 1.0 (#1609)
      • Remove remaining references to Entity and Variable in code (#1612)
      • Update Woodwork to version 0.7.1 with changed initialization (#1648)
      • Removes outdated workaround code related to a since-resolved pandas issue (#1677)
      • Remove unused _dataframes_equal and camel_to_snake functions (#1683)
      • Update Woodwork to version 0.8.0 for improved performance (#1689)
      • Remove redundant typecasting in encode_features (#1694)
      • Speed up encode_features if not inplace, some space cost (#1699)
      • Clean up comments and commented out code (#1701)
      • Update Woodwork to version 0.8.1 for improved performance (#1702)
    • Documentation Changes
      • Add a Woodwork Typing in Featuretools guide (#1589)
      • Add a resource guide for transitioning to Featuretools 1.0 (#1627)
      • Update using_entitysets page to use Woodwork (#1532)
      • Update FAQ page to use Woodwork integration (#1649)
      • Update DFS page to be Jupyter notebook and use Woodwork integration (#1557)
      • Update Feature Primitives page to be Jupyter notebook and use Woodwork integration (#1556)
      • Update Handling Time page to be Jupyter notebook and use Woodwork integration (#1552)
      • Update Advanced Custom Primitives page to be Jupyter notebook and use Woodwork integration (#1587)
      • Update Deployment page to use Woodwork integration (#1588)
      • Update Using Dask EntitySets page to be Jupyter notebook and use Woodwork integration (#1590)
      • Update Specifying Primitive Options page to be Jupyter notebook and use Woodwork integration (#1593)
      • Update API Reference to match Featuretools 1.0 API (#1600)
      • Update Index page to be Jupyter notebook and use Woodwork integration (#1602)
      • Update Feature Descriptions page to be Jupyter notebook and use Woodwork integration (#1603)
      • Update Using Koalas EntitySets page to be Jupyter notebook and use Woodwork integration (#1604)
      • Update Glossary to use Woodwork integration (#1608)
      • Update Tuning DFS page to be Jupyter notebook and use Woodwork integration (#1610)
      • Fix small formatting issues in Documentation (#1607)
      • Remove Variables page and more references to variables (#1629)
      • Update Feature Selection page to use Woodwork integration (#1618)
      • Update Improving Performance page to be Jupyter notebook and use Woodwork integration (#1591)
      • Fix typos in transition guide (#1672)
      • Update installation instructions for 1.0.0rc1 announcement in docs (#1707, #1708, #1713, #1716)
      • Fixed broken link for Demo notebook in README.md (#1728)
      • Update contributing.md to improve instructions for external contributors (#1723)
      • Manually revert changes made by #1677 and #1679. The related bug in pandas still exists. (#1731)
    • Testing Changes
      • Remove entity tests (#1521)
      • Fix broken EntitySet tests (#1548)
      • Fix broken primitive tests (#1568)
      • Added Jupyter notebook cleaner to the linters (#1719)
      • Update reviewers for minimum and latest dependency checkers (#1715)
      • Full coverage for EntitySet.eq method (#1725)
      • Add tests to verify all primitives can be initialized without parameter values (#1726)

    Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @HenryRocha, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @VaishnaviNandakumar

  • v1.0.0.dev2(Oct 12, 2021)

  • v1.0.0rc1.dev0(Sep 17, 2021)

  • v1.0.0rc1(Sep 17, 2021)

    v1.0.0rc1 Sep 17, 2021

    Release candidate for version 1.0

    What's New in this Release

    For additional documentation, check out the 1.0 transition guide

    Adding Interesting Values

    To add interesting values for a single entity, call EntitySet.add_interesting_values passing the id of the entity for which interesting values should be added.

    >>> es.add_interesting_values(entity_id='log')
    

    Setting a Secondary Time Index

    To set a secondary time index for a specific entity, call EntitySet.set_secondary_time_index, passing the Entity for which to set the secondary time index along with a dictionary mapping the secondary time index column to the columns to which the secondary time index applies.

    >>> customers_secondary_time_index = {'cancel_date': ['cancel_reason']}
    >>> es.set_secondary_time_index(es['customers'], customers_secondary_time_index)
    

    Creating a Relationship and Adding to an EntitySet

    Relationships are now created by passing parameters identifying the entityset along with four string values specifying the parent dataframe, parent column, child dataframe and child column. Specifying parameter names is optional.

    >>> new_relationship = Relationship(
    ...     entityset=es,
    ...     parent_dataframe_name='customers',
    ...     parent_column_name='id',
    ...     child_dataframe_name='sessions',
    ...     child_column_name='customer_id'
    ... )
    

    Relationships can now be added to EntitySets in one of two ways. The first approach is to pass in name values for the parent dataframe, parent column, child dataframe and child column. Specifying parameter names is optional with this approach.

    >>> es.add_relationship(
    ...     parent_dataframe_name='customers',
    ...     parent_column_name='id',
    ...     child_dataframe_name='sessions',
    ...     child_column_name='customer_id'
    ... )
    

    Relationships can also be added by passing in a previously created Relationship object. When using this approach the relationship parameter name must be included.

    >>> es.add_relationship(relationship=new_relationship)
    

    Replace DataFrame

    To replace a dataframe in an EntitySet with a new dataframe, call EntitySet.replace_dataframe and pass in the name of the dataframe to replace along with the new data.

    >>> es.replace_dataframe(dataframe_name='log', df=df)
    

    List Logical Types and Semantic Tags

    Logical types and semantic tags have replaced variable types to parse and interpret columns. You can list all the available logical types by calling featuretools.list_logical_types.

    >>> ft.list_logical_types()
    

    You can list all the available semantic tags by calling featuretools.list_semantic_tags.

    >>> ft.list_semantic_tags()
    

    Breaking Changes

    • Entity.add_interesting_values has been removed. To add interesting values for a single entity, call EntitySet.add_interesting_values and pass the name of the dataframe for which to add interesting values in the dataframe_name parameter (#1405, #1370).
    • Entity.set_secondary_time_index has been removed and replaced by EntitySet.set_secondary_time_index with an added dataframe_name parameter to specify the dataframe on which to set the secondary time index (#1405, #1370).
    • Relationship initialization has been updated to accept four name values for the parent dataframe, parent column, child dataframe and child column instead of accepting two Variable objects (#1405, #1370).
    • EntitySet.add_relationship has been updated to accept dataframe and column name values or a Relationship object. Adding a relationship from a Relationship object now requires passing the relationship as a keyword argument (#1405, #1370).
    • Entity.update_data has been removed. To update the dataframe, call EntitySet.replace_dataframe and use the dataframe_name parameter (#1630, #1522).
    • The data in an EntitySet is no longer stored in Entity objects. Instead, dataframes with Woodwork typing information are used. Accordingly, most language referring to “entities” will now refer to “dataframes”, references to “variables” will now refer to “columns”, and “variable types” will use the Woodwork type system’s “logical types” and “semantic tags” (#1405).
    • The dictionary of tuples passed to EntitySet.__init__ has replaced the variable_types element with separate logical_types and semantic_tags dictionaries (#1405).
    • EntitySet.entity_from_dataframe no longer exists. To add new tables to an entityset, use EntitySet.add_dataframe (#1405).
    • EntitySet.normalize_entity has been renamed to EntitySet.normalize_dataframe (#1405).
    • Instead of raising an error at EntitySet.add_relationship when the dtypes of parent and child columns do not match, Featuretools will now check whether the Woodwork logical type of the parent and child columns match. If they do not match, there will now be a warning raised, and Featuretools will attempt to update the logical type of the child column to match the parent’s (#1405).
    • If no index is specified at EntitySet.add_dataframe, the first column will only be used as index if Woodwork has not been initialized on the DataFrame. When adding a dataframe that already has Woodwork initialized, if there is no index set, an error will be raised (#1405).
    • Featuretools will no longer re-order columns in DataFrames so that the index column is the first column of the DataFrame (#1405).
    • Type inference can now be performed on Dask and Koalas dataframes, though a warning will be issued indicating that this may be computationally intensive (#1405).
    • EntitySet.time_type is no longer stored as Variable objects. Instead, Woodwork typing is used, and a numeric time type will be indicated by the 'numeric' semantic tag string, and a datetime time type will be indicated by the Datetime logical type (#1405).
    • last_time_index, secondary_time_index, and interesting_values are no longer attributes of an entityset’s tables that can be accessed directly. Now they must be accessed through the metadata of the Woodwork DataFrame, which is a dictionary (#1405).
    • The helper function list_variable_types will be removed in a future release and replaced by list_logical_types. In the meantime, list_variable_types will return the same output as list_logical_types (#1447).

    Changelog

    • Enhancements
      • Add support for creating EntitySets from Woodwork DataTables (#1277)
      • Add EntitySet.__deepcopy__ that retains Woodwork typing information (#1465)
      • Add EntitySet.__getstate__ and EntitySet.__setstate__ to preserve typing when pickling (#1581)
      • Returned feature matrix has woodwork typing information (#1664)
    • Fixes
      • Fix DFSTransformer Documentation for Featuretools 1.0 (#1605)
      • Fix calculate_feature_matrix time type check and encode_features for synthesis tests (#1580)
      • Revert reordering of categories in Equal and NotEqual primitives (#1640)
      • Fix bug in EntitySet.add_relationship that caused foreign_key tag to be lost (#1675)
      • Update DFS to not build features on last time index columns in dataframes (#1695)
    • Changes
      • Remove add_interesting_values from Entity (#1269)
      • Move set_secondary_time_index method from Entity to EntitySet (#1280)
      • Refactor Relationship creation process (#1370)
      • Replaced Entity.update_data with EntitySet.update_dataframe (#1398)
      • Move validation check for uniform time index to EntitySet (#1400)
      • Replace Entity objects in EntitySet with Woodwork dataframes (#1405)
      • Refactor EntitySet.plot to work with Woodwork dataframes (#1468)
      • Move last_time_index to be a column on the DataFrame (#1456)
      • Update serialization/deserialization to work with Woodwork (#1452)
      • Refactor EntitySet.query_by_values to work with Woodwork dataframes (#1467)
      • Replace list_variable_types with list_logical_types (#1477)
      • Allow deep EntitySet equality check (#1480)
      • Update EntitySet.concat to work with Woodwork DataFrames (#1490)
      • Add function to list semantic tags (#1486)
      • Initialize Woodwork on feature matrix in remove_highly_correlated_features if necessary (#1618)
      • Remove categorical-encoding as an add-on library (will be added back later) (#1632)
      • Remove autonormalize as an add-on library (will be added back later) (#1636)
      • Remove tsfresh, nlp_primitives, sklearn_transformer as an add-on library (will be added back later) (#1638)
      • Update input and return types for CumCount primitive (#1651)
      • Standardize imports of Woodwork (#1526)
      • Rename target entity to target dataframe (#1506)
      • Replace entity_from_dataframe with add_dataframe (#1504)
      • Create features from Woodwork columns (#1582)
      • Move default variable description logic to generate_description (#1403)
      • Update Woodwork to version 0.4.0 with LogicalType.transform and LogicalType instances (#1451)
      • Update Woodwork to version 0.4.1 with Ordinal order values and whitespace serialization fix (#1478)
      • Use ColumnSchema for primitive input and return types (#1411)
      • Update features to use Woodwork and remove Entity and Variable classes (#1501)
      • Re-add make_index functionality to EntitySet (#1507)
      • Use ColumnSchema in DFS primitive matching (#1523)
      • Updates from Featuretools v0.26.0 (#1539)
      • Leverage Woodwork better in add_interesting_values (#1550)
      • Update calculate_feature_matrix to use Woodwork (#1533)
      • Update Woodwork to version 0.6.0 with changed categorical inference (#1597)
      • Update nlp-primitives requirement for Featuretools 1.0 (#1609)
      • Remove remaining references to Entity and Variable in code (#1612)
      • Update Woodwork to version 0.7.1 with changed initialization (#1648)
      • Removes outdated workaround code related to a since-resolved pandas issue (#1677)
      • Remove unused _dataframes_equal and camel_to_snake functions (#1683)
      • Update Woodwork to version 0.8.0 for improved performance (#1689)
      • Remove redundant typecasting in encode_features (#1694)
      • Speed up encode_features if not inplace, some space cost (#1699)
      • Clean up comments and commented out code (#1701)
      • Update Woodwork to version 0.8.1 for improved performance (#1702)
    • Documentation Changes
      • Add a Woodwork Typing in Featuretools guide (#1589)
      • Add a resource guide for transitioning to Featuretools 1.0 (#1627)
      • Update using_entitysets page to use Woodwork (#1532)
      • Update FAQ page to use Woodwork integration (#1649)
      • Update DFS page to be Jupyter notebook and use Woodwork integration (#1557)
      • Update Feature Primitives page to be Jupyter notebook and use Woodwork integration (#1556)
      • Update Handling Time page to be Jupyter notebook and use Woodwork integration (#1552)
      • Update Advanced Custom Primitives page to be Jupyter notebook and use Woodwork integration (#1587)
      • Update Deployment page to use Woodwork integration (#1588)
      • Update Using Dask EntitySets page to be Jupyter notebook and use Woodwork integration (#1590)
      • Update Specifying Primitive Options page to be Jupyter notebook and use Woodwork integration (#1593)
      • Update API Reference to match Featuretools 1.0 API (#1600)
      • Update Index page to be Jupyter notebook and use Woodwork integration (#1602)
      • Update Feature Descriptions page to be Jupyter notebook and use Woodwork integration (#1603)
      • Update Using Koalas EntitySets page to be Jupyter notebook and use Woodwork integration (#1604)
      • Update Glossary to use Woodwork integration (#1608)
      • Update Tuning DFS page to be Jupyter notebook and use Woodwork integration (#1610)
      • Fix small formatting issues in Documentation (#1607)
      • Remove Variables page and more references to variables (#1629)
      • Update Feature Selection page to use Woodwork integration (#1618)
      • Update Improving Performance page to be Jupyter notebook and use Woodwork integration (#1591)
      • Fix typos in transition guide (#1672)
    • Testing Changes
      • Remove entity tests (#1521)
      • Fix broken EntitySet tests (#1548)
      • Fix broken primitive tests (#1568)

    Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

  • v0.27.1(Sep 2, 2021)

    v0.27.1 Sep 2, 2021

    • Documentation Changes
      • Add banner to docs about upcoming Featuretools 1.0 release (#1669)

    Thanks to the following people for contributing to this release: @thehomebrewnerd

  • v0.27.0(Aug 31, 2021)

    v0.27.0 Aug 31, 2021

    • Changes

      • Remove autonormalize, tsfresh, nlp_primitives, sklearn_transformer, categorical_encoding as add-on libraries (will be added back later) (#1644)
      • Emit a warning message when a featuretools_primitives entrypoint throws an exception (#1662)
      • Throw a RuntimeError when two primitives with the same name are encountered during featuretools_primitives entrypoint handling (#1662)
      • Prevent the featuretools_primitives entrypoint loader from loading non-class objects as well as the AggregationPrimitive and TransformPrimitive base classes (#1662)
    • Testing Changes

      • Update latest dependency checker with proper install command (#1652)
      • Update isort dependency (#1654)

    Thanks to the following people for contributing to this release: @davesque, @gsheni, @jeff-hernandez, @rwedge

  • v0.27.0.dev0(Aug 31, 2021)

  • v0.26.2(Aug 17, 2021)

    v0.26.2 Aug 17, 2021

    • Documentation Changes
      • Specify conda channel and Windows exe in graphviz installation instructions (#1611)
      • Remove GA token from the layout html (#1622)
    • Testing Changes
      • Add additional reviewers to minimum and latest dependency checkers (#1558, #1562, #1564, #1567)

    Thanks to the following people for contributing to this release: @gsheni, @simha104

    Source code(tar.gz)
    Source code(zip)
  • v0.26.2.dev0(Aug 17, 2021)

  • v0.26.1(Jul 23, 2021)

    v0.26.1 Jul 23, 2021

    • Fixes
      • Set name attribute for EmailAddressToDomain primitive (#1543)
    • Documentation Changes
      • Remove and ignore unnecessary graph files (#1544)

    Thanks to the following people for contributing to this release: @davesque, @rwedge

    Source code(tar.gz)
    Source code(zip)
  • v0.26.1.dev0(Jul 23, 2021)

  • v0.26.0(Jul 15, 2021)

    v0.26.0 Jul 15, 2021

    • Enhancements
      • Add replace_inf_values utility function for replacing inf values in a feature matrix (#1505)
      • Add URLToProtocol, URLToDomain, URLToTLD, EmailAddressToDomain, and IsFreeEmailDomain as transform primitives (#1508, #1531)
    • Fixes
      • include_entities correctly overrides exclude_entities in primitive_options (#1518; this fix and the two enhancements above are sketched in the example after this release entry)
    • Documentation Changes
      • Prevent logging on build (#1498)
    • Testing Changes
      • Test featuretools on pandas 1.3.0 release candidate and make fixes (#1492)

    Thanks to the following people for contributing to this release: @frances-h, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999

    Source code(tar.gz)
    Source code(zip)
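
    To make the v0.26.0 additions concrete, here is a minimal sketch combining them. The exact import location of replace_inf_values is an assumption (it may also be exposed at the package top level), and the mock-customer data is used only for illustration.

    import numpy as np
    import pandas as pd
    import featuretools as ft
    from featuretools.primitives import EmailAddressToDomain
    # Assumed import path for the new utility:
    from featuretools.computational_backends import replace_inf_values

    # A new transform primitive can be exercised directly on a pandas Series.
    emails = pd.Series(["jane@gmail.com", "team@alteryx.com"])
    print(EmailAddressToDomain().get_function()(emails))  # gmail.com, alteryx.com

    # primitive_options restricts where a primitive is applied; per #1518,
    # include_entities now correctly takes precedence when both are given.
    es = ft.demo.load_mock_customer(return_entityset=True)
    feature_matrix, feature_defs = ft.dfs(
        entityset=es,
        target_entity="customers",
        agg_primitives=["mean"],
        primitive_options={"mean": {"include_entities": ["transactions"]}},
    )

    # Replace any +/-inf produced during feature computation with NaN.
    feature_matrix = replace_inf_values(feature_matrix, replacement_value=np.nan)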