TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.

Overview

TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models. The library is a collection of Keras models and supports classification, regression and ranking.

TF-DF is a TensorFlow wrapper around the Yggdrasil Decision Forests C++ libraries. Models trained with TF-DF are compatible with Yggdrasil Decision Forests' models, and vice versa.

Usage example

A minimal end-to-end run looks as follows:

import tensorflow_decision_forests as tfdf
import pandas as pd

# Load the dataset in a Pandas dataframe.
train_df = pd.read_csv("project/train.csv")
test_df = pd.read_csv("project/test.csv")

# Convert the dataset into a TensorFlow dataset.
train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="my_label")
test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_df, label="my_label")

# Train the model
model = tfdf.keras.RandomForestModel()
model.fit(train_ds)

# Look at the model.
model.summary()

# Evaluate the model.
model.evaluate(test_ds)

# Export to a TensorFlow SavedModel.
# Note: the model is compatible with Yggdrasil Decision Forests.
model.save("project/model")

Documentation & Resources

The following resources are available:

Installation

To install TensorFlow Decision Forests, run:

pip3 install tensorflow_decision_forests --upgrade

See the installation page for more details, troubleshooting and alternative installation solutions.
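
To sanity-check the installation, a minimal sketch that imports the package and prints its version:

import tensorflow_decision_forests as tfdf
print("Found TF-DF v" + tfdf.__version__)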

Contributing

Contributions to TensorFlow Decision Forests and Yggdrasil Decision Forests are welcome. If you want to contribute, make sure to review the developer manual and contribution guidelines.

Credits

TensorFlow Decision Forests was developed by:

  • Mathieu Guillame-Bert (gbm AT google DOT com)
  • Jan Pfeifer (janpf AT google DOT com)
  • Sebastian Bruch (sebastian AT bruch DOT io)
  • Arvind Srinivasan (arvnd AT google DOT com)

License

Apache License 2.0

Comments
  • pip install does not work on Mac

    Hey there,

    First of all, congratulations for your effort, this is a great initiative!

    I am raising this issue because I have faced a problem with installation. I have created a Python 3.8.6 virtual environment on my Mac and installed tensorflow 2.5.0 successfully. When I ran the installation command for the "Tensorflow Decision Forests" package,

    pip3 install tensorflow_decision_forests --upgrade

    I got:

    ERROR: Could not find a version that satisfies the requirement tensorflow_decision_forests (from versions: none)
    ERROR: No matching distribution found for tensorflow_decision_forests

    It's a bit confusing because the installation command on PyPI (I guess this is the right one) contains dashes, instead of underscores, in the package name.

    Any ideas?

    Thanks a lot

    opened by erwtokritos 37
  • Getting error at end of training: AbstractFeatureResourceE does not exist. [Op:SimpleMLModelTrainer]

    I am getting the following error when I try a simple model.

    csv_feature_columns =  ['weekday_weekend'] + weather_columns + building_columns + schedules_columns + encoded_time_columns + ["total_site_electricity_kwh"] 
    
    train_df = pd.read_csv(timeseries_file_path,usecols=csv_feature_columns,nrows=10000)
    
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="total_site_electricity_kwh")
    
    model = tfdf.keras.RandomForestModel()
    model.fit(train_ds)
    
    
    157/157 [==============================] - 6s 18ms/step
    ---------------------------------------------------------------------------
    NotFoundError                             Traceback (most recent call last)
    <ipython-input-6-ce1e05e4d2c8> in <module>
          1 # Train a Random Forest model.
          2 model = tfdf.keras.RandomForestModel()
    ----> 3 model.fit(train_ds)
          4 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in fit(self, x, y, callbacks, **kwargs)
        743 
        744     history = super(CoreModel, self).fit(
    --> 745         x=x, y=y, epochs=1, callbacks=callbacks, **kwargs)
        746 
        747     self._build(x)
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
       1227           epoch_logs.update(val_logs)
       1228 
    -> 1229         callbacks.on_epoch_end(epoch, epoch_logs)
       1230         training_logs = epoch_logs
       1231         if self.stop_training:
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/keras/callbacks.py in on_epoch_end(self, epoch, logs)
        433     logs = self._process_logs(logs)
        434     for callback in self.callbacks:
    --> 435       callback.on_epoch_end(epoch, logs)
        436 
        437   def on_train_batch_begin(self, batch, logs=None):
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in on_epoch_end(***failed resolving arguments***)
        930     del logs
        931     if epoch == 0:
    --> 932       self._model._train_model()  # pylint:disable=protected-access
        933 
        934 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py in _train_model(self)
        864         guide=guide,
        865         training_config=self._advanced_arguments.yggdrasil_training_config,
    --> 866         deployment_config=self._advanced_arguments.yggdrasil_deployment_config,
        867     )
        868 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/core.py in train(input_ids, label_id, model_id, learner, task, generic_hparms, ranking_group, training_config, deployment_config, guide, model_dir, keep_model_in_resource)
        503       training_config=training_config.SerializeToString(),
        504       deployment_config=deployment_config.SerializeToString(),
    --> 505       guide=guide.SerializeToString())
        506 
        507 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/util/tf_export.py in wrapper(*args, **kwargs)
        402           'Please pass these args as kwargs instead.'
        403           .format(f=f.__name__, kwargs=f_argspec.args))
    --> 404     return f(**kwargs)
        405 
        406   return tf_decorator.make_decorator(f, wrapper, decorator_argspec=f_argspec)
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow_decision_forests/tensorflow/ops/training/op.py in simple_ml_model_trainer(feature_ids, label_id, weight_id, model_id, model_dir, learner, hparams, task, training_config, deployment_config, guide, name)
        510       return _result
        511     except _core._NotOkStatusException as e:
    --> 512       _ops.raise_from_not_ok_status(e, name)
        513     except _core._FallbackException:
        514       pass
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
       6895   message = e.message + (" name: " + name if name is not None else "")
       6896   # pylint: disable=protected-access
    -> 6897   six.raise_from(core._status_to_exception(e.code, message), None)
       6898   # pylint: enable=protected-access
       6899 
    
    ~/.conda/envs/tensorflow25/lib/python3.7/site-packages/six.py in raise_from(value, from_value)
    
    NotFoundError: Resource decision_forests/ 12-in/N27tensorflow_decision_forests3ops23AbstractFeatureResourceE does not exist. [Op:SimpleMLModelTrainer]
    
    opened by sibyjackgrove 16
  • AssertionError: Exception encountered when calling layer "gradient_boosted_trees_model" (type GradientBoostedTreesModel)

    When trying to get a prediction I am getting the error in the title, it also gives the following:

    in user code:
    
        File "/home/laner107/.local/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 791, in call  *
            normalized_inputs = self._build_normalized_inputs(inputs)
        File "/home/laner107/.local/lib/python3.8/site-packages/tensorflow_decision_forests/keras/core.py", line 747, in _build_normalized_inputs  *
            assert len(self._semantics) == 1
    
        AssertionError: 
    
    
    Call arguments received:
      • inputs=tf.Tensor(shape=(14,), dtype=float32)
      • training=False
    
    

    The following is where I call the prediction:

      def predict_to_data(self):
            testing_data = test_preprocess()
            testing_data = np.array(testing_data)
            predicitions = self.model(testing_data[0])
    

    Here is what the data I'm passing in looks like:

    [0.484375   0.83007665 0.56508876 0.46099291 0.52793453 0.75438596
     0.52066116 0.7826087  0.         0.65852121 0.40425532 0.58974359
     0.69047619 0.37058824]
    

    and here is the architecture for the model:

    def create_single_model(self):
            input_features = tf.keras.Input(shape=(self.num_features,))
    
            # bootstrap_size_ratio: Number of examples used to train each trees; expressed as a ratio of the training dataset size. Default: 1.0.
            rf_model_1 = tfdf.keras.GradientBoostedTreesModel(
                verbose=0,
                task=tfdf.keras.Task.CLASSIFICATION,
                hyperparameter_template="[email protected]",
                num_trees=self.num_of_trees,
            )
    
            model = tf.keras.models.Model(input_features, rf_model_1(input_features))
    
            return model
    

    For now I was just going to generate one prediction to see what the output was; eventually I plan on doing it in batches, but I still have to figure that out. Any idea why this error is occurring?

    All of this is done after the GradientBoostedTreesModel is trained (fit) and evaluated using validation data.

    opened by laneciar 15
  • Cannot serve using precompiled tf-serving. Error: `GLIBC_2.33' not found

    Hi!

    I am trying to serve the test tf-df example decision-forests/examples/minimal.py (aptly named tf-df-example below) using the precompiled tf-serving. Unfortunately, I am running into numerous GLIBC_X.XX not found errors like this:

    /usr/bin/tensorflow_model_server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /usr/bin/tensorflow_model_server)
    

    Best guess is that the issue has to do with the difference between the system that precompiled tensorflow_model_server_linux.zip and what I am pulling in as the base: tensorflow/serving:2.9.1. I have tried others, but ultimately this same GLIBC_X.XX not found error pops up and I'm not sure where to go from here (aside from trying to compile it myself).

    Any thoughts or suggestions would be greatly appreciated. Thanks!

    Here is the set up.

    1. Dockerfile:
    FROM tensorflow/serving:2.9.1 as base
    
    # Install curl
    RUN apt-get update && apt-get install -y --no-install-recommends \
        curl unzip \
        && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/*
    
    RUN curl -LJO "https://github.com/tensorflow/decision-forests/releases/download/serving-0.2.6/tensorflow_model_server_linux.zip"
    RUN unzip -o tensorflow_model_server_linux.zip -d /usr/bin/
    
    COPY tf-df_serving_entrypoint.sh /usr/bin/tf-df_serving_entrypoint.sh
    COPY /models/tf-df-example/ /tensorflow/models/tf-df-example/
    
    WORKDIR /tensorflow/
    ENTRYPOINT ["/usr/bin/tf-df_serving_entrypoint.sh"]
    
    2. tf-df_serving_entrypoint.sh:
    # Using prebuilt binary set in the Dockerfile
    TFSERVING="/usr/bin/tensorflow_model_server"
    
    # Configure the model path and name.
    MODEL_PATH=/tensorflow/models/tf-df-example/
    MODEL_NAME=tf-df-example
    
    # Start a TF Serving server
    ${TFSERVING} \
        --rest_api_port=8501 \
        --model_name=${MODEL_NAME} \
        --model_base_path=${MODEL_PATH}
    
    3. To build and run:
    docker build . -t tfdf/serving
    docker run -t --rm -p 8501:8501 tfdf/serving 
    
    bug 
    opened by SpenceLunderman 13
  • INVALID_ARGUMENT: No defined default loss for this combination of label type and task

    I'm trying to use GradientBoostedTreesModel in a TFX pipeline, the code is roughly as follows:

    model = tfdf.keras.GradientBoostedTreesModel(
            task=tfdf.keras.Task.CLASSIFICATION,
            num_trees=200,
            max_depth=6,
            verbose=True,
            hyperparameter_template="better_default",
            name="classifier",
        )
    model.compile(metrics=[tf.keras.metrics.AUC(), "accuracy"])
    model.fit(_input_fn(fn_args.train_files, fn_args.schema_path))
    

    This unfortunately gives me an INVALID_ARGUMENT: No defined default loss for this combination of label type and task exception and fails the model training.

    Definition of _input_fn is as follows:

    def _input_fn(...):
        return (
            tf.data.TFRecordDataset(
                tf.data.Dataset.list_files(files), compression_type="GZIP"
            )
            .batch(1024)
            .map(
                lambda batch: tf.io.parse_example(batch, specs),
                num_parallel_calls=tf.data.AUTOTUNE,
            )
            .map(lambda batch: (batch, batch.pop(FeatureManager.LABEL_KEY)))
            .cache()
            .prefetch(tf.data.AUTOTUNE)
        )
    

    This basically parses the schema into feature specs, parses the batch of TF examples and finally maps them to a tuple of (Dict[feature_name, Tensor], Tensor). The result looks like this:

    <PrefetchDataset 
     element_spec=(
       {'feature1': TensorSpec(shape=(None, 1), dtype=tf.float32, name=None), 'feature2': ...}, 
       TensorSpec(shape=(None, 1), dtype=tf.int64, name=None)
      )
    >
    

    Labels can be 0 or 1 and the task is a binary classification task.

    Any idea what I might be doing wrong here?

    Mac OS Monterey, tfdv 0.2.4, python 3.8, tfx 1.7

    opened by AlirezaSadeghi 12
  • decision-forests 1.0.1

    Hi, thanks for releasing TF-DF v1.0.1. Are there plans to re-add support for OSX as well? I only see it for Linux here: https://pypi.org/project/tensorflow-decision-forests/#files

    opened by Arnold1 10
  • Can I load and use trained tfdf model in Java?

    Hi, I trained my tfdf model in Python and want to use it in Java for production. For a conventional NN model, we can load the model from a SavedModelBundle and get predictions.

    try (SavedModelBundle b = SavedModelBundle.load("/tmp/model", "serve")) {
    
            // create the session from the Bundle
            Session sess = b.session();
            // create an input Tensor, value = 2.0f
            Tensor x = Tensor.create(
                new long[] {NUM_PREDICTIONS}, 
                FloatBuffer.wrap( new float[] {2.0f} ) 
            );
            
            // run the model
            float[] y = sess.runner()
                .feed("x", x)
                .fetch("y")
                .run()
                .get(0)
                .copyTo(new float[NUM_PREDICTIONS]);
    
            // print out the result.
            System.out.println(y[0]);
        }                
    

    I'm currently trying to use my tfdf model and wondering whether tfdf currently supports loading and inference in Java. Will the model's graph and useful info be loaded? I'm still trying to load it and wondering if anyone has a clue. Thank you so much!

    question 
    opened by AudreyW0201 9
  • I run the example, but got an error

    Traceback (most recent call last):
      File "mydf.py", line 29, in <module>
        model.fit(x=train_ds)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1535, in fit
        class_weight=class_weight)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1668, in _fit_implementation
        iterator)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow/python/training/tracking/base.py", line 587, in _method_wrapper
        result = method(self, *args, **kwargs)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1554, in _consumes_training_examples_until_eof
        num_examples += self.train_step(data)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1027, in train_step
        return self.collect_data_step(data, is_training_example=True)
      File "/home/wanghaikuan/anaconda3/envs/python37/lib/python3.7/site-packages/tensorflow_decision_forests/keras/core.py", line 1236, in collect_data_step
        if not self._is_trained:
    tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: Using a symbolic tf.Tensor as a Python bool is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

    --------------------------------------------------------

    My code is:

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    import tensorflow_decision_forests as tfdf

    print("Found TF-DF v" + tfdf.__version__)

    dataset_path = tf.keras.utils.get_file(
        "adult.csv",
        "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/"
        "main/yggdrasil_decision_forests/test_data/dataset/adult.csv")

    dataset_df = pd.read_csv(dataset_path)  # "df" for Pandas's DataFrame.

    print("First 3 examples:")
    print(dataset_df.head(3))

    test_indices = np.random.rand(len(dataset_df)) < 0.30
    test_ds_pd = dataset_df[test_indices]
    train_ds_pd = dataset_df[~test_indices]
    print(f"{len(train_ds_pd)} examples in training"
          f", {len(test_ds_pd)} examples for testing.")

    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(train_ds_pd, label="income")
    test_ds = tfdf.keras.pd_dataframe_to_tf_dataset(test_ds_pd, label="income")

    model = tfdf.keras.RandomForestModel(verbose=2)
    model.fit(x=train_ds)

    I need your help, thanks.

    opened by whk6688 7
  • Checkpointing models during training

    It seems the Keras ModelCheckpoint callback doesn't work with TFDF. Is there an alternate way to create checkpoints during training? I am training on a data set with tens of millions of samples and it takes several hours to train. I want to save the progress so that it doesn't need to retrain from scratch in case training crashes.

    enhancement 
    opened by sibyjackgrove 7
  • Tensorflow decision forests after update to tf 2.6.0

    There is a problem with Tensorflow_decision_forests after updating to version 2.6.0

    here is the gist https://colab.research.google.com/gist/lukebor/70f7abd84d547bf39c4a8b47394e7017/beginner_colab.ipynb

    I used the TensorFlow beginner tutorial and upgraded TF. If there is another way to import tfdf, please let me know.

    bug 
    opened by lukebor 6
  • Shape error when using model.evaluate and model.fit(validation_data=validation_ds)

    Dear authors,

    I used tfdf.pd_dataframe_to_tf_dataset for the train and test sets respectively, after making sure that both train and test had all 4 classes (a single label for each data point).

    I found that the labels in the two sets were integer-encoded ([0 1 2 3]). I defined:

    train = tfdf.keras.pd_dataframe_to_tf_dataset(df_train, label=label_column_name)
    test = tfdf.keras.pd_dataframe_to_tf_dataset(df_test, label=label_column_name)
    model = RandomForestModel(num_trees=5)
    model.fit(train, validation_data=test)
    

    It raised the error ValueError: Shapes (None, 4) and (None, 1) are incompatible. Then I moved to this code:

    model.fit(train)
    model.evaluate(test)
    

    It raised the error ValueError: Shapes (None, 4) and (None, 1) are incompatible. Then, I checked:

    pred = model.predict(test)
    print(pred[0])
    print(np.unique(pred))
    

    Output:

    [0. 1. 0. 0.]
    [0.  0.2 0.4 0.6 0.8 1. ]
    

    Please help me to fix this error. Thank you so much.

    opened by mainguyenanhvu 6
  • Shell classes

    Initial folder structure for operator definitions. Defined:

    • base Operator and WindowOperator classes
    • example AssignOperator and SimpleMovingAverage shell classes

    For simplicity at this point, and given we are going to be using pandas for the MVP, we decided to create these aliases:

    • Interval as an alias of pd.Timedelta
    • Sampling as an alias of pd.MultiIndex (with the restriction of the last level of it being a DatetimeIndex)
    • EventSequence as an alias of pd.DataFrame
    opened by ianspektor 1
  • changes to protos

    This PR updates the core.proto definition with latest changes discussed in today's sync.

    Main changes:

    • List of Features in main Processor object.
    • Renamed some messages (EventSequence => Event, FeatureSequence => Feature, Timestamps => Sampling).
    • Unified Input and Output into a single EventArgument used for both inputs and outputs.
    • FeatureSequence and Feature unified into a single Feature, and can now be part of several events.

    Final updated proto diagram: [image]

    opened by ianspektor 1
  • Contributing Tutorial

    Hi, I'm a Kaggler, and I find tf-decision-forests very useful. Thus I want to contribute tutorials to the library! Are tutorial contributions being accepted? If yes, which tutorial topics are currently on the wishlist?

    opened by shivance 1
  • Plot very large decision tree

    Hi,

    I have a decision tree with 20k nodes. How can I plot it?

    I checked the d3.js code, but with SVG it's pretty slow to render 20k nodes and use some zoom with it.

    Is there a way to generate a graphviz too and convert it to a huge PNG so I can view it with https://leafletjs.com/? Or is there a way to draw the decision tree with d3 and canvas instead of SVG?

    opened by Arnold1 6
  • Predictions do not function as documented.

    Prediction with TFDF is extremely under-documented.

    According to https://www.tensorflow.org/decision_forests/api_docs/python/tfdf/keras/RandomForestModel#predict you should be able to predict on numpy arrays, tensors, or datasets. Yet any attempt to do so has failed. It seems PrefetchDatasets are the only option.

    On top of this, prediction is dreadfully slow. My current use case is to do ensemble predictions of images. The images are 144k pixels, which requires ~20 seconds for one model to make a prediction. Pixel-wise predictions with normal TF can be near-instantaneous with predict_on_batch, which TFDF models are supposed to support, but PrefetchDatasets aren't compatible with it. So the answer is to use NumPy arrays, but those again are incompatible. All of this is said to be supported in the documentation, but none of it appears to work.

    I would like to stick with the TFDF method for my work, but it is unreasonably slow.

    How can I implement faster prediction when it seems it's an under-documented area?
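
    For reference, a hedged sketch of batching predictions from a raw NumPy array by wrapping it in a keyed tf.data.Dataset; the feature names, shapes and tiny training set are illustrative assumptions, not the setup from this issue:

    import numpy as np
    import pandas as pd
    import tensorflow as tf
    import tensorflow_decision_forests as tfdf

    # Tiny illustrative training set with 14 numerical features.
    rng = np.random.default_rng(0)
    train_df = pd.DataFrame({f"f{i}": rng.random(100) for i in range(14)})
    train_df["label"] = rng.integers(0, 2, size=100)

    model = tfdf.keras.RandomForestModel()
    model.fit(tfdf.keras.pd_dataframe_to_tf_dataset(train_df, label="label"))

    # Wrap a raw NumPy array in a batched dataset keyed by the feature names.
    features = rng.random((1000, 14), dtype=np.float32)
    ds = tf.data.Dataset.from_tensor_slices(
        {f"f{i}": features[:, i] for i in range(14)}
    ).batch(256)
    predictions = model.predict(ds)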

    opened by TheJeran 2
Releases (1.1.0)
  • 1.1.0(Nov 18, 2022)

    1.1.0 - 2022-11-18

    Features

    • Native support for TensorFlow Decision Forests in TensorFlow Serving.
    • Add support for zipped Yggdrasil Decision Forests models in yggdrasil_model_to_keras_model (see the sketch after this list).
    • Added model prediction tutorial.
    • Prevent premature stopping of GBT training through new parameter early_stopping_initial_iteration.
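
    A hedged sketch of the zipped-model conversion mentioned above; the paths are placeholders and the exact signature should be checked against the tfdf.keras API reference:

    import tensorflow_decision_forests as tfdf

    # Convert an Yggdrasil Decision Forests model into a Keras SavedModel.
    tfdf.keras.yggdrasil_model_to_keras_model(
        "/path/to/yggdrasil_model",  # placeholder source path
        "/path/to/keras_model",      # placeholder destination path
    )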

    Fix

    • Using loaded datasets with TF-DF no longer fails (Github #131).
    • Automatically infer the semantic of int8 values as numerical (was categorical before).
    • Build script fixed
    • Model saving no longer fails when using invalid feature names.
    • Added keyword to pandas dataset drop (Github #135).
    Source code(tar.gz)
    Source code(zip)
  • 1.1.0rc2(Nov 10, 2022)

    Features

    • Support for Tensorflow Serving APIs.
    • Add support for zipped Yggdrasil Decision Forests models in yggdrasil_model_to_keras_model.
    • Added model prediction tutorial.
    • Prevent premature stopping of GBT training through new parameter early_stopping_initial_iteration.

    Fix

    • Using loaded datasets with TF-DF no longer fails (Github #131).
    • Automatically infer the semantic of int8 values as numerical (was categorical before).
    • Build script fixed
    • Model saving no longer fails when using invalid feature names.
    • Added keyword to pandas dataset drop (Github #135).
    Source code(tar.gz)
    Source code(zip)
  • serving-1.0.1(Sep 20, 2022)

    Nightly build of TensorFlow Serving 2.11. TensorFlow Serving >=2.11 natively supports TensorFlow Decision Forests models.

    Build instructions:

    git clone https://github.com/tensorflow/serving.git
    docker run -it -v ${PWD}/..:/working_dir -w /working_dir/serving tensorflow/serving:nightly-devel bash
    bazel build //tensorflow_serving/model_servers:tensorflow_model_server
    
    Source code(tar.gz)
    Source code(zip)
    tensorflow_model_server_linux.zip(89.36 MB)
  • 1.0.1(Sep 7, 2022)

    TensorFlow Decision Forests 1.0.1

    With this release, TensorFlow Decision Forests finally reaches its first major release 🥳

    With this milestone we want to communicate more broadly that TensorFlow Decision Forests has become a more stable and mature library. In particular, we established more comprehensive testing to make sure that TF-DF is ready for professional environments.

    Features

    • Add customization of the number of IO threads when using fit_on_dataset_path.

    Fix

    • Improved documentation
    • Improved testing and stability
    • Fixed an issue in the application of auditwheel.
    Source code(tar.gz)
    Source code(zip)
  • 1.0.0rc0(Aug 26, 2022)

  • 0.2.7(Jul 17, 2022)

    Features

    • Multithreading of the oblique splitter for gradient boosted tree models.
    • Support for pure serving models, i.e. models containing only serving data.
    • Add the "edit_model" CLI tool.

    Fix

    • Remove bias toward low outcome in uplift modeling.
    Source code(tar.gz)
    Source code(zip)
  • serving-0.2.6(Jun 1, 2022)

  • 0.2.5(May 19, 2022)

    Features

    • Adds the contrib module for contributed, non-core functionality.
    • Adds contrib.scikit_learn_model_converter, which facilitates converting Scikit-Learn tree-based models into TF-DF models.
    • Discard hessian splits with a score lower than the parent's. This change has little effect on the model quality, but it can reduce the model size.
    • Add internal flag hessian_split_score_subtract_parent to subtract the parent score in the computation of a hessian split score.
    • Add support for hyper-parameter optimizers (also called tuners); see the sketch after this list.
    • Add text pretty print of trees with tree.pretty() or str(tree).
    • Add support for loading YDF models with file prefixes. Newly created models have a random prefix attached to them. This allows combining multiple models in Keras.
    • Add support for discretized numerical features.
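
    A hedged sketch of the tuner support mentioned above, following the public TF-DF tuning tutorial; the dataset, searched hyper-parameters and values are illustrative:

    import pandas as pd
    import tensorflow_decision_forests as tfdf

    # Tiny illustrative dataset.
    df = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0] * 25, "label": [0, 1, 0, 1] * 25})
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

    # Random search over a few illustrative hyper-parameters.
    tuner = tfdf.tuner.RandomSearch(num_trials=10)
    tuner.choice("max_depth", [3, 4, 5])
    tuner.choice("shrinkage", [0.02, 0.05, 0.10])

    model = tfdf.keras.GradientBoostedTreesModel(tuner=tuner)
    model.fit(train_ds)
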
    Source code(tar.gz)
    Source code(zip)
  • 0.2.3(Jan 27, 2022)

    Features

    • Honest Random Forests (also works with Gradient Boosted Trees and CART).
    • Can train Random Forests with example sampling without replacement.
    • Add support for Focal Loss with Gradient Boosted Trees.
    • Add support for MacOS.

    Fixes

    • Incorrect default evaluation of categorical splits with uplift tasks. This made uplift models with missing categorical values perform worse, and could make the inference of uplift models slower.
    • Fix pd_dataframe_to_tf_dataset on Pandas dataframes not containing arrays.
    Source code(tar.gz)
    Source code(zip)
    tf_serving_linux.zip(83.06 MB)
  • 0.2.2(Dec 15, 2021)

    Features

    • Surface the validation_interval_in_trees, keep_non_leaf_label_distribution and random_seed hyper-parameters.
    • Add the batch_size argument in the pd_dataframe_to_tf_dataset utility.
    • Automatically determine the number of threads if num_threads=None.
    • Add constructor argument try_resume_training to facilitate resuming training.
    • Check that the training dataset is well configured for TF-DF, e.g. no repeat operation, a large enough batch size, etc. The check can be disabled with check_dataset=False.
    • When a model is created manually with the model builder, and the dataspec is not provided, TF-DF tries to adapt the dataspec so that the model looks as if it was trained with the global imputation strategy for missing values (i.e. missing_value_policy: GLOBAL_IMPUTATION). This makes manually created models more likely to be compatible with the fast inference engines.
    • The fit method of TF-DF models now passes validation_data to the Yggdrasil learners. This is used, for example, for early stopping in the case of GBT models.
    • Add the "loss" parameter of the GBT model directly in the model constructor.
    • Control the amount of training logs displayed in the notebook (if using a notebook) or in the console with the verbose constructor argument and fit parameter of the model (see the sketch after this list).
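
    A hedged sketch combining several of the arguments above, assuming they are passed where shown (the dataframe and values are illustrative):

    import pandas as pd
    import tensorflow_decision_forests as tfdf

    # Tiny illustrative dataframe.
    train_df = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0] * 25, "my_label": [0, 1, 0, 1] * 25})

    # Explicit batch size when converting the dataframe.
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(
        train_df, label="my_label", batch_size=64)

    model = tfdf.keras.GradientBoostedTreesModel(
        num_threads=None,          # automatically determine the number of threads
        try_resume_training=True,  # facilitate resuming an interrupted training
        check_dataset=True,        # set False to disable the dataset configuration check
        verbose=1,                 # amount of training logs
    )
    model.fit(train_ds)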

    Fixes

    • num_candidate_attributes is not ignored anymore when num_candidate_attributes_ratio=-1.
    • Use the median bucket split value strategy in the discretized numerical splitters (local and distributed).
    • Surface the max_num_scanned_rows_to_accumulate_statistics parameter to control how many examples are scanned to determine the feature statistics when training from a file dataset with fit_on_dataset_path.
    Source code(tar.gz)
    Source code(zip)
  • 0.2.1(Nov 8, 2021)

  • 0.2.0(Nov 1, 2021)

    Features

    • Add advanced option predict_single_probability_for_binary_classification to generate prediction tensors of shape [batch_size, 2] for binary classification models.
    • Add support for weighted training.
    • Add support for permutation variable importance in the GBT learner with the compute_permutation_variable_importance parameter.
    • Support for tf.int8 and tf.int16 values.
    • Support for distributed gradient boosted trees learning. Currently, the TF ParameterServerStrategy distribution strategy is only available in monolithic TF-DF builds. The Yggdrasil Decision Forest GRPC distribute strategy can be used instead.
    • Support for training from datasets stored on disk in CSV and RecordIO format (instead of creating a TensorFlow dataset). This option is currently more efficient for distributed training (until the ParameterServerStrategy supports per-worker datasets).
    • Add max_vocab_count argument to the model constructor. The existing max_vocab_count argument in FeatureUsage objects takes precedence (see the sketch after this list).
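
    A hedged sketch of the two max_vocab_count levels; the feature name and values are illustrative:

    import tensorflow_decision_forests as tfdf

    model = tfdf.keras.GradientBoostedTreesModel(
        max_vocab_count=500,  # model-level default for categorical features
        features=[
            # Per-feature value; takes precedence over the model-level default.
            tfdf.keras.FeatureUsage(name="city", max_vocab_count=2000),
        ],
        exclude_non_specified_features=False,
    )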

    Fixes

    • Missing filtering of unique values in the categorical-set training feature accumulator. This was responsible for a small (e.g. ~0.5% on the SST2 dataset) drop in accuracy compared to the C++ API.
    • Fix broken support for max_vocab_count in a FeatureUsage with type CATEGORICAL_SET.
    Source code(tar.gz)
    Source code(zip)
  • 0.1.9(Aug 31, 2021)

    Features

    • Disable tree pruning in the CART algorithm if the validation dataset is empty (i.e. validation_ratio=0).
    • Migration to Tensorflow 2.6. You will see an undefined symbol error if you install this version with a TensorFlow version different than 2.6. Previous versions were compiled for TF 2.5.

    Fixes

    • Fix failure from Github Issue #45 where the wrong field was accessed for leaf node distributions.
    • Fix saving of categorical features specification in the Builder.
    Source code(tar.gz)
    Source code(zip)
  • 0.1.9rc1(Aug 25, 2021)

    Pre-release of 0.1.9

    Major change : Tensorflow 2.6 compatibility

    This release is currently being tested and will soon become the latest version on PyPI. In the meantime, users who need the fixes below can install this version directly from the attached wheels, e.g. pip install tensorflow_decision_forests-0.1.9-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl for Python 3.9.

    Fixes

    • Fix failure from Github Issue #45 where the wrong field was accessed for leaf node distributions.

    • Fix incorrect handling of CART pruning when validation set is empty. Previously, the whole tree would be erroneously pruned. Now, pruning is disabled if the validation set is not specified.

    • Fix saving of categorical features specification in the Builder.

    • Migration to Tensorflow 2.6. You will see an undefined symbol error if you install this version with a TensorFlow version different than 2.6. Previous versions were compiled for TF 2.5.

    Source code(tar.gz)
    Source code(zip)
    tensorflow_decision_forests-0.1.9-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl(6.01 MB)
    tensorflow_decision_forests-0.1.9-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.1.whl(6.01 MB)
    tensorflow_decision_forests-0.1.9-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl(6.01 MB)
    tensorflow_decision_forests-0.1.9-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl(6.01 MB)
  • 0.1.8(Jul 29, 2021)

    Features

    • Models can be composed with the functional Keras API before being trained.
    • Makes all the Yggdrasil structural variable importances available.
    • Makes getting the variable importance instantaneous.
    • Surface the name argument in the model classes constructors.
    • Add a postprocessing model constructor argument to easily apply post-processing to the model predictions without relying on the Keras Functional API (see the sketch after this list).
    • Add extract_all_trees method in the model inspector to efficiently extract all the trees.
    • Add num_threads constructor argument to control the number of training threads without using the advanced configuration.
    • By default, remove the temporary directory used to train the model when the model python object is garbage collected.
    • Add the import_dataspec constructor argument to the model builder to import the feature definition and dictionaries (instead of relying on automatic discovery).
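
    A hedged sketch of the postprocessing and num_threads constructor arguments; the post-processing layer is illustrative:

    import tensorflow as tf
    import tensorflow_decision_forests as tfdf

    model = tfdf.keras.RandomForestModel(
        num_threads=8,  # number of training threads
        # Illustrative post-processing applied to the model predictions.
        postprocessing=tf.keras.layers.Lambda(lambda probabilities: probabilities * 100.0),
    )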

    Changes

    • When saving a model in a directory already containing a model, only the assets directory is entirely removed before the export (instead of the entire model directory).

    Fixes

    • Wrong label shape in the model inspector's objective field for pre-integerized labels.
    Source code(tar.gz)
    Source code(zip)
  • 0.1.7(Jun 24, 2021)

    Features

    • Add more characters to the list of non-recommended feature name characters.
    • Make the inference op multi-thread compatible.
    • Print an explicit error and some instructions when training a model with a Pandas dataframe.
    • pd_dataframe_to_tf_dataset can automatically rename features to make them compatible with SavedModel export signatures.
    • model.save(...) can override an existing model.
    • The link function of a GBT model can be removed. For example, a binary classification GBT model trained with apply_link_function=False will output logits (see the sketch below).
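
    A hedged sketch, assuming apply_link_function is passed as a regular constructor hyper-parameter of the GBT model; the dataset is illustrative:

    import pandas as pd
    import tensorflow_decision_forests as tfdf

    # Tiny illustrative binary classification dataset.
    df = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0] * 25, "label": [0, 1, 0, 1] * 25})
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df, label="label")

    # Trained without the link function, the model outputs logits instead of probabilities.
    logit_model = tfdf.keras.GradientBoostedTreesModel(apply_link_function=False)
    logit_model.fit(train_ds)
    logits = logit_model.predict(train_ds)
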
    Source code(tar.gz)
    Source code(zip)
  • 0.1.6(Jun 8, 2021)

    Features

    • Add hyper-parameter sorting_strategy to disable the computation of the pre-sorted index (slower to train, but consumes less memory).
    • Format wrapper code for colab help display.
    • Raises an error when a feature name is not compatible (e.g. contains a space).
    Source code(tar.gz)
    Source code(zip)
  • 0.1.5(May 26, 2021)

    Features

    • Raise an error if the number of classes is greater than 100 (can be disabled).
    • Raise an error if the model's task does not match the pd_dataframe_to_tf_dataset's task.

    Bug fix

    • Fix failure when input feature contains commas.
    Source code(tar.gz)
    Source code(zip)
  • 0.1.4(May 21, 2021)

    Features

    • Stop the training when interrupting a colab cell / typing ctrl-c.
    • model.fit supports training callbacks and a validation dataset (see the sketch after this list).
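
    A hedged sketch of fit with a validation dataset and a callback; the dataset and the callback choice are illustrative:

    import pandas as pd
    import tensorflow as tf
    import tensorflow_decision_forests as tfdf

    # Tiny illustrative dataset split into train and validation parts.
    df = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0] * 25, "label": [0, 1, 0, 1] * 25})
    train_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df[:80], label="label")
    valid_ds = tfdf.keras.pd_dataframe_to_tf_dataset(df[80:], label="label")

    model = tfdf.keras.GradientBoostedTreesModel()
    model.fit(
        train_ds,
        validation_data=valid_ds,
        callbacks=[tf.keras.callbacks.LambdaCallback(
            on_train_end=lambda logs: print("Training finished."))],
    )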

    Bug fix

    • Fix failure when there are no input features.
    Source code(tar.gz)
    Source code(zip)
  • 0.1.2(May 18, 2021)

  • 0.1.0(May 17, 2021)

    Release 0.1.0 (2021-05-11)

    Initial Release of TensorFlow Decision Forests.

    Features

    • Random Forest learner.
    • Gradient Boosted Tree learner.
    • CART learner.
    • Model inspector: Inspect the internal model structure.
    • Model plotter: Plot decision trees.
    • Model builder: Create model "by hand".
    Source code(tar.gz)
    Source code(zip)