[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark

Overview


TensorFrames (Deprecated)

Note: TensorFrames is deprecated. You can use pandas UDF instead.

Experimental TensorFlow binding for Scala and Apache Spark.

TensorFrames (TensorFlow on Spark DataFrames) lets you manipulate Apache Spark's DataFrames with TensorFlow programs.

This package is experimental and is provided as a technical preview only. While the interfaces are all implemented and working, there are still some areas of low performance.

Supported platforms:

This package officially supports only 64-bit Linux as a target platform. Contributions are welcome for other platforms.

See the file project/Dependencies.scala for adding your own platform.

Officially TensorFrames supports Spark 2.4+ and Scala 2.11.

See the user guide for extensive information about the API.

For questions, see the TensorFrames mailing list.

TensorFrames is available as a Spark package.

Requirements

  • A working version of Apache Spark (2.4 or greater)

  • Java 8+

  • (Optional) Python 2.7+/3.6+ if you want to use the Python interface.

  • (Optional) the Python TensorFlow package if you want to use the Python interface. See the official instructions on how to get the latest release of TensorFlow.

  • (Optional) pandas >= 0.19.1 if you want to use the Python interface.

Additionally, for development, you need the following dependencies:

  • protoc 3.x

  • nose >= 1.3

How to run in Python

Assuming that SPARK_HOME is set, you can use PySpark like any other Spark package.

$SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.6.0-s_2.11

Here is a small program that uses TensorFlow to add 3 to an existing column.

import tensorflow as tf
import tensorframes as tfs
from pyspark.sql import Row

data = [Row(x=float(x)) for x in range(10)]
df = sqlContext.createDataFrame(data)
with tf.Graph().as_default() as g:
    # The TensorFlow placeholder that corresponds to column 'x'.
    # The shape of the placeholder is automatically inferred from the DataFrame.
    x = tfs.block(df, "x")
    # The output that adds 3 to x
    z = tf.add(x, 3, name='z')
    # The resulting dataframe
    df2 = tfs.map_blocks(z, df)

# The transform is lazy as for most DataFrame operations. This will trigger it:
df2.collect()

# Notice that z is an extra column next to x

# [Row(z=3.0, x=0.0),
#  Row(z=4.0, x=1.0),
#  Row(z=5.0, x=2.0),
#  Row(z=6.0, x=3.0),
#  Row(z=7.0, x=4.0),
#  Row(z=8.0, x=5.0),
#  Row(z=9.0, x=6.0),
#  Row(z=10.0, x=7.0),
#  Row(z=11.0, x=8.0),
#  Row(z=12.0, x=9.0)]
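
As the deprecation note above suggests, this kind of column-wise computation can now be expressed with a pandas UDF instead of TensorFrames. The snippet below is a minimal sketch of the same add-3 example using the standard PySpark pandas_udf API; it reuses the df defined above, assumes PyArrow is installed, and is not part of the TensorFrames API.

import pandas as pd
from pyspark.sql.functions import pandas_udf

# A scalar pandas UDF receives a pandas Series per batch of rows and
# returns a Series of the same length, so the computation stays vectorized.
@pandas_udf("double")
def add_three(x: pd.Series) -> pd.Series:
    return x + 3

df_udf = df.withColumn("z", add_three("x"))
df_udf.show()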

The second example shows block-wise reducing operations: we compute the sum and the elementwise minimum of a field containing vectors of floating point numbers, working with blocks of rows for more efficient processing.

# Build a DataFrame of vectors
data = [Row(y=[float(y), float(-y)]) for y in range(10)]
df = sqlContext.createDataFrame(data)
# Because the dataframe contains vectors, we need to analyze it first to find the
# dimensions of the vectors.
df2 = tfs.analyze(df)

# The information gathered by TF can be printed to check the content:
tfs.print_schema(df2)
# root
#  |-- y: array (nullable = false) double[?,2]

# Let's use the analyzed dataframe to compute the sum and the elementwise minimum 
# of all the vectors:
# First, let's make a copy of the 'y' column. This will be very cheap in Spark 2.0+
df3 = df2.select(df2.y, df2.y.alias("z"))
with tf.Graph().as_default() as g:
    # The placeholders. Note the special names that end with '_input':
    y_input = tfs.block(df3, 'y', tf_name="y_input")
    z_input = tfs.block(df3, 'z', tf_name="z_input")
    y = tf.reduce_sum(y_input, [0], name='y')
    z = tf.reduce_min(z_input, [0], name='z')
    # The resulting dataframe
    (data_sum, data_min) = tfs.reduce_blocks([y, z], df3)

# The final results are numpy arrays:
print(data_sum)
# [45., -45.]
print(data_min)
# [0., -9.]
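
As a quick sanity check, the same reduction can be reproduced locally with NumPy on the raw values (a small sketch, outside of Spark):

import numpy as np

# Same values as the DataFrame above: one row [y, -y] for y in 0..9.
block = np.array([[float(y), float(-y)] for y in range(10)])
print(block.sum(axis=0))  # [ 45. -45.], matches data_sum
print(block.min(axis=0))  # [ 0. -9.], matches data_min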

Notes

Note the scoping of the graphs above. This is important because TensorFrames uses the placeholders of the graph to determine which DataFrame columns to feed to TensorFlow. It is also good practice to keep graphs small when sending them to Spark.
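
For instance, giving each transform its own short-lived graph keeps placeholder and output names from colliding between transforms. This is a minimal sketch that reuses the df from the first example:

# First transform: the placeholder for 'x' and the output 'z' live only here.
with tf.Graph().as_default():
    x = tfs.block(df, "x")
    z = tf.add(x, 3, name='z')
    df_plus3 = tfs.map_blocks(z, df)

# Second transform: a fresh graph, so the name 'z' can be reused safely.
with tf.Graph().as_default():
    x = tfs.block(df, "x")
    z = tf.multiply(x, 2, name='z')
    df_times2 = tfs.map_blocks(z, df)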

For small tensors (scalars and vectors), TensorFrames usually infers the shapes of the tensors without requiring a preliminary analysis. If it cannot do it, an error message will indicate that you need to run the DataFrame through tfs.analyze() first.

Look at the Python documentation of the TensorFrames package to see what methods are available.
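
If the documentation is not at hand, the public entry points can also be listed from a Python shell (a small sketch):

import tensorframes as tfs

# Prints the public functions of the package, e.g. block, map_blocks,
# reduce_blocks, analyze, print_schema (see the examples above).
print([name for name in dir(tfs) if not name.startswith("_")])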

How to run in Scala

The Scala support is more limited than the Python one. In Scala, operations can either be loaded from an existing graph defined in the protocol buffers format or built with a simple Scala DSL. The DSL only covers a subset of TensorFlow transforms, but it is easy to extend, so other transforms can be added without much effort in the future.

You simply use the published package:

$SPARK_HOME/bin/spark-shell --packages databricks:tensorframes:0.6.0-s_2.11

Here is the same program as before:

import org.tensorframes.{dsl => tf}
import org.tensorframes.dsl.Implicits._

val df = spark.createDataFrame(Seq(1.0->1.1, 2.0->2.2)).toDF("a", "b")

// As in Python, scoping is recommended to prevent name collisions.
val df2 = tf.withGraph {
    val a = df.block("a")
    // Unlike python, the scala syntax is more flexible:
    val out = a + 3.0 named "out"
    // The 'mapBlocks' method is added using implicits to dataframes.
    df.mapBlocks(out).select("a", "out")
}

// The transform is all lazy at this point, let's execute it with collect:
df2.collect()
// res0: Array[org.apache.spark.sql.Row] = Array([1.0,4.0], [2.0,5.0])   

How to compile and install for developers

It is recommended that you use a Conda environment to guarantee that the build environment can be reproduced. Once you have installed Conda, you can set up the environment from the root of the project:

conda create -q -n tensorframes-environment python=$PYTHON_VERSION

This will create an environment for your project. We recommend using Python version 3.7 or 2.7.13. After the environment is created, you can activate it and install all dependencies as follows:

conda activate tensorframes-environment
pip install --user -r python/requirements.txt

You also need to compile the Scala code. The recommended procedure is to use the assembly:

build/sbt tfs_testing/assembly
# Builds the spark package:
build/sbt distribution/spDist

Assuming that SPARK_HOME is set and that you are in the root directory of the project:

$SPARK_HOME/bin/spark-shell --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

If you want to run the python version:

PYTHONPATH=$PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar \
$SPARK_HOME/bin/pyspark --jars $PWD/target/testing/scala-2.11/tensorframes-assembly-0.6.1-SNAPSHOT.jar

Acknowledgements

Before TensorFlow released its Java API, this project was built on the great javacpp project, which implements the low-level bindings between TensorFlow and the Java virtual machine.

Many thanks to Google for the release of TensorFlow.

Comments
  •  java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps

    I built the jar by following the README and then ran it in PyCharm (https://www.dropbox.com/s/qmrs72l0p8p4bc2/Screen%20Shot%202016-07-06%20at%2011.40.26%20PM.png?dl=0). I added the self-built jar as a content root; I guess that's what caused the error.

    line 11 is x = tfs.block(df, "x")

    code:

    import tensorflow as tf
    import tensorframes as tfs
    from pyspark.shell import sqlContext
    from pyspark.sql import Row
    
    data = [Row(x=float(x)) for x in range(10)]
    df = sqlContext.createDataFrame(data)
    
    with tf.Graph().as_default() as g:
        # The TensorFlow placeholder that corresponds to column 'x'.
        # The shape of the placeholder is automatically inferred from the DataFrame.
        x = tfs.block(df, "x")
        # The output that adds 3 to x
        z = tf.add(x, 3, name='z')
        # The resulting dataframe
        df2 = tfs.map_blocks(z, df)
    
    # The transform is lazy as for most DataFrame operations. This will trigger it:
    df2.collect()
    

    log

    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    16/07/06 23:28:43 INFO SparkContext: Running Spark version 1.6.1
    16/07/06 23:28:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/07/06 23:28:44 INFO SecurityManager: Changing view acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: Changing modify acls to: julian_qian
    16/07/06 23:28:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(julian_qian); users with modify permissions: Set(julian_qian)
    16/07/06 23:28:44 INFO Utils: Successfully started service 'sparkDriver' on port 60597.
    16/07/06 23:28:45 INFO Slf4jLogger: Slf4jLogger started
    16/07/06 23:28:45 INFO Remoting: Starting remoting
    16/07/06 23:28:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:60598]
    16/07/06 23:28:45 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 60598.
    16/07/06 23:28:45 INFO SparkEnv: Registering MapOutputTracker
    16/07/06 23:28:45 INFO SparkEnv: Registering BlockManagerMaster
    16/07/06 23:28:45 INFO DiskBlockManager: Created local directory at /private/var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/blockmgr-5174cef3-29d9-4d2a-a84e-279a0e3d2f83
    16/07/06 23:28:45 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
    16/07/06 23:28:45 INFO SparkEnv: Registering OutputCommitCoordinator
    16/07/06 23:28:45 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    16/07/06 23:28:45 INFO Utils: Successfully started service 'SparkUI' on port 4041.
    16/07/06 23:28:45 INFO SparkUI: Started SparkUI at http://10.63.21.172:4041
    16/07/06 23:28:45 INFO Executor: Starting executor ID driver on host localhost
    16/07/06 23:28:45 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 60599.
    16/07/06 23:28:45 INFO NettyBlockTransferService: Server created on 60599
    16/07/06 23:28:45 INFO BlockManagerMaster: Trying to register BlockManager
    16/07/06 23:28:45 INFO BlockManagerMasterEndpoint: Registering block manager localhost:60599 with 511.1 MB RAM, BlockManagerId(driver, localhost, 60599)
    16/07/06 23:28:45 INFO BlockManagerMaster: Registered BlockManager
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
          /_/
    
    Using Python version 2.7.10 (default, Dec  1 2015 20:00:13)
    SparkContext available as sc, HiveContext available as sqlContext.
    16/07/06 23:28:46 INFO HiveContext: Initializing execution hive, version 1.2.1
    16/07/06 23:28:46 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:46 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:46 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:46 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:46 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:46 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:46 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:47 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:48 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:48 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:49 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:49 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:49 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:49 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:49 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:49 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:49 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:49 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62_resources
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62
    16/07/06 23:28:49 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/f9d3c8e6-6b5d-4a0c-b2cf-a50aa101fb62/_tmp_space.db
    16/07/06 23:28:49 INFO HiveContext: default warehouse location is /user/hive/warehouse
    16/07/06 23:28:49 INFO HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
    16/07/06 23:28:49 INFO ClientWrapper: Inspected Hadoop version: 2.6.0
    16/07/06 23:28:49 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
    16/07/06 23:28:50 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    16/07/06 23:28:50 INFO ObjectStore: ObjectStore, initialize called
    16/07/06 23:28:50 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    16/07/06 23:28:50 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:50 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    16/07/06 23:28:51 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:51 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:52 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
    16/07/06 23:28:52 INFO ObjectStore: Initialized ObjectStore
    16/07/06 23:28:52 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    16/07/06 23:28:52 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    16/07/06 23:28:52 INFO HiveMetaStore: Added admin role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: Added public role in metastore
    16/07/06 23:28:52 INFO HiveMetaStore: No user is added in admin role, since config is empty
    16/07/06 23:28:52 INFO HiveMetaStore: 0: get_all_databases
    16/07/06 23:28:52 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_all_databases   
    16/07/06 23:28:53 INFO HiveMetaStore: 0: get_functions: db=default pat=*
    16/07/06 23:28:53 INFO audit: ugi=julian_qian   ip=unknown-ip-addr  cmd=get_functions: db=default pat=* 
    16/07/06 23:28:53 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/77eb618d-61cc-470e-abb4-18d356833efb_resources
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created local directory: /var/folders/9c/h8czn5n53yd69xz45wjhk0fw0000gn/T/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb
    16/07/06 23:28:53 INFO SessionState: Created HDFS directory: /tmp/hive/julian_qian/77eb618d-61cc-470e-abb4-18d356833efb/_tmp_space.db
    

    error log:

    Traceback (most recent call last):
      File "/Users/julian_qian/PycharmProjects/tensorflow/tfs.py", line 11, in <module>
        x = tfs.block(df, "x")
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 315, in block
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 333, in _auto_placeholder
      File "/Users/julian_qian/etc/work/python/tensorframes/tensorframes-assembly-0.2.3.jar/tensorframes/core.py", line 30, in _java_api
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 45, in deco
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/py4j-0.9-src.zip/py4j/protocol.py", line 308, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o32.loadClass.
    : java.lang.ClassNotFoundException: org.tensorframes.impl.DebugRowOps
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
    
    opened by jq 13
  • Updated to tensorflow 1.6 and spark 2.3.

    The current version is not compatible with graphs generated by TF 1.6, which is preventing us from releasing dl-pipelines with TF 1.6 support.

    • Updated protobuf files and regenerated their Java sources.
    • A few minor changes related to Tensor taking a type parameter in TF 1.6.
    opened by tomasatdatabricks 8
  • tensorframes is not working with variables.

    data = [Row(x=float(x)) for x in range(5)]
    df = sqlContext.createDataFrame(data)
    with tf.Graph().as_default() as g:
        # The placeholder that corresponds to column 'x'
        x = tf.placeholder(tf.double, shape=[None], name="x")
        # The output that adds 3 to x
        b = tf.Variable(float(3), name='a', dtype=tf.double)
        z = tf.add(x, b, name='z')
        # with or without `sess.run(tf.global_variables_initializer())`, the following will fail
        
        df2 = tfs.map_blocks(z, df)
    
    df2.show()
    
    opened by yupbank 7
  • Does not work with Python3

    I just started using this with Python 3. These are the commands I ran and the output messages.

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    Python 3.4.3 (default, Mar 26 2015, 22:03:40) [GCC 4.9.2] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    Ivy Default Cache set to: /root/.ivy2/cache
    The jars for the packages stored in: /root/.ivy2/jars
    :: loading settings :: url = jar:file:/opt/spark-1.5.2/assembly/target/scala-2.10/spark-assembly-1.5.2-hadoop2.2.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    databricks#tensorframes added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found databricks#tensorframes;0.2.3-s_2.10 in spark-packages
        found org.apache.commons#commons-lang3;3.4 in central
    :: resolution report :: resolve 98ms :: artifacts dl 4ms
        :: modules in use:
        databricks#tensorframes;0.2.3-s_2.10 from spark-packages in [default]
        org.apache.commons#commons-lang3;3.4 from central in [default]
        | conf    | modules: number | search | dwnlded | evicted || artifacts: number | dwnlded |
        | default | 2               | 0      | 0       | 0       || 2                 | 0       |
    :: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        0 artifacts copied, 2 already retrieved (0kB/3ms)
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 1.5.2
          /_/

    Using Python version 3.4.3 (default, Mar 26 2015 22:03:40) SparkContext available as sc, SQLContext available as sqlContext.

    import tensorflow as tf
    import tensorframes as tfs

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/tmp/spark-349c9955-ccd8-4fcd-938a-7e719fc45653/userFiles-bb935142-224f-4238-a144-f1cece7a5aa2/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/__init__.py", line 36, in <module>
    ImportError: No module named 'core'

    opened by ushnish 6
  • Scala example does not work

    I'm having trouble running the provided Scala example in the spark shell.

    My local environment is:

    • Spark 2.1.0
    • Scala version 2.11.8
    • Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121

    I ran the spark-shell with: spark-shell --packages databricks:tensorframes:0.2.5-rc2-s_2.11

    I get the following stacktrace which shuts down my spark process:

    #
    # A fatal error has been detected by the Java Runtime Environment:
    #
    #  SIGSEGV (0xb) at pc=0x00007fff90451b52, pid=64869, tid=0x0000000000001c03
    #
    # JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
    # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode bsd-amd64 compressed oops)
    # Problematic frame:
    # C  [libsystem_c.dylib+0x1b52]  strlen+0x12
    #
    # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
    #
    # An error report file with more information is saved as:
    # /Users/ndrizard/projects/temps/hs_err_pid64869.log
    #
    # If you would like to submit a bug report, please visit:
    #   http://bugreport.java.com/bugreport/crash.jsp
    # The crash happened outside the Java Virtual Machine in native code.
    # See problematic frame for where to report the bug.
    #
    

    Thanks for your help!

    opened by nicodri 5
  • Py4JError("Answer from Java side is empty") while testing

    Py4JError("Answer from Java side is empty") while testing

    I have been experimenting with TensorFrames for quite some days. I have spark-1.6.1 and openjdk7 installed on my Ubuntu 14.04 64-bit machine. I am using an IPython notebook for testing.

    The import tensorframes as tfs command works perfectly fine, but when I do tfs.print_schema(df), where df is a DataFrame, the error below pops up recursively until the maximum recursion depth is reached.

    ERROR:py4j.java_gateway:Error while sending or receiving.
    Traceback (most recent call last):
      File "/home/prakhar/utilities/spark-1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
        raise Py4JError("Answer from Java side is empty")
    Py4JError: Answer from Java side is empty

    opened by prakhar21 4
  • [ML-7986] Update tensorflow to 1.14.0

    • Update tensorflow version to 1.14.0 in environment.yml, project/Dependencies.scala, and python/requirements.txt
    • Auto update *.proto with the script. All of this type update comes from tensorflow.
    opened by lu-wang-dl 3
  • Support Spark 2.3.1, TF 1.10.0 and drop Spark 2.1/2.2 (and hence Scala 2.10, Java 7)

    • Drop support for Spark 2.1 and 2.2 and hence scala 2.10 and java 7
    • Update TF to 1.10 release
    • Remove nix files, which are not used
    • Update README

    We will support Spark 2.4 once RC is released.

    opened by mengxr 3
  • Usage of tf.contrib.distributions.percentile fails

    Consider the following dummy example using tf.contrib.distributions.percentile:

    from pyspark.context import SparkContext
    from pyspark.conf import SparkConf
    import tensorflow as tf
    import tensorframes as tfs
    from pyspark import SQLContext
    from pyspark.sql import Row
    from pyspark.sql.functions import *
    
    conf = SparkConf().setAppName("repro")
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
    
    data = [Row(x=[1.111, 0.516, 12.759]), Row(x=[2.222, 1.516, 13.759]), Row(x=[3.333, 2.516, 14.759]), Row(x=[4.444, 3.516, 15.759])]
    df = tfs.analyze(sqlContext.createDataFrame(data))
    
    with tf.Graph().as_default() as g:
    	x = tfs.block(df, "x")
    	q = tf.constant(90, 'float64', name='Percentile')
    	qntl = tf.contrib.distributions.percentile(x, q, axis=1)
    	result = tfs.map_blocks(x, df)
    	
    

    This fails with

    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2752, in _as_graph_element_locked
        return op.outputs[out_n]
    IndexError: list index out of range
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "<stdin>", line 5, in <module>
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 312, in map_blocks
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 152, in _map
      File "/tmp/spark-7f93e8f4-b6dc-4356-98fc-4c6de58c1e71/userFiles-f204b475-06b5-44e9-9a11-dbee001767d4/databricks_tensorframes-0.2.9-s_2.11.jar/tensorframes/core.py", line 83, in _add_shapes
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2880, in get_tensor_by_name
        return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2708, in as_graph_element
        return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
      File "/usr/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2757, in _as_graph_element_locked
        % (repr(name), repr(op_name), len(op.outputs)))
    KeyError: "The name 'percentile/assert_integer/statically_determined_was_integer:0' refers to a Tensor which does not exist. The operation, 'percentile/assert_integer/statically_determined_was_integer', exists but only has 0 outputs."
    
    opened by martinstuder 3
  • Readme Example throwing Py4J error

    I am using Spark 2.0.2, Python 2.7.12, iPython 5.1.0 on macOS 10.12.1.

    I am launching pyspark like this

    $SPARK_HOME/bin/pyspark --packages databricks:tensorframes:0.2.3-s_2.10

    From the demo, this block

    with tf.Graph().as_default() as g:
        x = tfs.block(df, "x")
        z = tf.add(x, 3, name='z')
        df2 = tfs.map_blocks(z, df)
    

    crashes with the following traceback:

    ---------------------------------------------------------------------------
    Py4JJavaError                             Traceback (most recent call last)
    <ipython-input-3-e7ae284146c3> in <module>()
          4     # The TensorFlow placeholder that corresponds to column 'x'.
          5     # The shape of the placeholder is automatically inferred from the DataFrame.
    ----> 6     x = tfs.block(df, "x")
          7     # The output that adds 3 to x
          8     z = tf.add(x, 3, name='z')
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in block(df, col_name, tf_name)
        313     :return: a TensorFlow placeholder.
        314     """
    --> 315     return _auto_placeholder(df, col_name, tf_name, block = True)
        316
        317 def row(df, col_name, tf_name = None):
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _auto_placeholder(df, col_name, tf_name, block)
        331
        332 def _auto_placeholder(df, col_name, tf_name, block):
    --> 333     info = _java_api().extra_schema_info(df._jdf)
        334     col_shape = [x.shape() for x in info if x.fieldName() == col_name]
        335     if len(col_shape) == 0:
    
    /private/var/folders/tb/r74wwyk17b3fn_frdb0gd0780000gn/T/spark-b3c869d7-6d28-4bce-9a2e-d46f43fc83df/userFiles-64edc1b1-03db-40e6-9d7f-f062d0491a77/databricks_tensorframes-0.2.3-s_2.10.jar/tensorframes/core.py in _java_api()
         28     # You cannot simply call the creation of the the class on the _jvm due to classloader issues
         29     # with Py4J.
    ---> 30     return _jvm.Thread.currentThread().getContextClassLoader().loadClass(javaClassName) \
         31         .newInstance()
         32
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py in __call__(self, *args)
       1131         answer = self.gateway_client.send_command(command)
       1132         return_value = get_return_value(
    -> 1133             answer, self.gateway_client, self.target_id, self.name)
       1134
       1135         for temp_arg in temp_args:
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/pyspark/sql/utils.py in deco(*a, **kw)
         61     def deco(*a, **kw):
         62         try:
    ---> 63             return f(*a, **kw)
         64         except py4j.protocol.Py4JJavaError as e:
         65             s = e.java_exception.toString()
    
    /Users/damien/spark-2.0.2-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
        317                 raise Py4JJavaError(
        318                     "An error occurred while calling {0}{1}{2}.\n".
    --> 319                     format(target_id, ".", name), value)
        320             else:
        321                 raise Py4JError(
    
    Py4JJavaError: An error occurred while calling o47.loadClass.
    : java.lang.NoClassDefFoundError: org/apache/spark/Logging
    	at java.lang.ClassLoader.defineClass1(Native Method)
    	at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    	at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    	at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    	at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:280)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:214)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    	... 22 more
    
    opened by damienstanton 3
  • Spark 2.0.0 + ScalaTest 3.0.0 + updates sbt plugins

    The subject says it all.

    WARNING: doit works (since it disables tests in assembly), but I could not get sbt test working. It fails with the following error, which is more about TensorFlow, which I know nothing about:

    ➜  tensorframes git:(spark-200-and-other-upgrades) sbt
    [info] Loading global plugins from /Users/jacek/.sbt/0.13/plugins
    [info] Loading project definition from /Users/jacek/dev/oss/tensorframes/project
    [info] Set current project to tensorframes (in build file:/Users/jacek/dev/oss/tensorframes/)
    > testOnly org.tensorframes.dsl.BasicOpsSuite
    16/08/04 23:52:22 DEBUG Paths$: Request for x -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for y -> 0
    16/08/04 23:52:22 DEBUG Paths$: Request for z -> 0
    
    import tensorflow as tf
    
    x = tf.constant(1, name='x')
    y = tf.constant(2, name='y')
    z = tf.add(x, y, name='z')
    
    g = tf.get_default_graph().as_graph_def()
    for n in g.node:
        print ">>>>>", str(n.name), "<<<<<<"
        print n
    
    [info] BasicOpsSuite:
    [info] - Add *** FAILED ***
    [info]   1 did not equal 0 (1,===========
    [info]   
    [info]   import tensorflow as tf
    [info]   
    [info]   x = tf.constant(1, name='x')
    [info]   y = tf.constant(2, name='y')
    [info]   z = tf.add(x, y, name='z')
    [info]         
    [info]   g = tf.get_default_graph().as_graph_def()
    [info]   for n in g.node:
    [info]       print ">>>>>", str(n.name), "<<<<<<"
    [info]       print n
    [info]          
    [info]   ===========) (ExtractNodes.scala:40)
    [info] Run completed in 1 second, 772 milliseconds.
    [info] Total number of tests run: 1
    [info] Suites: completed 1, aborted 0
    [info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
    [info] *** 1 TEST FAILED ***
    [error] Failed tests:
    [error]         org.tensorframes.dsl.BasicOpsSuite
    [error] (test:testOnly) sbt.TestsFailedException: Tests unsuccessful
    [error] Total time: 2 s, completed Aug 4, 2016 11:52:22 PM
    

    I'm proposing the PR hoping the issue is a minor one that could easily be fixed with enough guidance.

    opened by jaceklaskowski 3
  • Bump tensorflow from 1.15.0 to 2.9.3 in /python

    Bumps tensorflow from 1.15.0 to 2.9.3.

    Release notes

    Sourced from tensorflow's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This release introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
  • Support with deep learning model plugging

    Can you guys help plug this deep learning model, https://github.com/hongzimao/decima-sim, into TensorFrames? Is it possible to do? Any help will be highly appreciated.

    opened by jahidhasanlinix 0
  • Need help with enabling GPUs while predicting through fine-tuned BERT Tensorflow Model on Azure Databricks

    Hi, I am referring to this code (https://github.com/google-research/bert/blob/master/predicting_movie_reviews_with_bert_on_tf_hub.ipynb, for classification) and running it on Azure Databricks Runtime 7.2 ML (includes Apache Spark 3.0.0, GPU, Scala 2.12). I was able to train a model. For predictions I am using a 4-GPU cluster, but it is still taking a very long time. I suspect that my cluster is not fully utilized and is in fact still being used as CPU only. Is there anything I need to change to ensure that the GPU cluster is being utilized and able to function in a distributed manner?

    I also referred to Databricks documentation (https://docs.microsoft.com/en-us/azure/databricks/applications/machine-learning/train-model/tensorflow) and did install gpu enabled tensorflow mentioned as:

    %pip install https://databricks-prod-cloudfront.cloud.databricks.com/artifacts/tensorflow/runtime-7.x/tensorflow-1.15.3-cp37-cp37m-linux_x86_64.whl

    But even after that, print([tf.version, tf.test.is_gpu_available()]) still shows FALSE, and there is no improvement in my cluster utilization. Can anyone help with how I can enable full cluster utilization (on the worker nodes) for my predictions with the fine-tuned BERT model?

    I would really appreciate the help.

    opened by samvygupta 0
  • Having java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()

    Hi, I want to use DeepImageFeaturizer combined with Spark ML logistic regression in Spark 2.4.5 / Scala 2.11.12, but it's not working. I have been trying to resolve it for many days.

    I have this issue : java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;

    It seems a library is missing, but I think I've already referenced all the needed ones:

    delta-core_2.11-0.6.0.jar
    libtensorflow-1.15.0.jar
    libtensorflow_jni-1.15.0.jar
    libtensorflow_jni_gpu-1.15.0.jar
    proto-1.15.0.jar
    scala-logging-api_2.11-2.1.2.jar
    scala-logging-slf4j_2.11-2.1.2.jar
    scala-logging_2.11-3.9.2.jar
    spark-deep-learning-1.5.0-spark2.4-s_2.11.jar
    spark-sql-kafka-0-10_2.11-2.4.5.jar
    spark-tensorflow-connector_2.11-1.6.0.jar
    tensorflow-1.15.0.jar
    tensorflow-hadoop-1.15.0.jar
    tensorframes-0.8.2-s_2.11.jar
    

    Full trace :

    20/05/15 21:17:28 DEBUG impl.TensorFlowOps$: Outputs: Set(InceptionV3_sparkdl_output__)
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:65)
    	at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
    Caused by: java.lang.NoSuchMethodError: org.tensorflow.framework.GraphDef.toByteString()Lorg/tensorframes/protobuf3shade/ByteString;
    	at org.tensorframes.impl.TensorFlowOps$.graphSerial(TensorFlowOps.scala:69)
    	at org.tensorframes.impl.TensorFlowOps$.analyzeGraphTF(TensorFlowOps.scala:114)
    	at org.tensorframes.impl.DebugRowOps.mapRows(DebugRowOps.scala:408)
    	at com.databricks.sparkdl.DeepImageFeaturizer.transform(DeepImageFeaturizer.scala:135)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:161)
    	at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    	at scala.collection.IterableViewLike$Transformed$class.foreach(IterableViewLike.scala:44)
    	at scala.collection.SeqViewLike$AbstractTransformed.foreach(SeqViewLike.scala:37)
    	at org.apache.spark.ml.Pipeline.fit(Pipeline.scala:149)
    

    Can someone on the team tell me what is going wrong? Thanks for your support.

    opened by eleite77 0
  • Could not initialize class org.tensorframes.impl.SupportedOperations

    Py4JJavaError: An error occurred while calling o162.analyze. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 10.244.31.75, executor 1): java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

    Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126) at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at org.apache.spark.rdd.RDD.collect(RDD.scala:944) at org.tensorframes.ExtraOperations$.deepAnalyzeDataFrame(ExperimentalOperations.scala:113) at org.tensorframes.ExperimentalOperations$class.analyze(ExperimentalOperations.scala:41) at org.tensorframes.impl.DebugRowOps.analyze(DebugRowOps.scala:281) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:282) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.tensorframes.impl.SupportedOperations$ at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:148) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$15.apply(ExperimentalOperations.scala:146) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at 
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$.analyzeData(ExperimentalOperations.scala:146) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4$$anonfun$5.apply(ExperimentalOperations.scala:97) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.immutable.Range.foreach(Range.scala:160) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:97) at org.tensorframes.ExtraOperations$$anonfun$3$$anonfun$4.apply(ExperimentalOperations.scala:95) at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) at scala.collection.Iterator$class.foreach(Iterator.scala:891) at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185) at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceLeftOption(TraversableOnce.scala:203) at scala.collection.AbstractIterator.reduceLeftOption(Iterator.scala:1334) at scala.collection.TraversableOnce$class.reduceOption(TraversableOnce.scala:210) at scala.collection.AbstractIterator.reduceOption(Iterator.scala:1334) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:100) at org.tensorframes.ExtraOperations$$anonfun$3.apply(ExperimentalOperations.scala:93) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:121) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more

    opened by lee2015new 2
Releases (v0.6.0)
  • v0.6.0 (Nov 16, 2018)

  • v0.5.0 (Aug 21, 2018)

  • v0.4.0 (Jun 18, 2018)

  • v0.2.9 (Sep 13, 2017)

    This is the final release for 0.2.9.

    Notable changes since 0.2.8:

    • Upgrades tensorflow dependency from version 1.1.0 to 1.3.0
    • map_blocks, map_row APIs now accept Pandas DataFrames as input
    • Adds support for tensorflow variables. Note that these variables cannot be shared between the worker nodes.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8 (Apr 25, 2017)

    This is the final release for 0.2.8.

    Notable changes since 0.2.5:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for multiple hardware platforms (CPU, GPU) and operating systems (linux, macos). Windows should also work but it has not been tested.
    • support for Spark 2.1.x and Spark 2.2.x
    • some usability and performance fixes, which should give a better experience for users
    • more flexible input names for mapRows.
    Source code(tar.gz)
    Source code(zip)
  • v0.2.8-rc0 (Apr 24, 2017)

    This is the first release candidate for 0.2.8.

    Notable changes:

    • uses the official java API for tensorflow
    • support for image ingest (see inception example)
    • support for Spark 2.1.x
    • the same release should support both CPU and GPU clusters
    • some usability and performance fixes, which should give a better experience for users
    Source code(tar.gz)
    Source code(zip)