spark-issues mailing list archives

From "Erik Erlandson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-21277) Spark is invoking an incorrect serializer after UDAF completion
Date Sat, 01 Jul 2017 16:55:00 GMT
Erik Erlandson created SPARK-21277:
--------------------------------------

             Summary: Spark is invoking an incorrect serializer after UDAF completion
                 Key: SPARK-21277
                 URL: https://issues.apache.org/jira/browse/SPARK-21277
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.1.0
            Reporter: Erik Erlandson


I'm writing a UDAF that also requires some custom UDT implementations. The UDAF (and UDT)
logic appears to execute correctly up through the final UDAF call to the {{evaluate}} method.
However, after {{evaluate}} completes, the UDT {{deserialize}} method is called one more time.
This time it is invoked on data that was not produced by my corresponding {{serialize}} method,
and it crashes. The following REPL output shows the execution and completion of {{evaluate}},
followed by another call to {{deserialize}} that receives an {{UnsafeArrayData}} object, which
my serialization never produces, so the method fails:

{code}
entering evaluate
a= [[0.5,10,2,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1813f2c,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@b3587fc7],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d3065487,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d01fbbcf,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9]]
leaving evaluate
a= org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@27d73513
java.lang.RuntimeException: Error while decoding: java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData.
createexternalrow(newInstance(class org.apache.spark.isarnproject.sketches.udt.TDigestArrayUDT).deserialize, StructField(tdigestmlvecudaf(features),TDigestArrayUDT,true))
{code}
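The last line of the error, "Not supported on UnsafeArrayData.", is the message that Spark's {{UnsafeArrayData.array()}} throws. One plausible (unconfirmed) explanation is therefore a {{deserialize}} implementation that reads its input through {{ArrayData.array}}: that works for {{GenericArrayData}}, but not for the {{UnsafeArrayData}} that Spark may pass in when it decodes the result row ({{createexternalrow}} above). The self-contained sketch below models that situation without Spark. {{ArrayLike}}, {{GenericArr}}, {{UnsafeArr}} and {{DeserializeSketch}} are hypothetical stand-ins, not Spark classes; the point is that a deserializer which reads only through per-element accessors accepts either representation.

{code:scala}
// Hypothetical stand-ins for Spark's ArrayData hierarchy (NOT Spark's real classes).
trait ArrayLike {
  def numElements: Int
  def get(i: Int): Double
  // Exposes the backing array; only some implementations support it.
  def array: Array[Any]
}

// Models GenericArrayData: backed by an on-heap Array[Any], so `array` works.
final class GenericArr(values: Array[Any]) extends ArrayLike {
  def numElements: Int = values.length
  def get(i: Int): Double = values(i).asInstanceOf[Double]
  def array: Array[Any] = values
}

// Models UnsafeArrayData: elements are readable one at a time, but there is
// no backing Array[Any], so `array` throws, mirroring the reported message.
final class UnsafeArr(values: Array[Double]) extends ArrayLike {
  def numElements: Int = values.length
  def get(i: Int): Double = values(i)
  def array: Array[Any] =
    throw new UnsupportedOperationException("Not supported on UnsafeArrayData.")
}

object DeserializeSketch {
  // Fragile: assumes every input exposes a backing array.
  def deserializeFragile(datum: Any): Vector[Double] = datum match {
    case a: ArrayLike => a.array.map(_.asInstanceOf[Double]).toVector
  }

  // Robust: reads only through per-element accessors, so either representation works.
  def deserializeRobust(datum: Any): Vector[Double] = datum match {
    case a: ArrayLike => Vector.tabulate(a.numElements)(a.get)
    case other =>
      throw new IllegalArgumentException(s"unexpected datum: $other")
  }

  def main(args: Array[String]): Unit = {
    val generic = new GenericArr(Array[Any](0.5, 10.0))
    val unsafe = new UnsafeArr(Array(0.5, 10.0))
    println(deserializeRobust(generic)) // Vector(0.5, 10.0)
    println(deserializeRobust(unsafe))  // Vector(0.5, 10.0)
    println(scala.util.Try(deserializeFragile(unsafe)).isFailure) // true
  }
}
{code}

In real Spark code the analogous accessors on {{ArrayData}} are {{numElements()}} and typed getters such as {{getDouble(ordinal)}} or {{getStruct(ordinal, numFields)}}; whether this is actually the cause here depends on how {{TDigestArrayUDT.deserialize}} is written.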

To reproduce, check out the branch {{first-cut}} of {{isarn-sketches-spark}}:
https://github.com/erikerlandson/isarn-sketches-spark/tree/first-cut

Then run {{xsbt console}} to get a REPL with a Spark session. In the REPL, execute:
{code}
Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> val training = spark.createDataFrame(Seq((1.0, Vectors.dense(0.0, 1.1, 0.1)), (0.0, Vectors.dense(2.0, 1.0, -1.0)), (0.0, Vectors.dense(2.0, 1.3, 1.0)), (1.0, Vectors.dense(0.0, 1.2, -0.5)))).toDF("label", "features")
training: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> val featTD = training.agg(TDigestMLVecUDAF(0.5,10)(training("features")))
featTD: org.apache.spark.sql.DataFrame = [tdigestmlvecudaf(features): tdigestarray]

scala> featTD.first
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


