spark-issues mailing list archives

From "Erik Erlandson (JIRA)" <>
Subject [jira] [Created] (SPARK-21277) Spark is invoking an incorrect serializer after UDAF completion
Date Sat, 01 Jul 2017 16:55:00 GMT
Erik Erlandson created SPARK-21277:

             Summary: Spark is invoking an incorrect serializer after UDAF completion
                 Key: SPARK-21277
             Project: Spark
          Issue Type: Bug
          Components: Optimizer, SQL
    Affects Versions: 2.1.0
            Reporter: Erik Erlandson

I'm writing a UDAF that also requires some custom UDT implementations.  The UDAF (and UDT)
logic appears to execute properly up through the final UDAF call to the {{evaluate}} method.
However, after {{evaluate}} completes, the UDT {{deserialize}} method is called one more
time, this time on data that was not produced by my corresponding {{serialize}} method,
and it crashes.  The following REPL output shows the execution and completion of {{evaluate}},
followed by another call to {{deserialize}} that receives an {{UnsafeArrayData}} object,
which my serialization does not produce, and so the method fails:

{code}entering evaluate
a= [[0.5,10,2,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1813f2c,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@b3587fc7],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d3065487,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d01fbbcf,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9]]
leaving evaluate
a= org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@27d73513
java.lang.RuntimeException: Error while decoding: java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData.
createexternalrow(newInstance(class org.apache.spark.isarnproject.sketches.udt.TDigestArrayUDT).deserialize,
{code}
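
The failure mode described above can be modelled without Spark. The sketch below is only an illustration under stated assumptions: {{ArrayLike}}, {{GenericArr}}, {{UnsafeArr}}, {{toUnsafe}} and both {{deserialize*}} functions are hypothetical stand-ins, not Spark or isarn-sketches-spark code, and the report does not confirm that this is the actual root cause. It shows how a deserializer that assumes the exact representation built by its own serializer can break when the same data arrives re-encoded, while one that uses only element-wise accessors keeps working.

```scala
import java.nio.ByteBuffer
import scala.util.Try

// Hypothetical stand-in for an array abstraction with several representations.
trait ArrayLike {
  def numElements: Int
  def getDouble(i: Int): Double
  // Direct access to a backing JVM array; not every representation has one.
  def array: Array[Any]
  // Element-wise copy that works for any representation.
  def toDoubleArray: Array[Double] = Array.tabulate(numElements)(getDouble)
}

// Models what a serialize method might build: a wrapper over a JVM array.
final class GenericArr(values: Array[Double]) extends ArrayLike {
  def numElements: Int = values.length
  def getDouble(i: Int): Double = values(i)
  def array: Array[Any] = values.map(v => (v: Any))
}

// Models a compact binary encoding that has no backing JVM array.
final class UnsafeArr(bytes: ByteBuffer, n: Int) extends ArrayLike {
  def numElements: Int = n
  def getDouble(i: Int): Double = bytes.getDouble(i * 8)
  def array: Array[Any] =
    throw new UnsupportedOperationException("Not supported on UnsafeArr.")
}

// Re-encodes any ArrayLike into the binary form (assumed to happen outside user code).
def toUnsafe(a: ArrayLike): UnsafeArr = {
  val buf = ByteBuffer.allocate(a.numElements * 8)
  (0 until a.numElements).foreach(i => buf.putDouble(i * 8, a.getDouble(i)))
  new UnsafeArr(buf, a.numElements)
}

// Fragile: assumes the input is exactly what the serializer produced.
def deserializeFragile(datum: ArrayLike): Vector[Double] =
  datum.array.map(_.asInstanceOf[Double]).toVector

// Tolerant: relies only on accessors that every representation supports.
def deserializeTolerant(datum: ArrayLike): Vector[Double] =
  datum.toDoubleArray.toVector

val serialized = new GenericArr(Array(0.5, 10.0, 2.0))
val reencoded = toUnsafe(serialized)

val fragileOnGeneric = deserializeFragile(serialized)     // succeeds
val fragileOnUnsafe = Try(deserializeFragile(reencoded))  // fails
val tolerantOnUnsafe = deserializeTolerant(reencoded)     // succeeds
```

In this model the fragile variant fails only on the re-encoded input, which matches the shape of the symptom in the trace: the first calls succeed, and a later call on a different representation throws.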

To reproduce, check out the branch {{first-cut}} of {{isarn-sketches-spark}}:

Then invoke {{xsbt console}} to get a REPL with a Spark session.  In the REPL, execute:
Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> val training = spark.createDataFrame(Seq(
     |   (1.0, Vectors.dense(0.0, 1.1, 0.1)),
     |   (0.0, Vectors.dense(2.0, 1.0, -1.0)),
     |   (0.0, Vectors.dense(2.0, 1.3, 1.0)),
     |   (1.0, Vectors.dense(0.0, 1.2, -0.5))
     | )).toDF("label", "features")
training: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> val featTD = training.agg(TDigestMLVecUDAF(0.5,10)(training("features")))
featTD: org.apache.spark.sql.DataFrame = [tdigestmlvecudaf(features): tdigestarray]

scala> featTD.first
