spark-issues mailing list archives

From "Liang-Chi Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-21277) Spark is invoking an incorrect serializer after UDAF completion
Date Mon, 03 Jul 2017 05:18:02 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071929#comment-16071929 ]

Liang-Chi Hsieh commented on SPARK-21277:
-----------------------------------------

{{InternalRow.getArray}} returns an {{ArrayData}}, and the concrete instance can be an
{{UnsafeArrayData}}. Even though your {{serialize}} method does not produce {{UnsafeArrayData}},
Spark SQL uses {{UnsafeArrayData}} internally for arrays, so {{deserialize}} can receive one.
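
To illustrate the point above, here is a minimal sketch, assuming the failing {{deserialize}} reached the backing object array through a call such as {{ArrayData.array}}, which the ticket does not confirm. {{ArrayDataAccess}} and {{readDoubles}} are hypothetical names and are not taken from isarn-sketches-spark. The sketch reads elements through the {{ArrayData}} interface only, so it should behave the same whether Spark hands back a {{GenericArrayData}} or an {{UnsafeArrayData}}.

```scala
import scala.util.Try

import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}

// Hypothetical helper for illustration; not part of the reporter's code.
object ArrayDataAccess {
  // Read the elements through the ArrayData interface only, so the
  // concrete subclass that Spark hands back does not matter.
  def readDoubles(data: ArrayData): Array[Double] = data.toDoubleArray()

  def main(args: Array[String]): Unit = {
    val values = Array(0.5, 10.0, 2.0)

    // What a UDT serialize method might produce (assumption)...
    val generic: ArrayData = new GenericArrayData(values)
    // ...and the unsafe form Spark SQL may use internally for the same data.
    val unsafe: ArrayData = UnsafeArrayData.fromPrimitiveArray(values)

    println(readDoubles(generic).mkString(","))
    println(readDoubles(unsafe).mkString(","))

    // Asking the unsafe form for its backing object array is expected to fail.
    println(Try(unsafe.array).isFailure)
  }
}
```

A {{deserialize}} written this way never pattern-matches on or casts to a specific {{ArrayData}} subclass, which is the property the comment above depends on.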

We can close this if you have no further questions.

> Spark is invoking an incorrect serializer after UDAF completion
> ---------------------------------------------------------------
>
>                 Key: SPARK-21277
>                 URL: https://issues.apache.org/jira/browse/SPARK-21277
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Erik Erlandson
>
> I'm writing a UDAF that also requires some custom UDT implementations.  The UDAF (and
> UDT) logic appears to execute properly up through the final UDAF call to the {{evaluate}}
> method. After {{evaluate}} completes, however, the UDT {{deserialize}} method is called
> once more, this time on data that was not produced by my corresponding {{serialize}}
> method, and it crashes.  The following REPL output shows the execution and completion of
> {{evaluate}}, followed by another call to {{deserialize}} that sees some kind of
> {{UnsafeArrayData}} object that my serialization doesn't produce, so the method fails:
> {code}entering evaluate
> a= [[0.5,10,2,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1813f2c,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@b3587fc7],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d3065487,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9],[0.5,10,4,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d01fbbcf,org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@f1a5ace9]]
> leaving evaluate
> a= org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@27d73513
> java.lang.RuntimeException: Error while decoding: java.lang.UnsupportedOperationException: Not supported on UnsafeArrayData.
> createexternalrow(newInstance(class org.apache.spark.isarnproject.sketches.udt.TDigestArrayUDT).deserialize, StructField(tdigestmlvecudaf(features),TDigestArrayUDT,true))
> {code}
> To reproduce, check out the branch {{first-cut}} of {{isarn-sketches-spark}}:
> https://github.com/erikerlandson/isarn-sketches-spark/tree/first-cut
> Then invoke {{xsbt console}} to get a REPL with a spark session.  In the REPL execute:
> {code}
> Welcome to Scala 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
> Type in expressions for evaluation. Or try :help.
> scala> val training = spark.createDataFrame(Seq((1.0, Vectors.dense(0.0, 1.1, 0.1)),(0.0, Vectors.dense(2.0, 1.0, -1.0)),(0.0, Vectors.dense(2.0, 1.3, 1.0)),(1.0, Vectors.dense(0.0, 1.2, -0.5)))).toDF("label", "features")
> training: org.apache.spark.sql.DataFrame = [label: double, features: vector]
> scala> val featTD = training.agg(TDigestMLVecUDAF(0.5,10)(training("features")))
> featTD: org.apache.spark.sql.DataFrame = [tdigestmlvecudaf(features): tdigestarray]
> scala> featTD.first
> {code}




