spark-issues mailing list archives

From "Lukas Rytz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-25044) Address translation of LMF closure primitive args to Object in Scala 2.12
Date Wed, 08 Aug 2018 06:58:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572775#comment-16572775 ]

Lukas Rytz commented on SPARK-25044:
------------------------------------

The encoding is as expected. To expand on a few details:
 * The function
{code:java}
(x: Int, y: Int) => ""{code}
is not specialized in either 2.11 or 2.12. Function2 doesn't have a specialized variant
for Int*Int*AnyRef. So this creates an instance of Function2, not Function2$sp$XXX, and the
arguments are boxed when invoking the method.
 * The 2.11 encoding always generates the apply method with the types as they appear in source
code, and then generates a bridge method if necessary. So the above will generate an apply(II)LString;
with the implementation, and a bridge apply(LObject;LObject;)LObject; that unboxes and delegates
to the implementation. Callsites will always box and invoke the bridge method.
 * The 2.12 encoding generates an $anonfun$foo$1(II)LString; method in the enclosing class
with the lambda body. In addition, it creates an $anonfun$foo$1$adapted(LObject;LObject;)LString;
method that unboxes and invokes the body method. The adapted method is used for the LMF. The
SAM interface passed to the LMF is Function2, whose abstract method is apply(LObject;LObject;)LObject;
 * You're right that LMF can do boxing adaptations internally, so we could pass the $anonfun$foo$1
method to LMF (instead of the $adapted). However, the boxing semantics are not exactly those
that we need for Scala. In particular, unboxing null gives 0 in Scala but throws an NPE in Java.
That's why we emit and use the $adapted method.
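
To make the last point concrete, here is a minimal Java sketch of the two ways the lambda body could be wired up. All names in it (AdaptedSketch, scalaUnboxToInt, anonfun1, anonfun1Adapted) are made up for illustration: they stand in for the compiler-generated $anonfun$foo$1 and $anonfun$foo$1$adapted, and scalaUnboxToInt mirrors what I understand scala.runtime.BoxesRunTime.unboxToInt to do. It is a model of the encoding, not the bytecode scalac emits.

```java
import java.util.function.BiFunction;

public class AdaptedSketch {
    // Assumption: mirrors scala.runtime.BoxesRunTime.unboxToInt, where null unboxes to 0.
    static int scalaUnboxToInt(Object o) {
        return o == null ? 0 : ((Integer) o).intValue();
    }

    // Hypothetical stand-in for $anonfun$foo$1(II)LString;, the lambda body.
    static String anonfun1(int x, int y) {
        return "";
    }

    // Hypothetical stand-in for $anonfun$foo$1$adapted(LObject;LObject;)LString;.
    // It unboxes with Scala semantics and delegates to the body.
    static String anonfun1Adapted(Object x, Object y) {
        return anonfun1(scalaUnboxToInt(x), scalaUnboxToInt(y));
    }

    // Wired through the adapted method: no unboxing is left for LMF to do.
    static final BiFunction<Object, Object, String> VIA_ADAPTED =
        AdaptedSketch::anonfun1Adapted;

    // Wired straight to the body: the Integer -> int unboxing follows Java rules.
    static final BiFunction<Integer, Integer, String> VIA_BODY =
        AdaptedSketch::anonfun1;

    // Small helper: reports whether running r throws a NullPointerException.
    static boolean throwsNpe(Runnable r) {
        try {
            r.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Scala-style unboxing: null arguments are treated as 0, the call succeeds.
        System.out.println(VIA_ADAPTED.apply(null, null).isEmpty()); // prints true
        // Java-style unboxing: null arguments cause an NPE before the body runs.
        System.out.println(throwsNpe(() -> VIA_BODY.apply(null, null))); // prints true
    }
}
```

The second binding is the "pass the body method to LMF directly" option; it is shorter, but its behavior on null is the Java one, which is the reason for keeping the separate adapted method.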

 

On the other hand:
 * The function
{code:java}
(x: Int, y: Int) => x + y{code}
is specialized.
 * In 2.11, the closure class extends Function2$mcIII$sp.
 * 2.12 creates a $anonfun$foo$2(II)I method in the enclosing class. This method is used for
the LMF; the SAM interface is Lscala/runtime/java8/JFunction2$mcIII$sp. The signature of the
abstract method in that interface matches exactly.
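
The specialized case can be sketched in Java along the same lines. The interface IntIntToInt below is hypothetical and only models my understanding of JFunction2$mcIII$sp: one abstract method with a primitive (II)I signature, plus a default generic apply that unboxes, calls the primitive method, and boxes the result. The names applyInts, unbox, and anonfun2 are likewise invented for the example.

```java
public class SpecializedSketch {
    // Hypothetical model of scala.runtime.java8.JFunction2$mcIII$sp.
    interface IntIntToInt {
        // Stand-in for the abstract method with signature (II)I.
        int applyInts(int x, int y);

        // Assumed shape of the generic entry point: unbox with Scala
        // semantics, call the primitive method, box the result.
        default Object apply(Object x, Object y) {
            return applyInts(unbox(x), unbox(y));
        }

        // Scala-style unboxing: null becomes 0.
        static int unbox(Object o) {
            return o == null ? 0 : (Integer) o;
        }
    }

    // Hypothetical stand-in for $anonfun$foo$2(II)I, the lambda body.
    static int anonfun2(int x, int y) {
        return x + y;
    }

    // The body's signature matches the abstract method exactly,
    // so no adapted method is needed for this binding.
    static final IntIntToInt ADD = SpecializedSketch::anonfun2;

    public static void main(String[] args) {
        System.out.println(ADD.applyInts(1, 2)); // prints 3
        System.out.println(ADD.apply(1, null));  // prints 1
    }
}
```

If the real interface behaves like this model, a caller that goes through the generic apply with a null argument sees it as 0, which would be consistent with the [1,12,1] row in the Spark Answer column of the failing test.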

 

I don't know what the SQL implementation does internally, but maybe the above gives
enough information to understand the problem? Let me know if I can help.

> Address translation of LMF closure primitive args to Object in Scala 2.12
> -------------------------------------------------------------------------
>
>                 Key: SPARK-25044
>                 URL: https://issues.apache.org/jira/browse/SPARK-25044
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.0
>            Reporter: Sean Owen
>            Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix HandleNullInputsForUDF rule":
> {code:java}
> - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
> Results do not match for query:
> ...
> == Results ==
> == Results ==
> !== Correct Answer - 3 ==   == Spark Answer - 3 ==
> !struct<>                   struct<a:bigint,b:int,c:int>
> ![0,10,null]                [0,10,0]
> ![1,12,null]                [1,12,1]
> ![2,14,null]                [2,14,2] (QueryTest.scala:163){code}
> You can kind of get what's going on reading the test:
> {code:java}
> test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
>   // assume(!ClosureCleanerSuite2.supportsLMFs)
>   // This test won't test what it intends to in 2.12, as lambda metafactory closures
>   // have arg types that are not primitive, but Object
>   val udf1 = udf({(x: Int, y: Int) => x + y})
>   val df = spark.range(0, 3).toDF("a")
>     .withColumn("b", udf1($"a", udf1($"a", lit(10))))
>     .withColumn("c", udf1($"a", lit(null)))
>   val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed
>   comparePlans(df.logicalPlan, plan)
>   checkAnswer(
>     df,
>     Seq(
>       Row(0, 10, null),
>       Row(1, 12, null),
>       Row(2, 14, null)))
> }{code}
>  
> It seems that the closure that is fed in as a UDF changes behavior, in a way that primitive-type
> arguments are handled differently. For example, an Int argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are understood, but I'm not
> exactly sure of the cause yet.
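
The remark in the test that LMF closures "have arg types that are not primitive, but Object" can be reproduced with a small Java analogue. This sketch only shows what reflection reports for a class-based closure versus an LMF-based one; whether Spark's null handling inspects closures this way is an assumption here, not something the thread confirms. Java generics cannot use int, so Integer plays the role of the precise argument type. The names ErasedSignatureSketch, applyParamTypes, AS_CLASS, and AS_LAMBDA are invented for the example.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.function.BiFunction;

public class ErasedSignatureSketch {
    // Returns the parameter types of the first non-bridge "apply" method,
    // which is what a reflective inspector of the closure would see.
    static Class<?>[] applyParamTypes(Object fn) {
        for (Method m : fn.getClass().getMethods()) {
            if (m.getName().equals("apply") && !m.isBridge()) {
                return m.getParameterTypes();
            }
        }
        return new Class<?>[0];
    }

    // Class-based closure: the compiler emits apply with the source types
    // plus a bridge apply(Object, Object), similar in spirit to 2.11.
    static final BiFunction<Integer, Integer, Integer> AS_CLASS =
        new BiFunction<Integer, Integer, Integer>() {
            @Override
            public Integer apply(Integer x, Integer y) {
                return x + y;
            }
        };

    // LMF-based closure: the runtime-generated class only implements the
    // erased SAM method, similar in spirit to 2.12.
    static final BiFunction<Integer, Integer, Integer> AS_LAMBDA = (x, y) -> x + y;

    public static void main(String[] args) {
        System.out.println(Arrays.toString(applyParamTypes(AS_CLASS)));
        System.out.println(Arrays.toString(applyParamTypes(AS_LAMBDA)));
    }
}
```

In this analogue the class-based closure reports the source-level parameter types, while the lambda reports only Object, so a check along the lines of "does this function take a primitive argument" would give different answers for the two.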




