spark-dev mailing list archives

From Li Jin <>
Subject Optimizer rule ConvertToLocalRelation causes expressions to be eager-evaluated in Planning phase
Date Fri, 08 Jun 2018 19:34:58 GMT
Hi All,

Sorry for the long email title. I was a bit surprised to find that the
current optimizer rule "ConvertToLocalRelation" causes expressions to be
evaluated eagerly during the planning phase. This can be demonstrated with
the following code:

scala> val myUDF = udf((x: String) => { println("UDF evaled"); "result" })
myUDF: org.apache.spark.sql.expressions.UserDefinedFunction =

scala> val df = Seq(("test", 1)).toDF("s", "i").select(myUDF($"s"))
df: org.apache.spark.sql.DataFrame = [UDF(s): string]

scala> println(df.queryExecution.optimizedPlan)
UDF evaled
LocalRelation [UDF(s)#9]

This is somewhat unexpected to me because of Spark's lazy execution model.

I am wondering if this behavior is by design?
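
To make the mechanism concrete, here is a minimal toy sketch in plain Scala of what I believe is happening. It does not use Spark: Plan, LocalRelation, Project, myUdf and convertToLocalRelation below are hypothetical stand-ins I made up for illustration, not the Catalyst classes. The assumption is that the rule sees a projection sitting directly on top of in-memory rows and folds it into a new local relation, which requires running the projected expressions while the plan is being optimized.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
object EagerEvalSketch {
  type Row = Seq[Any]

  sealed trait Plan
  // Rows that already live in driver memory, like Seq(...).toDF.
  final case class LocalRelation(rows: Seq[Row]) extends Plan
  // A projection that applies f to every row of its child.
  final case class Project(f: Row => Row, child: Plan) extends Plan

  // Counts how often the "UDF" body actually runs.
  var udfCalls = 0
  val myUdf: Row => Row = _ => { udfCalls += 1; Seq("result") }

  // A rule in the spirit of ConvertToLocalRelation (assumed behavior):
  // a Project over a LocalRelation is replaced by a LocalRelation holding
  // the projected rows, so f is evaluated here, during optimization.
  def convertToLocalRelation(plan: Plan): Plan = plan match {
    case Project(f, LocalRelation(rows)) => LocalRelation(rows.map(f))
    case other                           => other
  }

  def main(args: Array[String]): Unit = {
    // Building the plan does not call the UDF.
    val plan = Project(myUdf, LocalRelation(Seq(Seq("test", 1))))
    println(s"UDF calls after building the plan: $udfCalls")

    // Optimizing the plan does call it, before anything is "executed".
    val optimized = convertToLocalRelation(plan)
    println(s"UDF calls after optimizing: $udfCalls")
    println(optimized)
  }
}
```

In this sketch the counter stays at 0 while the plan is built and becomes 1 as soon as the rule is applied, which mirrors "UDF evaled" being printed when optimizedPlan is requested above.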

