spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mayur bhole <mayur.bhol...@gmail.com>
Subject How Spark sql query optimisation work if we are using .rdd action ?
Date Sun, 14 Aug 2016 05:04:10 GMT
HI All,

Lets say, we have

val df = bigTableA.join(bigTableB,bigTableA("A")===bigTableB("A"),"left")
val rddFromDF = df.rdd
println(rddFromDF.count)

My understanding is that spark will convert all data frame operations
before "rddFromDF.count" into RDD equivalent operation as we are not
performing any action on dataframe directly. In that case, spark will not
be using optimization engine. Is my assumption right? Please point me to
right resources.

[ Note : I have posted same question on so :
http://stackoverflow.com/questions/38889812/how-spark-dataframe-optimization-engine-works-with-dag
]

Thanks

Mime
View raw message