spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mendelson, Assaf" <Assaf.Mendel...@rsa.com>
Subject RE: How does predicate push down really help?
Date Thu, 17 Nov 2016 07:34:31 GMT
Actually, both you translate to the same plan.
When you do sql(“some code”) or filter, it doesn’t actually do the query. Instead it
is translated to a plan (parsed plan) which transform everything into standard spark expressions.
Then spark analyzes it to fill in the blanks (what is users table for example) and attempts
to optimize it. Predicate pushdown happens in the optimization portion.
For example, let’s say that users would actually be backed by a table on an sql query in
mysql.
Without predicate pushdown spark would first pull the entire users table from mysql and only
then do the filtering. Predicate pushdown would mean the filtering would be done as part of
the original sql query.

Another (probably better) example would be something like having two table A and B which are
joined by some common key. Then a filtering is done on the key. Moving the filter to be before
the join would probably make everything faster as filter is a faster operation than a join.

Assaf.

From: kant kodali [mailto:kanth909@gmail.com]
Sent: Thursday, November 17, 2016 8:03 AM
To: user @spark
Subject: How does predicate push down really help?

How does predicate push down really help? in the following cases

val df1 = spark.sql("select * from users where age > 30")

                             vs

val df1 = spark.sql("select * from users")
df.filter("age > 30")


Mime
View raw message