spark-dev mailing list archives

From Tomasz Gawęda
Subject Preventing predicate pushdown
Date Tue, 15 May 2018 12:33:09 GMT

While working with the JDBC data source, I saw that many "or" clauses with 
non-equality operators cause a huge performance degradation of the SQL query 
sent to the database (DB2). For example:

val df = spark.read.format("jdbc").(other options to parallelize

df.where(s"(date1 > $param1 and (date1 < $param2 or date1 is null) or x > 100)").show()
// in the real application, the pushed-down predicates appear many lines
// further down and contain many ANDs and ORs

If I call cache() before where, this "where" clause is not pushed down. 
However, in a production system, caching many sources is a waste of memory 
(especially if the pipeline is long and I must call cache() many times).

I asked on StackOverflow for better ideas:

However, there are only workarounds. I can use those workarounds, but 
maybe it would be better to add such functionality directly in the API?

For example: df.withAnalysisBarrier().where(...) ?
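
Until something like that exists, one workaround can be sketched as below. This is a minimal sketch, not necessarily one of the Stack Overflow suggestions: it rebuilds the DataFrame from its own RDD, so the optimizer sees an opaque RDD scan and evaluates later filters in Spark instead of pushing them to the database. The names PushdownBarrier and withoutPushdown are hypothetical and not part of Spark's API; only createDataFrame, rdd and schema are existing calls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PushdownBarrier {
  // Hypothetical helper (not a Spark API): rebuild the DataFrame from its RDD.
  // The resulting plan starts at an RDD scan, so filters applied afterwards
  // are evaluated by Spark and are not handed to the underlying data source.
  def withoutPushdown(spark: SparkSession, df: DataFrame): DataFrame =
    spark.createDataFrame(df.rdd, df.schema)
}
```

Usage would look like PushdownBarrier.withoutPushdown(spark, df).where(...).show(). Unlike cache(), this keeps nothing in memory, but rows are converted to and from external Row objects, and column pruning is blocked as well, so it is probably only worth it when the pushed-down query is the bottleneck.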

Please let me know whether I should create a JIRA ticket, or whether this 
is not a good idea for some reason.

Pozdrawiam / Best regards,

Tomek Gawęda
