spark-user mailing list archives

From Xiaojin Wang <Xiaojin.W...@microsoft.com.INVALID>
Subject [Spark JDBC] Spark jdbc on SQL server failed with filter and disable pushDownPredicate not work
Date Wed, 01 Sep 2021 06:21:48 GMT
Hi guys,

I recently ran into an error where the 'JDBC_PUSHDOWN_PREDICATE' option does not work. The background
is as follows:

import java.util.Properties

val url = "jdbc:sqlserver://XXXXXX"
val properties = new Properties
// "movies" has a BIT column "rated", which Spark reads as BooleanType
val df = spark.read.jdbc(url, "movies", properties)
df.filter("rated == true").show()

I am using this code to read from SQL Server and apply transformations such as filter. However,
this fails with the following exception:
Job aborted due to stage failure. Caused by: SQLServerException: Invalid column name 'true'.

The original table contains a column 'rated' of type 'bit'. Digging into the code, I found that 'bit'
is mapped to Spark's BooleanType. Following the pushdown logic, the compileValue() method of
MsSqlServerDialect renders the Boolean value as 'true'/'false', which does not match T-SQL, where
bit values are written as 1/0. This is what finally causes the issue.
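To illustrate the mismatch, here is a minimal standalone Scala sketch (not Spark's actual code). The object `TSqlLiteral` and its `compileValue` and `compileEqualTo` helpers are hypothetical names chosen for illustration: `compileValue` renders Booleans as the T-SQL bit literals 1/0 instead of passing them through, and `compileEqualTo` builds the WHERE fragment an equality filter would produce. In Spark itself, an override like this would have to live in a `JdbcDialect` subclass.

```scala
// Hypothetical helper, NOT Spark's API: renders filter values as T-SQL literals.
object TSqlLiteral {
  def compileValue(value: Any): Any = value match {
    case b: Boolean => if (b) 1 else 0                  // T-SQL bit literal instead of true/false
    case s: String  => "'" + s.replace("'", "''") + "'" // quote strings, escaping single quotes
    case other      => other                            // everything else passes through unchanged
  }

  // Hypothetical: the WHERE fragment for an equality filter such as rated == true.
  def compileEqualTo(column: String, value: Any): String =
    s"$column = ${compileValue(value)}"
}
```

Here `TSqlLiteral.compileEqualTo("rated", true)` yields `rated = 1`. If the Boolean were passed through unchanged, the fragment would instead read `rated = true`, and SQL Server would treat `true` as a column name, which matches the "Invalid column name 'true'" error above.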

After figuring out the cause, I tried to use the 'pushDownPredicate' option to avoid pushing
the filter logic down into the SQL query. The code looks like this:

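import java.util.Properties
import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions

val url = "jdbc:sqlserver://XXXXXX"
val properties = new Properties
// JDBC_PUSHDOWN_PREDICATE is the "pushDownPredicate" key; added, but it still does not work
properties.setProperty(JDBCOptions.JDBC_PUSHDOWN_PREDICATE, "false")
val df = spark.read.jdbc(url, "movies", properties)
df.filter("rated == true").show()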

However, it still fails with the same error message. It seems that setting pushDownPredicate to
false has no effect at all. So my questions are: why does the pushDownPredicate option not work
as expected, and are there other mitigations for this issue?


Best,
Xiaojin

