spark-issues mailing list archives

From "Cheng Lian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-9182) filter and groupBy on DataFrames are not passed through to jdbc source
Date Thu, 13 Aug 2015 05:46:48 GMT

    [ https://issues.apache.org/jira/browse/SPARK-9182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694739#comment-14694739 ]

Cheng Lian commented on SPARK-9182:
-----------------------------------

[~grahn] Unfortunately, we found a regression in the previous fix and had to revert it. Until a proper fix is delivered, this issue can be worked around by explicitly casting the literal values in the filter. Namely, use
{noformat}
emp.filter("sal > cast(2500 as decimal(7, 2))")
{noformat}
instead of
{noformat}
emp.filter("sal > 2500")
{noformat}
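When many literals need the same treatment, the workaround can be wrapped in a small helper. The sketch below is hypothetical: `DecimalLiteralWorkaround`, `fitsDecimal` and `castFilter` are illustrative names, not part of Spark. It assumes the `sal` column is `decimal(7, 2)`, as the cast in the workaround implies, and it rejects literals that cannot be represented exactly at that precision and scale, because such a cast would not express the intended comparison.

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}
import scala.util.Try

// Hypothetical helper (not a Spark API): builds a filter string that casts the
// literal to the column's decimal type instead of leaving it as a bare integer.
object DecimalLiteralWorkaround {
  // True if `value` is exactly representable as decimal(precision, scale):
  // no non-zero digits beyond `scale`, and at most `precision` digits in total.
  def fitsDecimal(value: JBigDecimal, precision: Int, scale: Int): Boolean =
    Try(value.setScale(scale, RoundingMode.UNNECESSARY)).toOption
      .exists(_.precision <= precision)

  // Builds e.g. "sal > cast(2500 as decimal(7, 2))"; throws if the literal does not fit.
  def castFilter(column: String, op: String, value: JBigDecimal,
                 precision: Int, scale: Int): String = {
    require(fitsDecimal(value, precision, scale),
      s"$value does not fit in decimal($precision, $scale)")
    s"$column $op cast(${value.toPlainString} as decimal($precision, $scale))"
  }
}
```

The returned string is passed to `filter` exactly like the hand-written form, e.g. `emp.filter(DecimalLiteralWorkaround.castFilter("sal", ">", new java.math.BigDecimal("2500"), 7, 2))`.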


> filter and groupBy on DataFrames are not passed through to jdbc source
> ----------------------------------------------------------------------
>
>                 Key: SPARK-9182
>                 URL: https://issues.apache.org/jira/browse/SPARK-9182
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.1
>            Reporter: Greg Rahn
>            Assignee: Yijie Shen
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> When running all of these API calls, the only filter passed through to the backend JDBC source is equality. All of the filters in these commands should be pushed down to the JDBC database source.
> {code}
> val url="jdbc:postgresql:grahn"
> val prop = new java.util.Properties
> val emp = sqlContext.read.jdbc(url, "emp", prop)
> emp.filter(emp("sal") === 5000).show()
> emp.filter(emp("sal") < 5000).show()
> emp.filter("sal = 3000").show()
> emp.filter("sal > 2500").show()
> emp.filter("sal >= 2500").show()
> emp.filter("sal < 2500").show()
> emp.filter("sal <= 2500").show()
> emp.filter("sal != 3000").show()
> emp.filter("sal between 3000 and 5000").show()
> emp.filter("ename in ('SCOTT','BLAKE')").show()
> {code}
> The PostgreSQL query log shows that the following statements are run, and that only the equality predicates are passed through.
> {code}
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 5000
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp WHERE sal = 3000
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> LOG:  execute <unnamed>: SET extra_float_digits = 3
> LOG:  execute <unnamed>: SELECT "empno","ename","job","mgr","hiredate","sal","comm","deptno" FROM emp
> {code}
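For comparison with the log, the sketch below shows the kind of `WHERE` clause each filter in the report could compile to if it were passed through, instead of the bare `FROM emp`. It is a self-contained toy, not Spark's JDBC code: the filter classes and `WhereClause` are hypothetical local definitions loosely modelled on Spark's `org.apache.spark.sql.sources` filters, and it assumes `between` decomposes into `>=` and `<=`, and `!=` into a negated equality.

```scala
// Hypothetical filter classes, loosely modelled on Spark's
// org.apache.spark.sql.sources filters but defined here so the sketch runs
// without Spark on the classpath.
sealed trait Filter
final case class EqualTo(attr: String, value: Any) extends Filter
final case class GreaterThan(attr: String, value: Any) extends Filter
final case class GreaterThanOrEqual(attr: String, value: Any) extends Filter
final case class LessThan(attr: String, value: Any) extends Filter
final case class LessThanOrEqual(attr: String, value: Any) extends Filter
final case class In(attr: String, values: Seq[Any]) extends Filter
final case class Not(child: Filter) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

object WhereClause {
  // Render a literal as SQL text; strings are single-quoted, embedded quotes doubled.
  private def literal(v: Any): String = v match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Translate one filter into a SQL predicate.
  def compile(f: Filter): String = f match {
    case EqualTo(a, v)            => s"$a = ${literal(v)}"
    case GreaterThan(a, v)        => s"$a > ${literal(v)}"
    case GreaterThanOrEqual(a, v) => s"$a >= ${literal(v)}"
    case LessThan(a, v)           => s"$a < ${literal(v)}"
    case LessThanOrEqual(a, v)    => s"$a <= ${literal(v)}"
    case In(a, vs)                => vs.map(literal).mkString(s"$a IN (", ", ", ")")
    case Not(c)                   => s"NOT (${compile(c)})"
    case And(l, r)                => s"(${compile(l)}) AND (${compile(r)})"
  }

  // Combine filters into the clause appended after "FROM emp"; empty if none.
  def where(filters: Seq[Filter]): String =
    if (filters.isEmpty) ""
    else filters.map(f => s"(${compile(f)})").mkString(" WHERE ", " AND ", "")
}
```

With this sketch, `WhereClause.where(Seq(GreaterThan("sal", 2500)))` yields ` WHERE (sal > 2500)`, and `sal between 3000 and 5000` modelled as `And(GreaterThanOrEqual("sal", 3000), LessThanOrEqual("sal", 5000))` compiles to `(sal >= 3000) AND (sal <= 5000)`.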



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

