spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Weird results with Spark SQL Outer joins
Date Mon, 02 May 2016 18:26:51 GMT
Hi Kevin,

Thanks.

Please post the result of the same query with INNER JOIN; comparing that
count with the outer join counts will give us a bit of insight.
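
One reason the INNER JOIN count can be informative: in SQL, a WHERE predicate on a column from the NULL-extended side of an outer join evaluates to NULL for unmatched rows and drops them, so the outer join can return exactly the rows of the inner join. Whether that is what is happening here is not established in this thread. Below is a minimal sketch of the behaviour, using Python's built-in sqlite3 rather than Spark; the tables `s` and `d` and their rows are made up for illustration and are not the data from the original post.

```python
import sqlite3

# Hypothetical toy tables standing in for the two sides of the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (date TEXT, account TEXT, ad TEXT, spend REAL);
    CREATE TABLE d (date TEXT, account TEXT, ad TEXT, spend_in_dollar REAL);
    INSERT INTO s VALUES ('2016-01-04', 'a1', 'ad1', 10.0),
                         ('2016-01-05', 'a2', 'ad2', 20.0);
    INSERT INTO d VALUES ('2016-01-04', 'a1', 'ad1', 11.0),
                         ('2016-01-06', 'a3', 'ad3', 30.0);
""")

# Same join condition shape as in the original queries.
ON = "ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)"


def count(join, where):
    """Count the rows produced by joining s and d with the given join type and filter."""
    sql = f"SELECT COUNT(*) FROM s {join} d {ON} {where}"
    return conn.execute(sql).fetchone()[0]


both_sides = "WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'"
null_safe = ("WHERE s.date >= '2016-01-03' "
             "AND (d.date >= '2016-01-03' OR d.date IS NULL)")

inner = count("INNER JOIN", both_sides)
left_both = count("LEFT OUTER JOIN", both_sides)    # unmatched rows are filtered out
left_null_ok = count("LEFT OUTER JOIN", null_safe)  # unmatched rows are kept

print(inner, left_both, left_null_ok)
```

In this sketch the LEFT OUTER JOIN with a filter on both sides gives the same count as the INNER JOIN, while the variant that also accepts `d.date IS NULL` keeps the unmatched row from `s`. Moving the date filter on `d` into the ON clause is another way to keep unmatched rows.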

Regards,
Gourav


On Mon, May 2, 2016 at 7:10 PM, Kevin Peng <kpeng1@gmail.com> wrote:

> Gourav,
>
> Apologies.  I edited my post with this information:
> Spark version: 1.6
> Result from spark shell
> OS: Linux version 2.6.32-431.20.3.el6.x86_64 (
> mockbuild@c6b9.bsys.dev.centos.org) (gcc version 4.4.7 20120313 (Red Hat
> 4.4.7-4) (GCC) ) #1 SMP Thu Jun 19 21:14:45 UTC 2014
>
> Thanks,
>
> KP
>
> On Mon, May 2, 2016 at 11:05 AM, Gourav Sengupta <
> gourav.sengupta@gmail.com> wrote:
>
>> Hi,
>>
>> As always, can you please write down details regarding your Spark cluster:
>> the version, OS, IDE used, etc.?
>>
>> Regards,
>> Gourav Sengupta
>>
>> On Mon, May 2, 2016 at 5:58 PM, kpeng1 <kpeng1@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I am running into a weird result with Spark SQL outer joins.  All three
>>> join types return the same count, which does not make sense given the
>>> data.  Here are the queries that I am running, with the results:
>>>
>>> sqlContext.sql("""
>>>   SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc,
>>>          s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend,
>>>          d.spend_in_dollar AS d_spend
>>>   FROM swig_pin_promo_lt s
>>>   FULL OUTER JOIN dps_pin_promo_lt d
>>>     ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
>>>   WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'
>>> """).count()
>>> RESULT: 23747
>>>
>>>
>>> sqlContext.sql("""
>>>   SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc,
>>>          s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend,
>>>          d.spend_in_dollar AS d_spend
>>>   FROM swig_pin_promo_lt s
>>>   LEFT OUTER JOIN dps_pin_promo_lt d
>>>     ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
>>>   WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'
>>> """).count()
>>> RESULT: 23747
>>>
>>> sqlContext.sql("""
>>>   SELECT s.date AS edate, s.account AS s_acc, d.account AS d_acc,
>>>          s.ad AS s_ad, d.ad AS d_ad, s.spend AS s_spend,
>>>          d.spend_in_dollar AS d_spend
>>>   FROM swig_pin_promo_lt s
>>>   RIGHT OUTER JOIN dps_pin_promo_lt d
>>>     ON (s.date = d.date AND s.account = d.account AND s.ad = d.ad)
>>>   WHERE s.date >= '2016-01-03' AND d.date >= '2016-01-03'
>>> """).count()
>>> RESULT: 23747
>>>
>>> I was wondering if anyone has encountered this issue before.
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Weird-results-with-Spark-SQL-Outer-joins-tp26861.html
>>>
>>>
>>
>
