spark-user mailing list archives

From gen tang <>
Subject Re: Questions about SparkSQL join on not equality conditions
Date Mon, 10 Aug 2015 15:29:47 GMT

I am sorry to bother you again.
When I do a join as follows:
df = sqlContext.sql("select a.someItem, b.someItem from a full outer join b
on condition1 *or* condition2")

The program failed because the result size is bigger than 1G.
It is really strange, as no single record is anywhere near 1G.
When I join on just one condition, or on an equality condition, there is no
such error.
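A common workaround (shown here as a minimal sketch, with plain Python lists standing in for the tables a and b, and cond1/cond2 as hypothetical stand-ins for the real join conditions) is to split the OR join into two single-condition joins and union the results, excluding from the second join the rows already matched by the first so nothing is duplicated:

```python
# Hypothetical tables as lists of (col1, col2) rows.
a = [(1, 10), (2, 20), (3, 30)]
b = [(1, 99), (5, 20), (3, 30)]

def cond1(ra, rb):  # placeholder for e.g. a.col1 = b.col1
    return ra[0] == rb[0]

def cond2(ra, rb):  # placeholder for e.g. a.col2 = b.col2
    return ra[1] == rb[1]

# Join on "cond1 OR cond2", expressed directly as cartesian product + filter.
or_join = [(ra, rb) for ra in a for rb in b if cond1(ra, rb) or cond2(ra, rb)]

# The same result as a union of two single-condition joins; the second join
# excludes cond1 matches so rows satisfying both conditions appear only once.
join1 = [(ra, rb) for ra in a for rb in b if cond1(ra, rb)]
join2 = [(ra, rb) for ra in a for rb in b if cond2(ra, rb) and not cond1(ra, rb)]

assert sorted(or_join) == sorted(join1 + join2)
```

In Spark SQL each single-condition join can use an equi-join strategy when its condition is an equality, which is usually far cheaper than one combined OR join.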

Could anyone help me, please?

Thanks a lot in advance.


On Sun, Aug 9, 2015 at 9:08 PM, gen tang <> wrote:

> Hi,
> I might have a stupid question about SparkSQL's implementation of joins on
> non-equality conditions, for instance condition1 or condition2.
> In fact, Hive doesn't support such joins, as it is very difficult to
> express such conditions as a map/reduce job. However, SparkSQL supports
> this operation, so I would like to know how Spark implements it.
> As I observe that such joins run very slowly, I guess that Spark implements
> them by doing a filter on top of the cartesian product. Is this true?
> Thanks in advance for your help.
> Cheers
> Gen
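The guess in the quoted message can be illustrated with a small sketch (hypothetical 1,000-row inputs and placeholder conditions): a join on an arbitrary predicate is logically a filter over the cartesian product, so the number of candidate pairs examined grows as |a| × |b|, which is why such joins are slow and produce huge intermediate results.

```python
# Sketch: a join on an arbitrary "condition1 OR condition2" predicate is
# a filter over the cartesian product, so |a| * |b| pairs are examined.
a = list(range(1000))
b = list(range(1000))

pairs_examined = 0
result = []
for ra in a:                              # nested-loop join:
    for rb in b:                          # every pair is tested
        pairs_examined += 1
        if ra == rb or ra + rb == 999:    # "condition1 OR condition2"
            result.append((ra, rb))

assert pairs_examined == len(a) * len(b)  # 1,000,000 pairs for 1,000-row inputs
```

For two 1,000-row inputs this already means a million pairs; real tables of millions of rows make the intermediate work quadratic, regardless of how few rows actually match.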
