spark-user mailing list archives

From Vadim Semenov <vadim.seme...@datadoghq.com>
Subject Re: Multiple filters vs multiple conditions
Date Tue, 03 Oct 2017 16:05:18 GMT
Since you're using the Dataset API or the RDD API with lambda predicates,
the two filters won't be fused together by the Catalyst optimizer; that
would only happen if you used the DF (DataFrame) API.
Either way, both filters will get executed within one stage, so the
overhead of having two separate filters instead of one is very small.
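
To illustrate the "one stage" point, here is a rough analogy in plain Java
streams. It is not Spark code, and Spark's internals differ, but chained
stateless filters are pipelined per record in a similar way. The class
FilterPipelineSketch, the Item type with its fields id, x and y, and the
sample data are all hypothetical names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Analogy only: uses java.util.stream, not Spark. All names are hypothetical.
class FilterPipelineSketch {

    static final class Item {
        final int id;
        final int x;
        final int y;
        Item(int id, int x, int y) { this.id = id; this.x = x; this.y = y; }
    }

    static List<Item> sample() {
        return Arrays.asList(new Item(0, 1, 2), new Item(1, 1, 3), new Item(2, 0, 2));
    }

    // Approach 1: a single filter with a combined condition.
    static List<Item> combined(List<Item> input) {
        return input.stream()
                .filter(t -> t.x == 1 && t.y == 2)
                .collect(Collectors.toList());
    }

    // Approach 2: two chained filters. The trace records which predicate
    // saw which item, so the evaluation order can be inspected afterwards.
    static List<Item> chained(List<Item> input, List<String> trace) {
        return input.stream()
                .filter(t -> { trace.add("x#" + t.id); return t.x == 1; })
                .filter(t -> { trace.add("y#" + t.id); return t.y == 2; })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        List<Item> one = combined(sample());
        List<Item> two = chained(sample(), trace);
        System.out.println("combined matches: " + one.size());
        System.out.println("chained matches:  " + two.size());
        System.out.println("chained evaluation order: " + trace);
    }
}
```

Both approaches keep the same items. In the chained version the trace
shows each item passing through the first predicate and, if it survives,
the second one before the next item is read, so the data is traversed once
rather than once per filter. The extra cost is one more function call per
surviving record.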

On Tue, Oct 3, 2017 at 8:14 AM, Ahmed Mahmoud <don1559@gmail.com> wrote:

> Hi All,
>
> Just a quick question from an optimisation point of view:
>
> Approach 1:
> .filter(t -> t.x == 1 && t.y == 2)
>
> Approach 2:
> .filter(t -> t.x == 1)
> .filter(t -> t.y == 2)
>
> Is there a difference? Is one better than the other, or are both the same?
>
> Thanks!
> Ahmed Mahmoud
>
>
