spark-user mailing list archives

From Aaron Davidson <>
Subject Re: Given multiple .filter()'s, is there a way to set the order?
Date Fri, 14 Nov 2014 19:02:05 GMT
In the situation you show, Spark will pipeline the filters together and
apply them one at a time to each row, effectively constructing an "&&"
statement: a row that fails one filter is never passed to the later ones.
You would only see a performance difference if the filter code itself is
somewhat expensive, in which case you would want to run it on a smaller set
of rows by placing it after the more selective filters. Otherwise, the
runtime difference between "a == b && b == c && c == d" and "a == b & b == c
& c == d" is minimal, the latter being sort of the worst-case scenario, as it
would always run all filters (though, as I said, Spark acts like the former).
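
As a rough illustration of the pipelining described above, here is a minimal sketch in plain Python rather than Spark. It assumes that Python's lazy built-in `filter` is a fair stand-in for chained RDD filters within a single stage. The predicates `keep_even` and `keep_mult_of_5` and the `counted` helper are hypothetical names invented for this example.

```python
from collections import Counter

calls = Counter()  # how many times each predicate was evaluated


def counted(name, predicate):
    """Wrap a predicate so that every evaluation is tallied under `name`."""
    def wrapper(row):
        calls[name] += 1
        return predicate(row)
    return wrapper


rows = range(100)

# Hypothetical predicates, chosen only so the counts are easy to check.
keep_even = counted("keep_even", lambda n: n % 2 == 0)
keep_mult_of_5 = counted("keep_mult_of_5", lambda n: n % 5 == 0)

# Lazy, chained filters: each row flows through the chain one at a time,
# and the second predicate only ever sees rows that passed the first.
survivors = list(filter(keep_mult_of_5, filter(keep_even, rows)))

print(calls["keep_even"], calls["keep_mult_of_5"], len(survivors))
```

In this sketch the first predicate is evaluated for every row, while the second is evaluated only for the rows the first one kept, which is the "&&"-like behaviour described above.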

Spark does not reorder the filters automatically. It uses the explicit
ordering you provide.
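
Since the order is yours to choose, the following sketch (again plain Python, not Spark code) counts how often each predicate runs under two different orderings. The `run_in_order` helper and the `selective` and `expensive` predicates are assumptions made up for illustration. `expensive` is only a stand-in for a filter whose code is costly to execute.

```python
def run_in_order(rows, predicates):
    """Apply predicates to each row in the given order, stopping at the
    first failure. Returns (surviving rows, per-predicate call counts)."""
    counts = [0] * len(predicates)
    survivors = []
    for row in rows:
        for i, predicate in enumerate(predicates):
            counts[i] += 1
            if not predicate(row):
                break  # short-circuit: later predicates never see this row
        else:
            survivors.append(row)
    return survivors, counts


def selective(n):
    """Hypothetical cheap filter that keeps 10% of rows."""
    return n % 10 == 0


def expensive(n):
    """Stand-in for a costly filter; imagine real work happening here."""
    return n % 3 == 0


rows = list(range(1000))

out_a, counts_a = run_in_order(rows, [selective, expensive])
out_b, counts_b = run_in_order(rows, [expensive, selective])

print(counts_a)  # call counts as [selective, expensive]
print(counts_b)  # call counts as [expensive, selective]
```

Both orderings keep the same rows, but under these assumptions the costly predicate runs on every row when it comes first and only on the survivors of the selective filter when it comes second. When every predicate is cheap, that saving is too small to show up in the runtime, which matches what the original poster observed.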

On Fri, Nov 14, 2014 at 10:20 AM, YaoPau <> wrote:

> I have an RDD "x" of millions of STRINGs, each of which I want to pass
> through a set of filters.  My filtering code looks like this:
> x.filter(filter#1, which will filter out 40% of data).
>    filter(filter#2, which will filter out 20% of data).
>    filter(filter#3, which will filter out 2% of data).
>    filter(filter#4, which will filter out 1% of data)
> There is no ordering requirement (filter #2 does not depend on filter #1,
> etc), but the filters are drastically different in the % of rows they
> should eliminate.  What I'd like is an ordering similar to a "||" statement,
> where if it fails on filter#1 the row automatically gets filtered out before
> the other three filters run.
> But when I play around with the ordering of the filters, the runtime
> doesn't seem to change.  Is Spark somehow intelligently guessing how
> effective each filter will be and ordering it correctly regardless of how I
> order them?  If not, is there a way I can set the filter order?
