spark-user mailing list archives

From Raghavendra Pandey <raghavendra.pan...@gmail.com>
Subject Re: Will multiple filters on the same RDD optimized to one filter?
Date Thu, 16 Jul 2015 08:08:09 GMT
If you cache the RDD, it will save some recomputation. But filter is a lazy
operation anyway; nothing runs until you perform an action, so the cost
depends on what you later do with f1 and f2...
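
To make the caching suggestion concrete, here is a minimal sketch (assuming `input` is an existing RDD whose elements have an integer `value` field, as in the question below; names other than those are illustrative):

```scala
// Sketch only: assumes an existing SparkContext and an `input` RDD
// with records exposing an integer `value`, per the question.
val rdd = input.map(_.value)
rdd.cache()                      // keep the mapped data after the first scan

val f1 = rdd.filter(_ == 1)
val f2 = rdd.filter(_ == 2)

// Each action still launches its own job, but with cache() the
// upstream map stage is computed once and then served from memory
// instead of being re-scanned for every filter.
println(f1.count())
println(f2.count())
```

Without `cache()`, each action on f1 and f2 re-runs the map stage from the source; with it, the second job reads the cached partitions.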

Raghavendra
On Jul 16, 2015 1:33 PM, "Bin Wang" <wbin00@gmail.com> wrote:

> If I write code like this:
>
> val rdd = input.map(_.value)
> val f1 = rdd.filter(_ == 1)
> val f2 = rdd.filter(_ == 2)
> ...
>
> Then the DAG of the execution may be this:
>
>          -> Filter -> ...
> Map
>          -> Filter -> ...
>
> But the two filters operate on the same RDD, which means both could be
> done by scanning the RDD once. Does Spark have this kind of optimization
> for now?
>
