spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: How to filter a sorted RDD
Date Mon, 04 Nov 2013 07:24:57 GMT
You could short-circuit the filtering within the iterator function
supplied to mapPartitions.
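
A minimal sketch of that idea, written in plain Python so it runs without a cluster. The helper name keep_while_below, the threshold 5, and the sample partitions are all hypothetical; the only assumption about Spark is that the function passed to mapPartitions receives an iterator over one partition and returns an iterator.

```python
from itertools import takewhile


def keep_while_below(limit):
    """Build a per-partition function that stops at the first element >= limit.

    Hypothetical helper: assumes each partition is sorted in ascending order.
    """
    def process_partition(iterator):
        # takewhile stops pulling from the iterator at the first element that
        # fails the predicate, so the rest of the partition is never examined.
        return takewhile(lambda x: x < limit, iterator)
    return process_partition


# Stand-in for the partitions of a sorted RDD (made-up data).
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
short_circuit = keep_while_below(5)
result = [x for part in partitions for x in short_circuit(iter(part))]
print(result)  # [1, 2, 3, 4]

# In PySpark the same function would be passed to the RDD instead:
#   filtered = sorted_rdd.mapPartitions(keep_while_below(5))
```

Note that this only short-circuits within each partition: every partition is still visited, but a partition whose first element already fails the condition is abandoned after a single check.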


On Sunday, November 3, 2013, Xiang Huo wrote:

> Hi all,
>
> I am trying to filter a smaller RDD out of a large RDD, and the large one
> is sorted. Is there any way to make the filter method stop checking every
> element in the RDD and instead drop all remaining elements as soon as one
> element fails the filter condition? Because the large data set is sorted,
> once one element fails the condition, none of the following elements can
> meet it, but checking them one by one takes a relatively long time.
> Is there any way to save time on this part?
>
> Thanks,
>
> Xiang
>
> --
> Xiang Huo
> Department of Computer Science
> University of Illinois at Chicago(UIC)
> Chicago, Illinois
> US
> Email: huoxiang5659@gmail.com
>            or xhuo4@uic.edu
>
