spark-user mailing list archives

From Mark Hamstra
Subject Re: How to filter a sorted RDD
Date Mon, 04 Nov 2013 07:24:57 GMT
You could short-circuit the filtering within the iterator function
supplied to mapPartitions.
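
A minimal sketch of that idea, assuming the matching elements form a
prefix of the sorted order. Python is used here only for illustration, and
the names keep_while, is_small and partitions are hypothetical. Plain lists
stand in for the RDD's partitions so the snippet runs without Spark.

```python
from itertools import takewhile

def keep_while(predicate):
    """Build a per-partition function that stops at the first failing element."""
    def process_partition(iterator):
        # takewhile is lazy: once the predicate fails, it stops pulling
        # elements from the partition's iterator.
        return takewhile(predicate, iterator)
    return process_partition

calls = []  # records every element the predicate actually inspects

def is_small(x):
    calls.append(x)
    return x < 5

# Stand-in for the partitions of a sorted RDD (an assumption for this sketch).
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

process = keep_while(is_small)
result = [x for part in partitions for x in process(iter(part))]

print(result)      # [1, 2, 3, 4]
print(len(calls))  # 6 of the 9 elements were inspected
```

With PySpark, the same function could be passed as
rdd.mapPartitions(keep_while(is_small)). Every partition is still visited,
but a partition whose first element already fails the condition is
abandoned after a single check.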

On Sunday, November 3, 2013, Xiang Huo wrote:

> Hi all,
> I am trying to filter a smaller RDD out of a large RDD, and the large one
> is sorted. Is there any way to make the filter method stop checking every
> element in the RDD and instead discard all remaining elements as soon as
> one element fails the filter condition? Because the large data set is
> sorted, once one element fails the condition, none of the following
> elements can meet it, but checking them one by one takes a relatively
> long time. Is there any way to save time on this part?
> Thanks,
> Xiang
> --
> Xiang Huo
> Department of Computer Science
> University of Illinois at Chicago(UIC)
> Chicago, Illinois
> US
