spark-user mailing list archives

From Gen <>
Subject Re: Does filter on an RDD scan every data item ?
Date Tue, 02 Dec 2014 16:17:06 GMT

For your first question, I think that we can use

For your second question, I am not sure, but I don't think we can
restrict a filter to certain partitions without scanning every element.
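
To make the difference concrete, here is a rough sketch in plain Python
(not Spark code) that models an RDD as a list of partitions. All names in
it (partition_of, hash_partition, filter_all, lookup_key) are made up for
illustration, and the modulo "partitioner" is an assumption that only works
for integer keys. It counts how many elements get inspected by a plain
filter versus by a key lookup that knows how the data was partitioned.

```python
# Plain-Python model, not Spark code: a "partition" is a list of (key, value) pairs.
NUM_PARTITIONS = 4


def partition_of(key, num_partitions=NUM_PARTITIONS):
    # Hypothetical stand-in for a hash partitioner (integer keys only).
    return key % num_partitions


def hash_partition(pairs, num_partitions=NUM_PARTITIONS):
    # Place every pair in the partition its key maps to.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def filter_all(partitions, predicate):
    # Model of filter: the predicate is evaluated on every element
    # of every partition, whatever the partitioning looks like.
    scanned = 0
    kept = []
    for part in partitions:
        for pair in part:
            scanned += 1
            if predicate(pair):
                kept.append(pair)
    return kept, scanned


def lookup_key(partitions, key):
    # Model of a partitioner-aware lookup: only the single partition
    # that the key maps to is scanned.
    part = partitions[partition_of(key, len(partitions))]
    scanned = 0
    values = []
    for k, v in part:
        scanned += 1
        if k == key:
            values.append(v)
    return values, scanned


data = [(i, "row-%d" % i) for i in range(1000)]
parts = hash_partition(data)

kept, scanned_by_filter = filter_all(parts, lambda kv: kv[0] == 42)
values, scanned_by_lookup = lookup_key(parts, 42)

print(kept, scanned_by_filter)    # [(42, 'row-42')] 1000
print(values, scanned_by_lookup)  # ['row-42'] 250
```

As far as I know, the closest thing in Spark itself is lookup(key) on a pair
RDD that has a known partitioner, which is documented to search only the
partition the key maps to. A plain filter has no such knowledge of the keys,
so it still runs over every partition.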


nsareen wrote
> Hi,
> I wanted some clarity on how the filter function of an RDD works.
> 1) Does the filter function scan every element saved in the RDD? If my RDD
> represents 10 million rows and I want to work on only 1000 of them, is
> there an efficient way of filtering the subset without having to scan
> every element?
> 2) Suppose my RDD represents a key/value data set. When I filter this data
> set of 10 million rows, can I specify that the search should be restricted
> to only the partitions that contain specific keys? Will Spark run my filter
> operation on all partitions if the partitioning is done by key,
> regardless of whether the key exists in a partition or not?
