spark-user mailing list archives

From Tobias Pfeiffer <>
Subject Re: quickly counting the number of rows in a partition?
Date Wed, 14 Jan 2015 01:07:37 GMT
Hi again,

On Wed, Jan 14, 2015 at 10:06 AM, Tobias Pfeiffer <> wrote:

> If you think of
> => /* throw exception */).count()
> then even though the count you want to get does not necessarily require
> the evaluation of the function in map() (i.e., the number is the same), you
> may not want to get the count if that code actually fails.

Sorry, I think that was a bit confusing. What I mean is: you have to
compute the whole RDD in order to get a meaningful count() result, whether
you use rdd.count() or the mapPartitions() approach.
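
To make the comparison concrete, here is a minimal sketch of both approaches.
It assumes a local SparkContext and a made-up input RDD; the object name, the
app name, and the map function are placeholders for illustration, not code
from this thread. In both cases the function passed to map() runs for every
element, so if it threw an exception, neither count would be returned.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; all names and data are illustrative only.
object PartitionCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-counts").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Placeholder pipeline: 10 numbers in 2 partitions, with a map() step.
      val rdd = sc.parallelize(1 to 10, 2).map(x => x * 2)

      // Approach 1: total row count. This is an action, so it evaluates map().
      val total = rdd.count()

      // Approach 2: one (partitionIndex, rowCount) pair per partition.
      // it.size consumes the iterator, which also evaluates map() per element.
      val perPartition = rdd
        .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
        .collect()

      println(s"total = $total")
      perPartition.foreach { case (idx, n) => println(s"partition $idx: $n rows") }
    } finally {
      sc.stop()
    }
  }
}
```

The per-partition variant only saves work on the driver side (it returns one
small pair per partition); it does not skip any of the upstream computation.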

