spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Re: quickly counting the number of rows in a partition?
Date Wed, 14 Jan 2015 01:07:37 GMT
Hi again,

On Wed, Jan 14, 2015 at 10:06 AM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> If you think of
>     items.map(x => /* throw exception */).count()
> then even though the count you want to get does not necessarily require
> the evaluation of the function in map() (i.e., the number is the same), you
> may not want to get the count if that code actually fails.
>

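Sorry, I think that was a bit confusing. What I mean is: you have to
compute the whole RDD in order to get a meaningful count() result (whether
you use rdd.count() or the mapPartitions() approach), because the count is
only trustworthy if every upstream transformation actually ran without
failing.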
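
To make that concrete, here is a rough sketch, assuming spark-core is on the
classpath and a local[2] master purely for the demo. The names
PartitionCounts and partitionSizes are made up for illustration. It counts
rows per partition via mapPartitionsWithIndex() and then tries both
approaches on an RDD whose map() throws; I would expect both to fail,
since counting a partition means consuming its iterator, which runs the
mapped function.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.util.Try

object PartitionCounts {
  // Hypothetical helper: returns (partition index, row count) pairs.
  // iter.size walks the partition's iterator, so every upstream
  // transformation is evaluated for every row.
  def partitionSizes[T](rdd: RDD[T]): Array[(Int, Int)] =
    rdd.mapPartitionsWithIndex((idx, iter) => Iterator((idx, iter.size)))
      .collect()

  def main(args: Array[String]): Unit = {
    // Assumption: local master with two threads, only for this demo.
    val conf = new SparkConf().setMaster("local[2]").setAppName("partition-counts")
    val sc = new SparkContext(conf)
    try {
      val items = sc.parallelize(1 to 10, numSlices = 3)

      // Healthy pipeline: one count per partition.
      partitionSizes(items.map(_ * 2)).foreach { case (idx, n) =>
        println(s"partition $idx: $n rows")
      }

      // Pipeline whose map() throws for one element.
      val failing = items.map { x =>
        if (x == 5) throw new IllegalStateException("boom") else x
      }

      // Both ways of counting have to evaluate the map() above.
      println(s"count() failed: ${Try(failing.count()).isFailure}")
      println(s"mapPartitions failed: ${Try(partitionSizes(failing)).isFailure}")
    } finally {
      sc.stop()
    }
  }
}
```

The per-partition variant gives you more detail than rdd.count(), but it
does not save any computation: it is still a full pass over the data.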

Tobias
