spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Cheapest way to materialize an RDD?
Date Fri, 30 Jan 2015 22:42:38 GMT
So far, the canonical way to materialize an RDD just to make sure it's
cached is to call count(). That's fine but incurs the overhead of
actually counting the elements.

However, rdd.foreachPartition(p => None), for example, also seems to
cause the RDD to be materialized, and the function itself is a no-op.
Is that a better way to do it, or is there a reason it's insufficient
that I'm overlooking?
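
To make the comparison concrete, here is a minimal sketch of the two approaches side by side. It assumes a local master, and the object name, app name, and sample data are made up for illustration. It reads the number of cached partitions back through sc.getRDDStorageInfo (a developer API) so the effect can be checked rather than taken on faith.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical driver program; names and data are illustrative only.
object MaterializeSketch {

  // Number of partitions of `rdd` that Spark currently reports as cached.
  def cachedPartitions(sc: SparkContext, rdd: RDD[_]): Int =
    sc.getRDDStorageInfo.filter(_.id == rdd.id).map(_.numCachedPartitions).sum

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("materialize-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Approach 1: count() runs a job over every partition and tallies the elements.
      val viaCount = sc.parallelize(1 to 1000000, 8).map(_ * 2).cache()
      viaCount.count()
      println(s"count(): ${cachedPartitions(sc, viaCount)} of ${viaCount.partitions.length} partitions cached")

      // Approach 2: foreachPartition with a function that ignores its iterator.
      // A job still runs over every partition, but nothing is counted or returned.
      val viaForeach = sc.parallelize(1 to 1000000, 8).map(_ * 2).cache()
      viaForeach.foreachPartition(_ => ())
      println(s"foreachPartition: ${cachedPartitions(sc, viaForeach)} of ${viaForeach.partitions.length} partitions cached")
    } finally {
      sc.stop()
    }
  }
}
```

If both lines report every partition as cached, the no-op job filled the cache just as count() did. Whether that holds for every storage level and Spark version is exactly the open question here, so treat this as a way to test it, not as an answer.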
