spark-user mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: Selecting first ten values in a RDD/partition
Date Thu, 29 May 2014 20:18:35 GMT
DStream has a helper method that prints the first 10 elements of each RDD. You
could take some inspiration from it, as the use case is practically the same
and the code will probably be very similar:  rdd.take(10)...

https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala#L591
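
For illustration only, here is a minimal sketch of both suggestions in this thread: rdd.take(10) on the whole RDD, and an iterator-based take inside each partition. It assumes spark-core is on the classpath and uses a local master; the object name FirstTenExample, the helper names, and the sample data (the integers 1 to 100 in 4 partitions) are made up for the example and are not taken from the DStream source linked above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative assumptions.
object FirstTenExample {

  // First n elements of the whole RDD, returned to the driver as an Array.
  def firstOfRdd(rdd: RDD[Int], n: Int): Array[Int] =
    rdd.take(n)

  // First n elements of each partition, tagged with the partition index.
  // The function passed in receives that partition's Iterator, so
  // iter.take(n) stops after n elements instead of reading the whole partition.
  def firstOfEachPartition(rdd: RDD[Int], n: Int): RDD[(Int, Int)] =
    rdd.mapPartitionsWithIndex { (idx, iter) =>
      iter.take(n).map(x => (idx, x))
    }

  def main(args: Array[String]): Unit = {
    // Local master with 2 threads, an assumption made so the sketch runs standalone.
    val conf = new SparkConf().setAppName("first-ten").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 100, numSlices = 4)

      println(firstOfRdd(rdd, 10).mkString(", "))

      firstOfEachPartition(rdd, 10).collect().foreach {
        case (idx, x) => println(s"partition $idx: $x")
      }
    } finally {
      sc.stop()
    }
  }
}
```

If the partition index is not needed, rdd.mapPartitions(_.take(10)) should be enough. Note that take(10) brings the results to the driver, while the per-partition variant stays distributed until collect() is called.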

-kr, Gerard.

On Thu, May 29, 2014 at 10:08 PM, Brian Gawalt <bgawalt@gmail.com> wrote:

> Try looking at the .mapPartitions() method implemented for RDD[T] objects.
> It will give you direct access to an iterator containing the member objects
> of each partition for doing the kind of within-partition hashtag counts
> you're describing.
