spark-user mailing list archives

From "Fernando O." <fot...@gmail.com>
Subject Re: limit vs sample for indexing a small amount of data quickly?
Date Thu, 01 Jan 2015 05:36:33 GMT
There's a take method that might do what you need:

    def take(num: Int): Array[T]

Take the first num elements of the RDD.
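
As a rough sketch of how that might look for a quick test (the local master, the app name, and the stand-in dataset are assumptions for illustration, not anything from your setup):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TakeExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" and the app name are placeholders for a quick local run
    val conf = new SparkConf().setAppName("take-example").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in dataset; in practice this would be your real input RDD
      val records = sc.parallelize(1 to 1000000)

      // take(n) returns a local Array on the driver, not an RDD
      val firstThousand: Array[Int] = records.take(1000)

      // Turn the small array back into an RDD so the algorithm under
      // test can still run against the RDD API
      val smallRdd = sc.parallelize(firstThousand)
      println(smallRdd.count())
    } finally {
      sc.stop()
    }
  }
}
```

Note that take pulls the results back to the driver, so it only makes sense for a small N like your 1000. It also avoids working out a fraction up front, which sample would need.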
On Jan 1, 2015 12:02 AM, "Kevin Burton" <burton@spinn3r.com> wrote:

> Is there a limit function which just returns the first N records?
>
> Sample is nice but I’m trying to do this so it’s super fast and just to
> test the functionality of an algorithm.
>
> With sample I’d have to compute the % that would yield 1000 results first…
>
> Kevin
