spark-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: almost sorted data
Date Fri, 25 Oct 2013 14:17:02 GMT
A local sort per partition gives a globally sorted result only if the data
is already range partitioned, i.e. every key in one partition is less than
or equal to every key in the next partition.
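
To illustrate the point, here is a minimal sketch that uses plain Scala
collections as a stand-in for RDD partitions, so no Spark is needed to run it.
The object name, helper names, and timestamp values are all hypothetical and
chosen only for illustration. It sorts each partition on its own and then
checks whether reading the partitions in order gives a sorted sequence.

```scala
// Plain Scala collections stand in for RDD partitions; this is an
// illustration of the idea, not Spark code.
object LocalSortSketch {
  // Sort each partition independently, analogous to a per-partition sort.
  def sortWithinPartitions(parts: Seq[Seq[Long]]): Seq[Seq[Long]] =
    parts.map(_.sorted)

  // True if reading the partitions in order yields a globally sorted sequence.
  def isGloballySorted(parts: Seq[Seq[Long]]): Boolean = {
    val flat = parts.flatten
    flat.zip(flat.drop(1)).forall { case (a, b) => a <= b }
  }

  // Hypothetical timestamps. Range partitioned: every value in the first
  // partition is <= every value in the second, though neither is sorted.
  val rangePartitioned: Seq[Seq[Long]] = Seq(Seq(3L, 1L, 2L), Seq(6L, 4L, 5L))

  // Not range partitioned: the late record 2 landed in the second partition.
  val overlapping: Seq[Seq[Long]] = Seq(Seq(3L, 1L, 4L), Seq(6L, 2L, 5L))

  def main(args: Array[String]): Unit = {
    println(isGloballySorted(sortWithinPartitions(rangePartitioned))) // true
    println(isGloballySorted(sortWithinPartitions(overlapping)))      // false
  }
}
```

In the second case the local sort leaves 1, 3, 4 followed by 2, 5, 6, so the
record that arrived late is still out of place. For "almost sorted" logs this
is the likely situation whenever a late record crosses a partition boundary.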

On 25.10.2013 16:11, Nathan Kronenfeld wrote:
> Since no one else has answered...
> I assume:
> 
>     data.mapPartitions(_.toList.sortBy(...).toIterator)
> 
> would work, but I also suspect there's a better way.
> 
> 
> On Fri, Oct 25, 2013 at 5:01 AM, Arun Kumar <arunpatala@gmail.com> wrote:
> 
>> Hi,
>>
>> I am trying to process some logs, and the data is *almost* sorted by
>> timestamp.
>> If I do a full sort it takes a lot of time. Is there some way to sort more
>> efficiently (like restricting the sort to each partition)?
>>
>> Thanks in advance
>>
> 
> 
> 

