spark-user mailing list archives

From Reynold Xin <r...@apache.org>
Subject Re: GroupingComparator in Spark.
Date Wed, 04 Dec 2013 21:09:34 GMT
Spark's expressiveness allows you to do this fairly easily on your own.

sortByKey is implemented in a few lines of code, so it would be fairly easy
to implement your own variant to do that: replace the partitioner in
sortByKey with a hash partitioner on the (grouping part of the) key, and
then define a separate sort within each partition after the hash
partitioning.
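To make the idea concrete, here is a minimal plain-Scala sketch of that secondary-sort pattern (this is not Spark API code; `SecondarySortSketch`, `partitionOf`, and the `(groupPart, sortPart)` key shape are illustrative assumptions). It simulates a hash partitioner that looks only at the grouping part of a composite key, then sorts each partition by the full key, so records with the same group land in the same partition and arrive in sort order:

```scala
// Sketch of secondary sort: partition on the grouping part of the key only,
// then sort within each partition on the full (group, sort) key.
// Hypothetical names; in Spark this would be done with a custom Partitioner
// plus a per-partition sort, not with these local collections.
object SecondarySortSketch {
  type Key = (String, Int) // (groupPart, sortPart)

  // Stand-in for a hash partitioner that hashes ONLY the grouping part,
  // so all records of a group go to the same "reducer" partition.
  def partitionOf(key: Key, numPartitions: Int): Int =
    math.abs(key._1.hashCode) % numPartitions

  // Hash-partition the records, then sort each partition by the full key
  // (group first, then the sort part), mimicking SortComparator behavior.
  def secondarySort(records: Seq[(Key, String)],
                    numPartitions: Int): Map[Int, Seq[(Key, String)]] =
    records
      .groupBy { case (k, _) => partitionOf(k, numPartitions) }
      .map { case (p, recs) => p -> recs.sortBy(_._1) }
}
```

In real Spark code the equivalent would be a custom `Partitioner` whose `getPartition` hashes only the grouping field, combined with a sort inside each partition (e.g. via `mapPartitions`), which is exactly the decomposition described above.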


On Wed, Dec 4, 2013 at 10:58 AM, Archit Thakur <archit279thakur@gmail.com>wrote:

>
>
> Hi,
>
> Was just curious. In Hadoop, you have the flexibility to choose your own
> classes for SortComparator and GroupingComparator. I have figured out that
> there are functions like sortByKey and reduceByKey.
> But what if I want to customize which part of the key is used for sorting
> and which part for grouping (i.e., which records should go to a single
> reducer as if they had the same key)? Is there any way that could be
> achieved, wherein we can specify our own SortComparator and
> GroupingComparator?
>
> Thanks and Regards,
> Archit Thakur.
>
>
