spark-user mailing list archives

From Josh Rosen <rosenvi...@gmail.com>
Subject Re: GroupingComparator in Spark.
Date Wed, 04 Dec 2013 21:47:19 GMT
It looks like OrderedRDDFunctions (
https://spark.incubator.apache.org/docs/latest/api/core/index.html#org.apache.spark.rdd.OrderedRDDFunctions),
which defines sortByKey(), is constructed with an implicit Ordered[K], so you
could explicitly construct an OrderedRDDFunctions with your own Ordered. You
might also be able to define an implicit Ordered[K] that takes precedence over
the default ordering in the scope where you call sortByKey().
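
For example, something along these lines might work (an untested sketch; MyKey
and its fields are made-up names, and sc is assumed to be an existing
SparkContext):

    import org.apache.spark.SparkContext._   // pair-RDD implicits

    // Illustrative key type: group decides grouping, ts decides sort order.
    case class MyKey(group: String, ts: Long)

    // Bring your own Ordered[MyKey] into scope so sortByKey()'s
    // K <% Ordered[K] view bound picks it up: compare on ts only.
    implicit def myKeyOrdered(k: MyKey): Ordered[MyKey] = new Ordered[MyKey] {
      def compare(that: MyKey): Int = k.ts.compare(that.ts)
    }

    val pairs = sc.parallelize(Seq(
      (MyKey("a", 3L), "x"), (MyKey("a", 1L), "y"), (MyKey("b", 2L), "z")))
    val sorted = pairs.sortByKey()   // ordered by ts, ignoring group

This only covers the sort side; keeping each group's records together is the
partitioning question Reynold addresses below.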


On Wed, Dec 4, 2013 at 1:09 PM, Reynold Xin <rxin@apache.org> wrote:

> Spark's expressiveness allows you to do this fairly easily on your own.
>
> sortByKey is implemented in only a few lines of code, so it would be fairly
> easy to implement your own version: replace the partitioner in sortByKey
> with a hash partitioner on the key, and then define a separate way to sort
> each partition after the hash partitioning (a sketch of this follows the
> quoted thread below).
>
>
> On Wed, Dec 4, 2013 at 10:58 AM, Archit Thakur <archit279thakur@gmail.com>wrote:
>
>>
>>
>> Hi,
>>
>> Just curious: in Hadoop you have the flexibility to choose your own
>> classes for the SortComparator and the GroupingComparator. I have seen
>> that Spark has functions like sortByKey and reduceByKey, but what if I
>> want to customize which part of the key is used for sorting and which
>> part for grouping (i.e. which records should go to a single reducer as
>> having the same key)? Is there any way to achieve that, where we can
>> specify our own SortComparator and GroupingComparator?
>>
>> Thanks and Regards,
>> Archit Thakur.
>>
>>
>
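
To illustrate what Reynold describes above, here is a rough, untested sketch
that reuses the MyKey type and the pairs RDD from the earlier sketch (the
partitioner class name and the partition count are arbitrary):

    import org.apache.spark.{HashPartitioner, Partitioner}

    // Partition only on the grouping part of the key, so all records for a
    // group land in the same partition (the GroupingComparator role).
    class GroupPartitioner(partitions: Int) extends Partitioner {
      private val hash = new HashPartitioner(partitions)
      def numPartitions: Int = partitions
      def getPartition(key: Any): Int = key match {
        case MyKey(group, _) => hash.getPartition(group)
      }
    }

    // Then sort each partition locally, primarily by group and secondarily
    // by ts (the SortComparator role), so each group's records come out
    // together and in order. Note this buffers a whole partition in memory.
    val secondarySorted = pairs
      .partitionBy(new GroupPartitioner(4))
      .mapPartitions(
        iter => iter.toSeq.sortBy { case (k, _) => (k.group, k.ts) }.iterator,
        preservesPartitioning = true)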
