spark-user mailing list archives

From Adrian Tanase <atan...@adobe.com>
Subject Re: Secondary Sorting in Spark
Date Mon, 26 Oct 2015 09:51:02 GMT
Do you have a particular concern? You’re always using a partitioner (the default is HashPartitioner),
and the Partitioner interface is pretty light, so I can’t see how it could affect performance.

Used correctly, it should improve performance, since you can better control the placement of data
and avoid unnecessary shuffling.
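
To make the point about a light interface concrete, here is a minimal sketch of a custom partitioner, assuming composite keys of the form (primary, secondary). The class name PrimaryKeyPartitioner and the key layout are hypothetical and not taken from the thread or the linked article; only org.apache.spark.Partitioner and its two members are Spark's own.

```scala
import org.apache.spark.Partitioner

// Hypothetical partitioner: keys are assumed to be (primary, secondary) tuples,
// and only the primary part decides which partition a record goes to.
class PrimaryKeyPartitioner(partitions: Int) extends Partitioner {
  require(partitions > 0, s"Number of partitions ($partitions) must be positive.")

  override def numPartitions: Int = partitions

  override def getPartition(key: Any): Int = key match {
    case (null, _) => 0
    case (primary, _) =>
      // hashCode may be negative, so shift the remainder into [0, numPartitions).
      val mod = primary.hashCode % numPartitions
      if (mod < 0) mod + numPartitions else mod
    case other =>
      throw new IllegalArgumentException(s"Expected a (primary, secondary) key, got: $other")
  }

  // Spark compares partitioners with equals when deciding whether data is
  // already laid out as requested, so equal settings should compare equal.
  override def equals(other: Any): Boolean = other match {
    case p: PrimaryKeyPartitioner => p.numPartitions == numPartitions
    case _ => false
  }

  override def hashCode: Int = numPartitions
}
```

On a pair RDD with such keys, something like rdd.repartitionAndSortWithinPartitions(new PrimaryKeyPartitioner(8)) would keep all records for one primary key in the same partition while sorting by the full key, provided an implicit Ordering for the key type is in scope. The partition count 8 is only an illustrative value.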

-adrian

From: swetha kasireddy
Date: Monday, October 26, 2015 at 6:56 AM
To: Adrian Tanase
Cc: Bill Bejeck, user@spark.apache.org
Subject: Re: Secondary Sorting in Spark

Hi,

Does the use of a custom partitioner in Streaming affect performance?

On Mon, Oct 5, 2015 at 1:06 PM, Adrian Tanase <atanase@adobe.com> wrote:
Great article, especially the use of a custom partitioner.

Also, sorting by multiple fields by creating a tuple out of them is an awesome, easy-to-miss
Scala feature.
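
A small sketch of the tuple trick in plain Scala collections follows. The Flight record and its values are made up for illustration and are not from the article. The standard library supplies an implicit Ordering for tuples whose elements are themselves ordered, so sortBy can take a tuple directly.

```scala
object TupleSortExample {
  // Hypothetical record type, used only for illustration.
  case class Flight(airline: String, delayMinutes: Int)

  val flights: Seq[Flight] = Seq(
    Flight("DL", 12),
    Flight("AA", 45),
    Flight("DL", 3),
    Flight("AA", 7)
  )

  // Sorting by a tuple compares the first element, then the second on ties.
  val sorted: Seq[Flight] = flights.sortBy(f => (f.airline, f.delayMinutes))

  // Descending on the second field: negate it inside the tuple.
  val worstFirst: Seq[Flight] = flights.sortBy(f => (f.airline, -f.delayMinutes))
}
```

The same implicit tuple Ordering is what lets a (primary, secondary) key be sorted on a Spark RDD without writing a comparator by hand.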

Sent from my iPhone

On 04 Oct 2015, at 21:41, Bill Bejeck <bbejeck@gmail.com> wrote:

I've written a blog post on secondary sorting in Spark, and I thought I'd share it with the
group:

http://codingjunkie.net/spark-secondary-sort/

Thanks,
Bill
