spark-user mailing list archives

From Понькин Алексей <alexey.pon...@ya.ru>
Subject Re: Spark Number of Partitions Recommendations
Date Sun, 02 Aug 2015 06:06:36 GMT
Yes, I forgot to mention:
I chose a prime number as the modulus for the hash function because my keys are usually
strings, and Spark computes a key's partition from the key's hash (see HashPartitioner.scala).
So, to avoid a large number of collisions (where many keys end up in only a few partitions),
it is common to use a prime number as the modulus. But of course this only makes sense for
String keys, because of the hash function; if you have a different hash function for keys of
a different type, you can use any other modulus instead of a prime number.
I like this discussion of the topic: http://stackoverflow.com/questions/1145217/why-should-hash-functions-use-a-prime-number-modulus
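For reference, here is a simplified Scala sketch of the partition lookup that HashPartitioner.scala
performs (trimmed down, not the exact Spark source): a key goes to hashCode modulo the number of
partitions, adjusted so the result is never negative.

// Simplified sketch of the HashPartitioner logic: a key is assigned to
// partition (key.hashCode % numPartitions), made non-negative.
class SimpleHashPartitioner(numPartitions: Int) {
  require(numPartitions > 0, "numPartitions must be positive")

  def getPartition(key: Any): Int = key match {
    case null => 0
    case _ =>
      val rawMod = key.hashCode % numPartitions
      rawMod + (if (rawMod < 0) numPartitions else 0)
  }
}

// Example: string keys spread across a prime number of partitions.
object Demo extends App {
  val partitioner = new SimpleHashPartitioner(31) // 31 is prime
  Seq("user_1", "user_2", "user_3").foreach { k =>
    println(s"$k -> partition ${partitioner.getPartition(k)}")
  }
}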




02.08.2015, 00:14, "Ruslan Dautkhanov" <dautkhanov@gmail.com>:
> You should also take into account the amount of memory that you plan to use.
> It's advised not to give each executor too much memory, otherwise GC overhead will go up.
>
> Btw, why prime numbers?
>
> --
> Ruslan Dautkhanov
>
> On Wed, Jul 29, 2015 at 3:31 AM, ponkin <alexey.ponkin@ya.ru> wrote:
>> Hi Rahul,
>>
>> Where did you see such a recommendation?
>> I personally define the number of partitions with the following formula
>>
>> partitions = nextPrimeNumberAbove( K*(--num-executors * --executor-cores ) )
>>
>> where
>> nextPrimeNumberAbove(x) - the smallest prime number greater than x
>> K - a multiplier; start with 1 and increase it until join performance
>> starts to degrade
>>
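For completeness, here is a small Scala sketch of the formula quoted above. The names
nextPrimeNumberAbove and K come from that message; the executor counts are just example inputs,
not a recommendation:

// partitions = nextPrimeNumberAbove(K * numExecutors * executorCores)
object PartitionCount {

  // Smallest prime strictly greater than x (naive trial division is
  // fine for the small values involved here).
  def nextPrimeNumberAbove(x: Int): Int = {
    def isPrime(n: Int): Boolean =
      n > 1 && (2 to math.sqrt(n.toDouble).toInt).forall(n % _ != 0)
    Iterator.from(x + 1).find(isPrime).get
  }

  def partitions(numExecutors: Int, executorCores: Int, k: Int): Int =
    nextPrimeNumberAbove(k * numExecutors * executorCores)

  def main(args: Array[String]): Unit = {
    // Example: 10 executors with 4 cores each, K = 1 to start with.
    println(partitions(numExecutors = 10, executorCores = 4, k = 1)) // prints 41
  }
}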

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

