spark-user mailing list archives

From Paul Tremblay <paulhtremb...@gmail.com>
Subject Re: bug with PYTHONHASHSEED
Date Tue, 04 Apr 2017 16:02:47 GMT
So that means I have to pass that environment variable to the EMR clusters
when I spin them up, not afterwards. I'll give that a go.
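One way to hand the variable over at spin-up is through EMR's configuration classifications. The following is a minimal sketch, assuming Spark runs on YARN under EMR. It prints a JSON list that can be saved and passed to `aws emr create-cluster --configurations file://config.json`. `spark.executorEnv.*` and `spark.yarn.appMasterEnv.*` are Spark properties, and `spark-defaults` and `spark-env`/`export` are EMR classifications. The helper name `emr_hashseed_configurations` and the seed value "0" are illustrative choices and do not come from the thread.

```python
import json

# Arbitrary fixed seed (assumption): any integer string works, as long as
# every Python process in the job gets the same value.
SEED = "0"


def emr_hashseed_configurations(seed=SEED):
    """Return a hypothetical EMR Configurations list that sets PYTHONHASHSEED.

    Assumes Spark on YARN:
      * spark.executorEnv.*       -> environment of the executors, which
                                     start the Python worker processes
      * spark.yarn.appMasterEnv.* -> environment of a cluster-mode driver
      * spark-env / export        -> exported from spark-env.sh, which
                                     Spark's launch scripts source
    """
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.executorEnv.PYTHONHASHSEED": seed,
                "spark.yarn.appMasterEnv.PYTHONHASHSEED": seed,
            },
        },
        {
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYTHONHASHSEED": seed},
                }
            ],
        },
    ]


if __name__ == "__main__":
    # Save this output as config.json and pass it at cluster creation:
    #   aws emr create-cluster ... --configurations file://config.json
    print(json.dumps(emr_hashseed_configurations(), indent=2))
```

Setting the seed in more than one place is deliberate redundancy. On a given deployment, only one of these settings may actually be needed.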

Thanks!

Henry

On Tue, Apr 4, 2017 at 7:49 AM, Eike von Seggern <eike.seggern@sevenval.com>
wrote:

> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremblay@gmail.com>:
>
>> When I try to do a groupByKey() in my Spark environment, I get the
>> error described here:
>>
>> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>>
>> To try to fix the problem, I set up my IPython environment
>> with the additional line:
>>
>> PYTHONHASHSEED=1
>>
>> When I fire up my IPython shell and run:
>>
>> In [7]: hash("foo")
>> Out[7]: -2457967226571033580
>>
>> In [8]: hash("foo")
>> Out[8]: -2457967226571033580
>>
>> So my hash function is now seeded and returns consistent values. But
>> when I do a groupByKey(), I get the same error:
>>
>> Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED
>>
>> Does anyone know how to fix this problem in Python 3.4?
>>
>
> Regardless of the Python version, you have to ensure that the Python
> processes on the Spark master and workers are started with
> PYTHONHASHSEED set, e.g. by adding it to the environment of the Spark
> processes.
>
> Best
>
> Eike
>
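Eike's point can be checked without Spark. In CPython 3.3 and later, `hash()` of a `str` is salted with a per-process random seed unless PYTHONHASHSEED is set. As a result, calling `hash("foo")` twice in one IPython session always gives the same value either way. What matters is what each separately launched worker interpreter sees. The minimal sketch below uses a child interpreter started via `subprocess` as a stand-in for a worker. It reports whether the variable is visible (the condition the error message refers to) and what `hash("foo")` is inside that child. `run_child` and `CHILD_CODE` are hypothetical names, and treating a child process as a Spark worker is an assumption for illustration.

```python
import os
import subprocess
import sys

# Code run inside each child interpreter: report whether PYTHONHASHSEED is
# visible there, then print hash() of the string passed as argv[1].
CHILD_CODE = (
    "import os, sys; "
    "print('PYTHONHASHSEED' in os.environ); "
    "print(hash(sys.argv[1]))"
)


def run_child(text, seed=None):
    """Start a fresh Python interpreter (standing in for a worker) and
    return (seed_visible, hash_of_text) as seen inside that interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)  # start from an unseeded environment
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD_CODE, text],
        env=env,
        universal_newlines=True,
    )
    visible, value = out.split()
    return visible == "True", int(value)


if __name__ == "__main__":
    # Seeded children see the variable and agree with each other.
    print(run_child("foo", seed=1) == run_child("foo", seed=1))  # True
    # Unseeded children do not see it and usually disagree on hash("foo").
    print([run_child("foo") for _ in range(3)])
```

A child only sees the variable if it is in the environment it was launched with. That is why the variable has to be set for the Spark processes that start the workers, not only in the driver's shell.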



-- 
Paul Henry Tremblay
Robert Half Technology
