spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: bug with PYTHONHASHSEED
Date Tue, 04 Apr 2017 16:03:15 GMT
Which version of Spark is this (or is it a dev build)? We've recently made
some improvements with PYTHONHASHSEED propagation.

On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@seven cal.com>
wrote:

2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremblay@gmail.com>:

When I try to do a groupByKey() in my Spark environment, I get the error
described here:

http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh

In order to attempt to fix the problem, I set up my ipython environment
with the additional line:

PYTHONHASHSEED=1

When I fire up my ipython shell, and do:

In [7]: hash("foo")
Out[7]: -2457967226571033580

In [8]: hash("foo")
Out[8]: -2457967226571033580

So my hash function is now seeded so it returns consistent values. But when
I do a groupByKey(), I get the same error:


Exception: Randomness of hash of string should be disabled via
PYTHONHASHSEED

Anyone know how to fix this problem in python 3.4?


Independent of the Python version, you have to ensure that Python on the
Spark master and on the workers is started with PYTHONHASHSEED set, e.g. by
adding it to the environment of the Spark processes.
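
To illustrate the point, here is a minimal sketch using only the standard
library, not Spark itself. Each worker is a separate Python process, so the
variable has to be in that process's environment when the interpreter
starts. Calling hash("foo") twice in one shell always gives the same value
and says nothing about other processes. The helper name child_hash is made
up for this example.

```python
import os
import subprocess
import sys


def child_hash(value, seed=None):
    """Return hash(value) as computed by a freshly started Python process.

    If seed is given, PYTHONHASHSEED is set in the child's environment;
    otherwise it is removed, so the child picks a random seed.
    (child_hash is a hypothetical helper, not part of Spark.)
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({!r}))".format(value)],
        env=env,
    )
    return int(out.decode().strip())


# Two independent processes started with the same seed agree on the hash.
print(child_hash("foo", seed=1) == child_hash("foo", seed=1))

# Setting the variable after the interpreter has started changes nothing
# in the current process: the seed is read once, at startup.
before = hash("foo")
os.environ["PYTHONHASHSEED"] = "1"
print(hash("foo") == before)
```

On a real cluster, one option that should work (not tested against your
setup) is Spark's spark.executorEnv.[EnvironmentVariableName] configuration
property, i.e. setting spark.executorEnv.PYTHONHASHSEED to a fixed value, or
exporting the variable in the environment the worker processes start from.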

Best

Eike

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
