spark-user mailing list archives

From Holden Karau
Subject Re: bug with PYTHONHASHSEED
Date Tue, 04 Apr 2017 16:03:15 GMT
Which version of Spark is this (or is it a dev build)? We've recently made
some improvements with PYTHONHASHSEED propagation.

On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@seven> wrote:

2017-04-01 21:54 GMT+02:00 Paul Tremblay:

When I try to do a groupByKey() in my Spark environment, I get the error
described here:

In order to attempt to fix the problem, I set up my ipython environment
with the additional line:


When I fire up my ipython shell, and do:

In [7]: hash("foo")
Out[7]: -2457967226571033580

In [8]: hash("foo")
Out[8]: -2457967226571033580

So my hash function is now seeded so it returns consistent values. But when
I do a groupByKey(), I get the same error:

Exception: Randomness of hash of string should be disabled via

Does anyone know how to fix this problem in Python 3.4?

Independent of the python version, you have to ensure that Python on
spark-master and -workers is started with PYTHONHASHSEED set, e.g. by
adding it to the environment of the spark processes.
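
A minimal sketch of how one might check this, assuming Python 3 and nothing Spark-specific. Two calls to hash("foo") in the same interpreter always agree, whether or not the seed is fixed, so the check has to compare separately started interpreters, which is closer to what happens between the driver and the workers. The helper name hash_in_fresh_interpreter is hypothetical and exists only for this illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(value, seed=None):
    """Return hash(value) as computed by a newly started Python process.

    If seed is None, PYTHONHASHSEED is removed from the child's environment,
    so the child falls back to its default behaviour.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
    )
    return int(out.decode().strip())


# Two independent processes started with the same seed agree on the hash.
seeded_a = hash_in_fresh_interpreter("foo", seed=0)
seeded_b = hash_in_fresh_interpreter("foo", seed=0)
print(seeded_a == seeded_b)  # True

# Without a seed, Python 3.3+ randomizes string hashes per process,
# so these two values will usually differ.
unseeded_a = hash_in_fresh_interpreter("foo")
unseeded_b = hash_in_fresh_interpreter("foo")
print(unseeded_a, unseeded_b)
```

For Spark itself, the configuration documentation describes spark.executorEnv.[EnvironmentVariableName] for adding environment variables to executor processes, so setting spark.executorEnv.PYTHONHASHSEED is one option to try. Whether that alone is enough probably depends on the Spark version and deployment mode.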



