spark-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: bug with PYTHONHASHSEED
Date Tue, 04 Apr 2017 23:42:00 GMT
It is fixed in https://issues.apache.org/jira/browse/SPARK-13330



On Wed, Apr 5, 2017 at 12:03 AM, Holden Karau <holden@pigscanfly.ca> wrote:

> Which version of Spark is this (or is it a dev build)? We've recently made
> some improvements with PYTHONHASHSEED propagation.
>
> On Tue, Apr 4, 2017 at 7:49 AM Eike von Seggern <eike.seggern@sevencal.com> wrote:
>
> 2017-04-01 21:54 GMT+02:00 Paul Tremblay <paulhtremblay@gmail.com>:
>
> When I try to do a groupByKey() in my Spark environment, I get the
> error described here:
>
>
> http://stackoverflow.com/questions/36798833/what-does-exception-randomness-of-hash-of-string-should-be-disabled-via-pythonh
>
> In order to attempt to fix the problem, I set up my ipython environment
> with the additional line:
>
> PYTHONHASHSEED=1
>
> When I fire up my ipython shell, and do:
>
> In [7]: hash("foo")
> Out[7]: -2457967226571033580
>
> In [8]: hash("foo")
> Out[8]: -2457967226571033580
>
> So my hash function is now seeded so it returns consistent values. But
> when I do a groupByKey(), I get the same error:
>
>
> Exception: Randomness of hash of string should be disabled via
> PYTHONHASHSEED
>
> Anyone know how to fix this problem in python 3.4?
>
>
> Regardless of the Python version, you have to ensure that the Python
> processes on the Spark master and on the workers are started with
> PYTHONHASHSEED set, e.g. by adding it to the environment of the Spark
> processes.
>
> Best
>
> Eike
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
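
To illustrate the point made in the thread, here is a minimal sketch (not Spark code) of why a seed that works in the driver's shell does not automatically apply elsewhere: PYTHONHASHSEED is read when a Python interpreter starts, so every separately started process, such as a worker, needs it in its own environment. The helper name child_hash is hypothetical, and the child processes below only stand in for Spark's Python workers.

```python
import os
import subprocess
import sys


def child_hash(value, seed=None):
    """Return hash(value) as computed by a freshly started Python process.

    Hypothetical helper: the child process stands in for a Spark Python
    worker, which reads PYTHONHASHSEED from its own environment at startup.
    """
    env = dict(os.environ)
    # Start from a clean slate so the parent's setting does not leak in.
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", value],
        env=env,
    )
    return int(out.decode().strip())


# Every child started with the same seed agrees on the hash of "foo".
seeded = {child_hash("foo", seed=1) for _ in range(3)}

# Children started without a seed each pick their own random one,
# so their hashes will almost certainly differ from run to run.
unseeded = {child_hash("foo") for _ in range(3)}

print("distinct hashes with PYTHONHASHSEED=1:", len(seeded))
print("distinct hashes without a seed:", len(unseeded))
```

In Spark itself, one way to put the variable into the executors' environment is the spark.executorEnv.PYTHONHASHSEED configuration property (Spark documents the general form spark.executorEnv.[EnvironmentVariableName]). Whether that is needed depends on the Spark version, given the fix referenced in SPARK-13330.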
