spark-user mailing list archives

From: Brad Miller <bmill...@eecs.berkeley.edu>
Subject: pyspark bug with unittest and scikit-learn
Date: Thu, 19 Jun 2014 20:58:53 GMT
Hi All,

I am developing unit tests for a program that uses pyspark and
scikit-learn, and I've come across some odd behavior.  During some tests I
receive the following warning: "python/pyspark/serializers.py:327:
DeprecationWarning: integer argument expected, got float".

Although it's only a warning, and my test still passes (i.e. Spark still
seems to work), I would like to know why it happens and whether it
indicates a real problem, since the same thing can probably happen outside
unit testing as well.
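
One way to find out where such a warning comes from is to escalate it to an
exception, so that the offending call fails with a traceback instead of
printing a one-line message. The following is only a minimal sketch:
`noisy_write` and `find_offender` are hypothetical stand-ins, not PySpark
functions, and the warning is emitted by hand here for illustration.

```python
import warnings


def noisy_write(value):
    """Hypothetical stand-in for a call that warns when handed a float."""
    if isinstance(value, float):
        warnings.warn("integer argument expected, got float",
                      DeprecationWarning)
    return int(value)


def find_offender(value):
    """Run noisy_write with DeprecationWarning escalated to an exception."""
    with warnings.catch_warnings():
        # "error" makes the warning raise instead of being printed or ignored
        warnings.simplefilter("error", DeprecationWarning)
        try:
            noisy_write(value)
        except DeprecationWarning as exc:
            return "raised: %s" % exc
    return "no warning"


print(find_offender(2.0))
print(find_offender(2))
```

In a real investigation the exception would be left uncaught, so that the
traceback shows which caller passed the float.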

Note that the warning occurs when I invoke the test as
"SPARK_HOME=/home/spark/spark-1.0.0-bin-hadoop1
PYTHONPATH=/home/spark/spark-1.0.0-bin-hadoop1/python python -m unittest -v
-b crash_test".  Doing any one of the following three things causes the
warning to go away:

-invoking as "python crash_test.py" rather than "python -m unittest -v -b
crash_test"
-commenting out "import sklearn.metrics"
-changing "lambda x: foo(x)" to "lambda x: x"

Note that I am running the following software:
Spark 1.0.0
Python 2.7.3
scikit-learn 0.14.1
Ubuntu 12.04

*Exact Warning (actually occurs 3 times):*
/home/spark/spark-1.0.0-bin-hadoop1/python/pyspark/serializers.py:327:
DeprecationWarning: integer argument expected, got float
  stream.write(struct.pack("!q", value))
/home/spark/spark-1.0.0-bin-hadoop1/python/pyspark/serializers.py:327:
DeprecationWarning: integer argument expected, got float
  stream.write(struct.pack("!q", value))
/home/spark/spark-1.0.0-bin-hadoop1/python/pyspark/serializers.py:327:
DeprecationWarning: integer argument expected, got float
  stream.write(struct.pack("!q", value))
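
For context, the "!q" format in the flagged line packs a signed 64-bit
integer in network byte order. On Python 2.7, handing `struct.pack` a float
for that format produces this DeprecationWarning; newer Python versions
reject the float outright. The sketch below is an assumption-laden
illustration, not PySpark's code: `write_long` is a hypothetical helper that
coerces the value to `int` before packing.

```python
import struct


def write_long(value):
    """Hypothetical helper: pack value as a big-endian signed 64-bit int."""
    # Coerce first so a float such as 3.0 never reaches struct.pack
    return struct.pack("!q", int(value))


packed_from_int = struct.pack("!q", 3)
packed_from_float = write_long(3.0)

# Both paths produce the same 8-byte payload
print(packed_from_int == packed_from_float)
print(len(packed_from_float))
```

Coercing silently would hide a fractional value such as 3.5, so this only
makes sense if the float is known to hold a whole number.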

*crash_test.py:*
import unittest
from pyspark import SparkContext
import sklearn.metrics

def foo(x):
    return x

def setUpModule():
    global sc
    sc = SparkContext('local')
    print sc.parallelize(range(4)).map(lambda x: foo(x)).collect()

class CrashTest(unittest.TestCase):
    def test(self):
        pass

if __name__ == '__main__':
    unittest.main()

I'd be glad to hear whether anybody else has experienced a similar problem,
or has insight into what may be happening and whether it is significant.

best,
-Brad
