spark-user mailing list archives

From "subscriptions@prismalytics.io" <subscripti...@prismalytics.io>
Subject ImportError: No module named iter ... (on CDH5 v1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch) ...
Date Wed, 04 Mar 2015 00:21:34 GMT
Hi Friends:

We noticed that the following happens in 'pyspark' when running in
distributed Standalone Mode (MASTER=spark://vps00:7077),
but not in Local Mode (MASTER=local[n]).

See the transcript below, particularly the lines set off with asterisks
(again, the problem only happens in Standalone Mode).
Any ideas? Thank you in advance! =:)

 >>>
 >>> rdd = sc.textFile('file:///etc/hosts')
 >>> rdd.first()

Traceback (most recent call last):
   File "<input>", line 1, in <module>
   File "/usr/lib/spark/python/pyspark/rdd.py", line 1129, in first
     rs = self.take(1)
   File "/usr/lib/spark/python/pyspark/rdd.py", line 1111, in take
     res = self.context.runJob(self, takeUpToNumLeft, p, True)
   File "/usr/lib/spark/python/pyspark/context.py", line 818, in runJob
     it = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, javaPartitions, allowLocal)
   File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
     self.target_id, self.name)
   File "/usr/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
     format(target_id, '.', name), value)
Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0
(TID 7, vps03): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
   File "/usr/lib/spark/python/pyspark/worker.py", line 107, in main
     process()
   File "/usr/lib/spark/python/pyspark/worker.py", line 98, in process
     serializer.dump_stream(func(split_index, iterator), outfile)
   File "/usr/lib/spark/python/pyspark/serializers.py", line 227, in dump_stream
     vs = list(itertools.islice(iterator, batch))
   File *"/usr/lib/spark/python/pyspark/rdd.py", line 1106*, in takeUpToNumLeft   <--- *See around line 1106 of this file in the CDH5 Spark Distribution.*
     while taken < left:
*ImportError: No module named iter*

 >>> # But *iter()* exists as a built-in (not as a module) ...
 >>> iter(range(10))
<listiterator object at 0x423ff10>
 >>>
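For anyone who wants to reproduce that check outside of pyspark, here is a
minimal sketch (written in Python 3 syntax, whereas the cluster transcript
above is Python 2) confirming that iter is a builtin callable, not a module,
and that nothing named "iter" is importable on a stock interpreter:

```python
import builtins
import importlib.util

# iter is a builtin function, not a module
assert callable(builtins.iter)

# no module named "iter" exists on the import path of a stock
# interpreter, so a plain `import iter` would raise ImportError
assert importlib.util.find_spec("iter") is None

print("iter is a builtin; no module named iter is importable")
```

So whatever is raising "ImportError: No module named iter" on the workers,
it is not the normal behavior of an unmodified Python install.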

cluster$ rpm -qa | grep -i spark
[ ... ]
spark-python-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
spark-core-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
spark-worker-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch
spark-master-1.2.0+cdh5.3.2+369-1.cdh5.3.2.p0.17.el6.noarch


Thank you!
Team Prismalytics
