spark-issues mailing list archives

From "Vlad Frolov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1394) calling system.platform on worker raises IOError
Date Wed, 30 Apr 2014 00:32:14 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985007#comment-13985007 ]

Vlad Frolov commented on SPARK-1394:
------------------------------------

[~idanzalz] Unfortunately, that helped avoid only one of the exceptions, so I commented out the
signal binding in PySpark, and the crashes went away. I hope this will be fixed in the next Spark
release.
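
For readers unfamiliar with this failure mode, here is a minimal sketch of how SIGCHLD handling
in a parent process can produce the same ECHILD error ("No child processes") that appears in the
trace below. It is not PySpark code: wait_for_child is a hypothetical helper, and ignoring SIGCHLD
is only a stand-in for whatever signal binding the worker installs. It assumes Linux or another
POSIX system where children are reaped automatically while SIGCHLD is ignored.

```python
import errno
import os
import signal


def wait_for_child(ignore_sigchld):
    """Fork a child that exits at once, then try to wait for it.

    Returns None if waitpid succeeds, or the errno value if it fails.
    Hypothetical helper for illustration only; not part of PySpark.
    """
    disposition = signal.SIG_IGN if ignore_sigchld else signal.SIG_DFL
    previous = signal.signal(signal.SIGCHLD, disposition)
    try:
        pid = os.fork()
        if pid == 0:
            # Child process: exit immediately, skipping cleanup handlers.
            os._exit(0)
        try:
            # Parent process: this is roughly what closing a popen pipe
            # does internally when it collects the exit status.
            os.waitpid(pid, 0)
            return None
        except OSError as err:
            return err.errno
    finally:
        # Restore the earlier disposition if it was set from Python.
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    print("default SIGCHLD:", wait_for_child(ignore_sigchld=False))
    failed = wait_for_child(ignore_sigchld=True) == errno.ECHILD
    print("ignored SIGCHLD fails with ECHILD:", failed)
```

On Linux the first call should succeed and the second should fail with ECHILD, which is
consistent with the crashes going away once the signal binding is commented out.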

> calling system.platform on worker raises IOError
> ------------------------------------------------
>
>                 Key: SPARK-1394
>                 URL: https://issues.apache.org/jira/browse/SPARK-1394
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0
>         Environment: Tested on Ubuntu and Linux, local and remote master, python 2.7.*
>            Reporter: Idan Zalzberg
>              Labels: pyspark
>
> A simple program that calls platform.system() on the worker fails most of the time (it works occasionally, but very rarely).
> This is critical since many libraries call that method (e.g. boto).
> Here is the trace of the attempt to call that method:
> $ /usr/local/spark/bin/pyspark
> Python 2.7.3 (default, Feb 27 2014, 20:00:17)
> [GCC 4.6.3] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> 14/04/02 18:18:37 INFO Utils: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> 14/04/02 18:18:37 WARN Utils: Your hostname, qlika-dev resolves to a loopback address: 127.0.1.1; using 10.33.102.46 instead (on interface eth1)
> 14/04/02 18:18:37 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
> 14/04/02 18:18:38 INFO Slf4jLogger: Slf4jLogger started
> 14/04/02 18:18:38 INFO Remoting: Starting remoting
> 14/04/02 18:18:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.33.102.46:36640]
> 14/04/02 18:18:39 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.33.102.46:36640]
> 14/04/02 18:18:39 INFO SparkEnv: Registering BlockManagerMaster
> 14/04/02 18:18:39 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140402181839-919f
> 14/04/02 18:18:39 INFO MemoryStore: MemoryStore started with capacity 294.6 MB.
> 14/04/02 18:18:39 INFO ConnectionManager: Bound socket to port 43357 with id = ConnectionManagerId(10.33.102.46,43357)
> 14/04/02 18:18:39 INFO BlockManagerMaster: Trying to register BlockManager
> 14/04/02 18:18:39 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager 10.33.102.46:43357 with 294.6 MB RAM
> 14/04/02 18:18:39 INFO BlockManagerMaster: Registered BlockManager
> 14/04/02 18:18:39 INFO HttpServer: Starting HTTP Server
> 14/04/02 18:18:39 INFO HttpBroadcast: Broadcast server started at http://10.33.102.46:51803
> 14/04/02 18:18:39 INFO SparkEnv: Registering MapOutputTracker
> 14/04/02 18:18:39 INFO HttpFileServer: HTTP File server directory is /tmp/spark-9b38acb0-7b01-4463-b0a6-602bfed05a2b
> 14/04/02 18:18:39 INFO HttpServer: Starting HTTP Server
> 14/04/02 18:18:40 INFO SparkUI: Started Spark Web UI at http://10.33.102.46:4040
> 14/04/02 18:18:40 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 0.9.0
>       /_/
> Using Python version 2.7.3 (default, Feb 27 2014 20:00:17)
> Spark context available as sc.
> >>> import platform
> >>> sc.parallelize([1]).map(lambda x : platform.system()).collect()
> 14/04/02 18:19:17 INFO SparkContext: Starting job: collect at <stdin>:1
> 14/04/02 18:19:17 INFO DAGScheduler: Got job 0 (collect at <stdin>:1) with 1 output partitions (allowLocal=false)
> 14/04/02 18:19:17 INFO DAGScheduler: Final stage: Stage 0 (collect at <stdin>:1)
> 14/04/02 18:19:17 INFO DAGScheduler: Parents of final stage: List()
> 14/04/02 18:19:17 INFO DAGScheduler: Missing parents: List()
> 14/04/02 18:19:17 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[1] at collect at <stdin>:1), which has no missing parents
> 14/04/02 18:19:17 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (PythonRDD[1] at collect at <stdin>:1)
> 14/04/02 18:19:17 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 14/04/02 18:19:17 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
> 14/04/02 18:19:17 INFO TaskSetManager: Serialized task 0.0:0 as 2152 bytes in 12 ms
> 14/04/02 18:19:17 INFO Executor: Running task ID 0
> PySpark worker failed with exception:
> Traceback (most recent call last):
>   File "/usr/local/spark/python/pyspark/worker.py", line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 182, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 117, in dump_stream
>     for obj in iterator:
>   File "/usr/local/spark/python/pyspark/serializers.py", line 171, in _batched
>     for item in iterator:
>   File "<stdin>", line 1, in <lambda>
>   File "/usr/lib/python2.7/platform.py", line 1306, in system
>     return uname()[0]
>   File "/usr/lib/python2.7/platform.py", line 1273, in uname
>     processor = _syscmd_uname('-p','')
>   File "/usr/lib/python2.7/platform.py", line 1030, in _syscmd_uname
>     rc = f.close()
> IOError: [Errno 10] No child processes
> 14/04/02 18:19:17 ERROR Executor: Exception in task ID 0
> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>   File "/usr/local/spark/python/pyspark/worker.py", line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 182, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File "/usr/local/spark/python/pyspark/serializers.py", line 117, in dump_stream
>     for obj in iterator:
>   File "/usr/local/spark/python/pyspark/serializers.py", line 171, in _batched
>     for item in iterator:
>   File "<stdin>", line 1, in <lambda>
>   File "/usr/lib/python2.7/platform.py", line 1306, in system
>     return uname()[0]
>   File "/usr/lib/python2.7/platform.py", line 1273, in uname
>     processor = _syscmd_uname('-p','')
>   File "/usr/lib/python2.7/platform.py", line 1030, in _syscmd_uname
>     rc = f.close()
> IOError: [Errno 10] No child processes
>         at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:131)
>         at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:153)
>         at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:96)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
>         at org.apache.spark.scheduler.Task.run(Task.scala:53)
>         at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
