spark-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: pyspark yarn got exception
Date Wed, 03 Sep 2014 06:42:28 GMT
Hi, I changed the master to yarn, but the execution failed with an exception
again. I am using PySpark.

[root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
./bin/spark-submit --master yarn  --num-executors 3  --driver-memory 4g
--executor-memory 2g --executor-cores 1   examples/src/main/python/pi.py
1000
/usr/jdk64/jdk1.7.0_45/bin/java
::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
-XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
14/09/03 14:35:11 INFO spark.SecurityManager: Changing view acls to: root
14/09/03 14:35:11 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users with view permissions:
Set(root)
14/09/03 14:35:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/09/03 14:35:11 INFO Remoting: Starting remoting
14/09/03 14:35:12 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://spark@HDOP-B.AGT:51707]
14/09/03 14:35:12 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://spark@HDOP-B.AGT:51707]
14/09/03 14:35:12 INFO spark.SparkEnv: Registering MapOutputTracker
14/09/03 14:35:12 INFO spark.SparkEnv: Registering BlockManagerMaster
14/09/03 14:35:12 INFO storage.DiskBlockManager: Created local directory at
/tmp/spark-local-20140903143512-5aab
14/09/03 14:35:12 INFO storage.MemoryStore: MemoryStore started with
capacity 2.3 GB.
14/09/03 14:35:12 INFO network.ConnectionManager: Bound socket to port
53216 with id = ConnectionManagerId(HDOP-B.AGT,53216)
14/09/03 14:35:12 INFO storage.BlockManagerMaster: Trying to register
BlockManager
14/09/03 14:35:12 INFO storage.BlockManagerInfo: Registering block manager
HDOP-B.AGT:53216 with 2.3 GB RAM
14/09/03 14:35:12 INFO storage.BlockManagerMaster: Registered BlockManager
14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/03 14:35:12 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:50624
14/09/03 14:35:12 INFO broadcast.HttpBroadcast: Broadcast server started at
http://10.193.1.76:50624
14/09/03 14:35:12 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-fd7fdcb2-f45d-430f-95fa-afbc4f329b43
14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/03 14:35:12 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:41773
14/09/03 14:35:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/09/03 14:35:13 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
14/09/03 14:35:13 INFO ui.SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
14/09/03 14:35:13 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
--args is deprecated. Use --arg instead.
14/09/03 14:35:14 INFO client.RMProxy: Connecting to ResourceManager at
HDOP-N1.AGT/10.193.1.72:8050
14/09/03 14:35:14 INFO yarn.Client: Got Cluster metric info from
ApplicationsManager (ASM), number of NodeManagers: 6
14/09/03 14:35:14 INFO yarn.Client: Queue info ... queueName: default,
queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
      queueApplicationCount = 0, queueChildQueueCount = 0
14/09/03 14:35:14 INFO yarn.Client: Max mem capabililty of a single
resource in this cluster 13824
14/09/03 14:35:14 INFO yarn.Client: Preparing Local resources
14/09/03 14:35:14 INFO yarn.Client: Uploading
file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
to
hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
14/09/03 14:35:16 INFO yarn.Client: Uploading
file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
to
hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/pi.py
14/09/03 14:35:16 INFO yarn.Client: Setting up the launch environment
14/09/03 14:35:16 INFO yarn.Client: Setting up container launch context
14/09/03 14:35:16 INFO yarn.Client: Command for starting the Spark
ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
-Djava.io.tmpdir=$PWD/tmp,
-Dspark.tachyonStore.folderName=\"spark-98b7d323-2faf-419a-a88d-1a0c549dc5d4\",
-Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
-Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
-Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
-Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
-Dspark.fileserver.uri=\"http://10.193.1.76:41773\",
-Dspark.master=\"yarn-client\", -Dspark.driver.port=\"51707\",
-Dspark.executor.cores=\"1\", -Dspark.httpBroadcast.uri=\"
http://10.193.1.76:50624\",
 -Dlog4j.configuration=log4j-spark-container.properties,
org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar ,
null,  --args  'HDOP-B.AGT:51707' , --executor-memory, 2048,
--executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
<LOG_DIR>/stderr)
14/09/03 14:35:16 INFO yarn.Client: Submitting application to ASM
14/09/03 14:35:16 INFO impl.YarnClientImpl: Submitted application
application_1409559972905_0036
14/09/03 14:35:16 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1409726116517
 yarnAppState: ACCEPTED

14/09/03 14:35:17 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1409726116517
 yarnAppState: ACCEPTED

14/09/03 14:35:18 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1409726116517
 yarnAppState: ACCEPTED

14/09/03 14:35:19 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1409726116517
 yarnAppState: ACCEPTED

14/09/03 14:35:20 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1409726116517
 yarnAppState: ACCEPTED

14/09/03 14:35:21 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: -1
 appStartTime: 1409726116517
 yarnAppState: ACCEPTED

14/09/03 14:35:22 INFO cluster.YarnClientSchedulerBackend: Application
report from ASM:
 appMasterRpcPort: 0
 appStartTime: 1409726116517
 yarnAppState: RUNNING

14/09/03 14:35:24 INFO cluster.YarnClientClusterScheduler:
YarnClientClusterScheduler.postStartHook done
14/09/03 14:35:25 INFO cluster.YarnClientSchedulerBackend: Registered
executor: Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:58976/user/Executor#-1831707618]
with ID 1
14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block manager
HDOP-B.AGT:44142 with 1178.1 MB RAM
14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend: Registered
executor: Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:45140/user/Executor#875812337]
with ID 2
14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block manager
HDOP-N1.AGT:48513 with 1178.1 MB RAM
14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend: Registered
executor: Actor[akka.tcp://sparkExecutor@HDOP-N3.AGT:45380/user/Executor#1559437246]
with ID 3
14/09/03 14:35:27 INFO storage.BlockManagerInfo: Registering block manager
HDOP-N3.AGT:46616 with 1178.1 MB RAM
14/09/03 14:35:56 INFO spark.SparkContext: Starting job: reduce at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
14/09/03 14:35:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
with 1000 output partitions (allowLocal=false)
14/09/03 14:35:56 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce
at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
14/09/03 14:35:56 INFO scheduler.DAGScheduler: Parents of final stage:
List()
14/09/03 14:35:56 INFO scheduler.DAGScheduler: Missing parents: List()
14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting Stage 0
(PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting 1000 missing
tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
14/09/03 14:35:56 INFO cluster.YarnClientClusterScheduler: Adding task set
0.0 with 1000 tasks
14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
0 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
369811 bytes in 9 ms
14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID
1 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
506276 bytes in 5 ms
14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID
2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
501136 bytes in 5 ms
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID
3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
506276 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode

at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID
4 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
501136 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode

at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
at org.apache.spark.scheduler.Task.run(Task.scala:51)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
5 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
369811 bytes in 3 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 1]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID
6 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
506276 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:2)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 1]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID
7 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
501136 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 2]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID
8 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
506276 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:0)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 3]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
9 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
369811 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:3)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 2]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID
10 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
506276 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:2)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 4]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID
11 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
501136 bytes in 3 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:0)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 3]
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID
12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
369811 bytes in 4 ms
14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:1)
14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 5]
14/09/03 14:35:58 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID
13 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
14/09/03 14:35:58 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
506276 bytes in 3 ms
14/09/03 14:35:58 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:2)
14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 4]
14/09/03 14:35:58 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
times; aborting job
14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Cancelling stage
0
14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Stage 0 was
cancelled
14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
org.apache.spark.api.python.PythonException: Traceback (most recent call
last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 6]
14/09/03 14:35:58 INFO scheduler.DAGScheduler: Failed to run reduce at
/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
Traceback (most recent call last):
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 38, in <module>
    count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 619, in reduce
    vals = self.mapPartitions(func).collect()
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 583, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
line 537, in __call__
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
line 300, in get_return_value
py4j.protocol.Py4JJavaError14/09/03 14:35:58 INFO scheduler.TaskSetManager:
Loss was due to org.apache.spark.api.python.PythonException: Traceback
(most recent call last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode
 [duplicate 7]
: An error occurred while calling o24.collect.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task
0.0:2 failed 4 times, most recent failure: Exception failure in TID 11 on
host HDOP-N1.AGT: org.apache.spark.api.python.PythonException: Traceback
(most recent call last):
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 123, in dump_stream
    for obj in iterator:
  File
"/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
line 180, in _batched
    for item in iterator:
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
line 612, in func
  File
"/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
line 36, in f
SystemError: unknown opcode


org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)

org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
        org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
        org.apache.spark.scheduler.Task.run(Task.scala:51)

org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
at
org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

14/09/03 14:35:58 WARN scheduler.TaskSetManager: Loss was due to
org.apache.spark.TaskKilledException
org.apache.spark.TaskKilledException
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Removed TaskSet
0.0, whose tasks have all completed, from pool
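
For reference, the frames the tasks die in are pi.py line 36 (inside f) and the
reduce at line 38 quoted in the driver traceback above. Assuming the unmodified
example shipped with the 1.0.1 distribution, that part of
examples/src/main/python/pi.py looks roughly like this sketch (Python 2 syntax,
as used by that Spark release):

    # Sketch of the stock PySpark pi example; "line 36, in f" is the sampling
    # function, and line 38 is the map/reduce call shown in the traceback.
    import sys
    from random import random
    from operator import add

    from pyspark import SparkContext

    if __name__ == "__main__":
        sc = SparkContext(appName="PythonPi")
        slices = int(sys.argv[1]) if len(sys.argv) > 1 else 2
        n = 100000 * slices

        def f(_):
            # Sample a point in the unit square and test whether it falls
            # inside the unit circle.
            x = random() * 2 - 1
            y = random() * 2 - 1
            return 1 if x ** 2 + y ** 2 < 1 else 0

        count = sc.parallelize(xrange(1, n + 1), slices).map(f).reduce(add)
        print "Pi is roughly %f" % (4.0 * count / n)
        sc.stop()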




On Wed, Sep 3, 2014 at 1:53 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:

> Hello Sandy, I changed to using the yarn master but still got the exceptions:
>
> What is the procedure to execute PySpark on YARN? Is it only required to
> submit the command, or is it also required to start Spark processes?
>
>
>
>
> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
> ./bin/spark-submit --master yarn://HDOP-N1.AGT:8032 --num-executors 3
>  --driver-memory 4g --executor-memory 2g --executor-cores 1
> examples/src/main/python/pi.py   1000
> /usr/jdk64/jdk1.7.0_45/bin/java
>
> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
> 14/09/03 13:48:48 INFO spark.SecurityManager: Changing view acls to: root
> 14/09/03 13:48:48 INFO spark.SecurityManager: SecurityManager:
> authentication disabled; ui acls disabled; users with view permissions:
> Set(root)
> 14/09/03 13:48:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
> 14/09/03 13:48:49 INFO Remoting: Starting remoting
> 14/09/03 13:48:49 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://spark@HDOP-B.AGT:34424]
> 14/09/03 13:48:49 INFO Remoting: Remoting now listens on addresses:
> [akka.tcp://spark@HDOP-B.AGT:34424]
> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering MapOutputTracker
> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering BlockManagerMaster
> 14/09/03 13:48:49 INFO storage.DiskBlockManager: Created local directory
> at /tmp/spark-local-20140903134849-231c
> 14/09/03 13:48:49 INFO storage.MemoryStore: MemoryStore started with
> capacity 2.3 GB.
> 14/09/03 13:48:49 INFO network.ConnectionManager: Bound socket to port
> 60647 with id = ConnectionManagerId(HDOP-B.AGT,60647)
> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Trying to register
> BlockManager
> 14/09/03 13:48:49 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-B.AGT:60647 with 2.3 GB RAM
> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Registered BlockManager
> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:56549
> 14/09/03 13:48:49 INFO broadcast.HttpBroadcast: Broadcast server started
> at http://10.193.1.76:56549
> 14/09/03 13:48:49 INFO spark.HttpFileServer: HTTP File server directory is
> /tmp/spark-90af1222-9ea8-4dd8-887a-343d09d44333
> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
> SocketConnector@0.0.0.0:36512
> 14/09/03 13:48:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 14/09/03 13:48:50 INFO server.AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:4040
> 14/09/03 13:48:50 INFO ui.SparkUI: Started SparkUI at
> http://HDOP-B.AGT:4040
> 14/09/03 13:48:50 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> --args is deprecated. Use --arg instead.
> 14/09/03 13:48:51 INFO client.RMProxy: Connecting to ResourceManager at
> HDOP-N1.AGT/10.193.1.72:8050
> 14/09/03 13:48:51 INFO yarn.Client: Got Cluster metric info from
> ApplicationsManager (ASM), number of NodeManagers: 6
> 14/09/03 13:48:51 INFO yarn.Client: Queue info ... queueName: default,
> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>       queueApplicationCount = 0, queueChildQueueCount = 0
> 14/09/03 13:48:51 INFO yarn.Client: Max mem capabililty of a single
> resource in this cluster 13824
> 14/09/03 13:48:51 INFO yarn.Client: Preparing Local resources
> 14/09/03 13:48:51 INFO yarn.Client: Uploading
> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
> to
> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
> 14/09/03 13:48:53 INFO yarn.Client: Uploading
> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
> to
> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/pi.py
> 14/09/03 13:48:53 INFO yarn.Client: Setting up the launch environment
> 14/09/03 13:48:53 INFO yarn.Client: Setting up container launch context
> 14/09/03 13:48:53 INFO yarn.Client: Command for starting the Spark
> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
> -Djava.io.tmpdir=$PWD/tmp,
> -Dspark.tachyonStore.folderName=\"spark-bdabb882-a2e0-46b6-8e87-90cc6e359d84\",
> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
> -Dspark.fileserver.uri=\"http://10.193.1.76:36512\",
> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"34424\",
> -Dspark.executor.cores=\"1\", -Dspark.httpBroadcast.uri=\"
> http://10.193.1.76:56549\",
>  -Dlog4j.configuration=log4j-spark-container.properties,
> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar ,
> null,  --args  'HDOP-B.AGT:34424' , --executor-memory, 2048,
> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
> <LOG_DIR>/stderr)
> 14/09/03 13:48:53 INFO yarn.Client: Submitting application to ASM
> 14/09/03 13:48:53 INFO impl.YarnClientImpl: Submitted application
> application_1409559972905_0033
> 14/09/03 13:48:53 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1409723333584
>  yarnAppState: ACCEPTED
>
> 14/09/03 13:48:54 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1409723333584
>  yarnAppState: ACCEPTED
>
> 14/09/03 13:48:55 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1409723333584
>  yarnAppState: ACCEPTED
>
> 14/09/03 13:48:56 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1409723333584
>  yarnAppState: ACCEPTED
>
> 14/09/03 13:48:57 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: -1
>  appStartTime: 1409723333584
>  yarnAppState: ACCEPTED
>
> 14/09/03 13:48:58 INFO cluster.YarnClientSchedulerBackend: Application
> report from ASM:
>  appMasterRpcPort: 0
>  appStartTime: 1409723333584
>  yarnAppState: RUNNING
>
> 14/09/03 13:49:00 INFO cluster.YarnClientClusterScheduler:
> YarnClientClusterScheduler.postStartHook done
> 14/09/03 13:49:01 INFO cluster.YarnClientSchedulerBackend: Registered
> executor: Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:57078/user/Executor#1595833626]
> with ID 1
> 14/09/03 13:49:02 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-B.AGT:54579 with 1178.1 MB RAM
> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend: Registered
> executor: Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT:43121/user/Executor#-1266627304]
> with ID 2
> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend: Registered
> executor: Actor[akka.tcp://sparkExecutor@HDOP-N2.AGT:36952/user/Executor#1003961369]
> with ID 3
> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-N4.AGT:56891 with 1178.1 MB RAM
> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block manager
> HDOP-N2.AGT:42381 with 1178.1 MB RAM
> 14/09/03 13:49:33 INFO spark.SparkContext: Starting job: reduce at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Got job 0 (reduce at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
> with 1000 output partitions (allowLocal=false)
> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce
> at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Parents of final stage:
> List()
> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Missing parents: List()
> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Submitting Stage 0
> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Submitting 1000 missing
> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
> 14/09/03 13:49:33 INFO cluster.YarnClientClusterScheduler: Adding task set
> 0.0 with 1000 tasks
> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
> TID 0 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> 369811 bytes in 4 ms
> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
> TID 1 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> 506276 bytes in 5 ms
> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
> 501136 bytes in 5 ms
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> 506276 bytes in 5 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>
> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
> at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>  at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>  at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
> TID 4 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
> 501136 bytes in 4 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>
> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
> at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>  at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>  at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
> TID 5 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> 506276 bytes in 4 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>
> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
> at
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>  at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>  at org.apache.spark.scheduler.Task.run(Task.scala:51)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>  at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  at java.lang.Thread.run(Thread.java:744)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
> TID 6 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> 369811 bytes in 4 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 1]
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
> TID 7 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> 506276 bytes in 4 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:2)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 1]
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
> TID 8 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
> 501136 bytes in 3 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:1)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 1]
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
> TID 9 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> 506276 bytes in 4 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:0)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 2]
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
> TID 10 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> 369811 bytes in 3 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:3)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 2]
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
> TID 11 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> 506276 bytes in 4 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:2)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 2]
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
> TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
> 501136 bytes in 3 ms
> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 3]
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
> TID 13 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
> 506276 bytes in 4 ms
> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:0)
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 3]
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
> TID 14 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
> 369811 bytes in 4 ms
> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:3)
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 3]
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
> TID 15 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
> 506276 bytes in 3 ms
> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 13 (task 0.0:1)
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 4]
> 14/09/03 13:49:35 ERROR scheduler.TaskSetManager: Task 0.0:1 failed 4
> times; aborting job
> 14/09/03 13:49:35 INFO cluster.YarnClientClusterScheduler: Cancelling
> stage 0
> 14/09/03 13:49:35 INFO cluster.YarnClientClusterScheduler: Stage 0 was
> cancelled
> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Loss was due to
> org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>  [duplicate 4]
> 14/09/03 13:49:35 INFO scheduler.DAGScheduler: Failed to run reduce at
> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
> Traceback (most recent call last):
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 38, in <module>
>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 619, in reduce
>     vals = self.mapPartitions(func).collect()
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 583, in collect
>     bytesInJava = self._jrdd.collect().iterator()
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
> line 537, in __call__
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
> line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o24.collect.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
> 0.0:1 failed 4 times, most recent failure: Exception failure in TID 13 on
> host HDOP-N2.AGT: org.apache.spark.api.python.PythonException: Traceback
> (most recent call last):
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
> line 77, in main
>     serializer.dump_stream(func(split_index, iterator), outfile)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 191, in dump_stream
>     self.serializer.dump_stream(self._batched(iterator), stream)
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 123, in dump_stream
>     for obj in iterator:
>   File
> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
> line 180, in _batched
>     for item in iterator:
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
> line 612, in func
>   File
> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
> line 36, in f
> SystemError: unknown opcode
>
>
> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>
> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:744)
> Driver stacktrace:
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>  at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>  at
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>  at
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
> at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>  at
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
> at scala.Option.foreach(Option.scala:236)
>  at
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
> at
> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>  at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>  at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>  at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>
> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Loss was due to
> org.apache.spark.TaskKilledException
> org.apache.spark.TaskKilledException
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
> On Wed, Sep 3, 2014 at 1:40 PM, Sandy Ryza <sandy.ryza@cloudera.com>
> wrote:
>
>> Hi Oleg. To run on YARN, simply set master to "yarn".  The YARN
>> configuration, located in yarn-site.xml, determines where to look for the
>> YARN ResourceManager.
>>
>> PROCESS_LOCAL is orthogonal to the choice of cluster resource manager. A
>> task is considered PROCESS_LOCAL when the executor it's running in happens
>> to have the data it's processing cached.
>>
>> If you're looking to get familiar with the somewhat confusing web of
>> terminology, this blog post might be helpful:
>>
>> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
>>
>> -Sandy
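
A minimal PySpark sketch of the point above (illustrative only; the app name and master strings are assumptions, not taken from this thread). With a YARN master the ResourceManager host:port is resolved from yarn-site.xml (picked up via HADOOP_CONF_DIR or YARN_CONF_DIR), whereas a standalone master has to be spelled out as a spark:// URL:

from pyspark import SparkConf, SparkContext

# Sketch only: choose the cluster manager via the master string instead of
# the spark-submit command line. "yarn-client" is the Spark 1.0.x client-mode
# master for YARN; no host:port is given here because the ResourceManager
# address comes from yarn.resourcemanager.address in yarn-site.xml.
# A standalone cluster would instead need e.g. "spark://HDOP-B.AGT:7077".
conf = SparkConf().setAppName("pi-sketch").setMaster("yarn-client")
sc = SparkContext(conf=conf)
# ... job code ...
sc.stop()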
>>
>>
>> On Tue, Sep 2, 2014 at 9:51 PM, Oleg Ruchovets <oruchovets@gmail.com>
>> wrote:
>>
>>> Hi,
>>>   I changed my command to:
>>>   ./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors 3
>>>  --driver-memory 4g --executor-memory 2g --executor-cores 1
>>> examples/src/main/python/pi.py   1000
>>> and it fixed the problem.
>>>
>>> I still have a couple of questions:
>>>    PROCESS_LOCAL is not YARN execution, right? How should I configure
>>> running on YARN? Should I execute the start-all script on all machines or
>>> only on one? Where are the UI / logs of the Spark execution?
>>>
>>>
>>>
>>>
>>>
>>>  152 152 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:14 0.2 s
>>>    0   0 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s 39 ms
>>>    2   2 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s 39 ms
>>>    3   3 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s 39 ms 1 ms
>>>    4   4 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.8 s 39 ms 2 ms
>>>    5   5 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.8 s 39 ms 1 ms
>>>    6   6 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.8 s 1 ms
>>>    7   7 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s
>>>    8   8 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.3 s
>>>    9   9 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.4 s
>>>   10  10 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.3 s 1 ms
>>>   11  11 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.3 s
>>>
>>>
>>> On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets <oruchovets@gmail.com>
>>> wrote:
>>>
>>>> Hi Andrew.
>>>>    What should I do to set the master to yarn? Can you please point me
>>>> to the command or documentation for how to do it?
>>>>
>>>>
>>>> I am doing the following:
>>>>    I executed start-all.sh:
>>>>    [root@HDOP-B sbin]# ./start-all.sh
>>>> starting org.apache.spark.deploy.master.Master, logging to
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of
>>>> known hosts.
>>>> localhost: starting org.apache.spark.deploy.worker.Worker, logging to
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>>>>
>>>>
>>>> then I execute the command:
>>>>     ./bin/spark-submit --master spark://HDOP-B.AGT:7077
>>>> examples/src/main/python/pi.py 1000
>>>>
>>>>
>>>> the result is the following:
>>>>
>>>>    /usr/jdk64/jdk1.7.0_45/bin/java
>>>>
>>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>>> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j
>>>> profile: org/apache/spark/log4j-defaults.properties
>>>> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
>>>> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager: authentication
>>>> disabled; ui acls disabled; users with view permissions: Set(root)
>>>> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
>>>> 14/09/03 12:10:07 INFO Remoting: Starting remoting
>>>> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on
>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:38944]
>>>> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses:
>>>> [akka.tcp://spark@HDOP-B.AGT:38944]
>>>> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
>>>> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
>>>> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at
>>>> /tmp/spark-local-20140903121008-cf09
>>>> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with capacity
>>>> 294.9 MB.
>>>> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041
>>>> with id = ConnectionManagerId(HDOP-B.AGT,45041)
>>>> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register
>>>> BlockManager
>>>> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager
>>>> HDOP-B.AGT:45041 with 294.9 MB RAM
>>>> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
>>>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>>>> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at
>>>> http://10.193.1.76:59336
>>>> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is
>>>> /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
>>>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>>>> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at
>>>> http://HDOP-B.AGT:4040
>>>> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> 14/09/03 12:10:09 INFO Utils: Copying
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>> to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
>>>> 14/09/03 12:10:09 INFO SparkContext: Added file
>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>> at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master
>>>> spark://HDOP-B.AGT:7077...
>>>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to Spark
>>>> cluster with app ID app-20140903121009-0000
>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added:
>>>> app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161
>>>> (HDOP-B.AGT:51161) with 8 cores
>>>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor ID
>>>> app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores, 512.0
>>>> MB RAM
>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated:
>>>> app-20140903121009-0000/0 is now RUNNING
>>>> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered
>>>> executor: Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:38143/user/Executor#1295757828]
>>>> with ID 0
>>>> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager
>>>> HDOP-B.AGT:38670 with 294.9 MB RAM
>>>> Traceback (most recent call last):
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 38, in <module>
>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>>> line 271, in parallelize
>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>> line 537, in __call__
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>> line 300, in get_return_value
>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>>> : java.lang.OutOfMemoryError: Java heap space
>>>> at
>>>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>>>  at
>>>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>  at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>  at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>>  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>> at py4j.Gateway.invoke(Gateway.java:259)
>>>>  at
>>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>  at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>>
>>>>
>>>>
>>>> What should I do to fix the issue?
>>>>
>>>> Thanks
>>>> Oleg.
>>>>
>>>>
>>>> On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <andrew@databricks.com>
>>>> wrote:
>>>>
>>>>> Hi Oleg,
>>>>>
>>>>> If you are running Spark on a YARN cluster, you should set --master to
>>>>> yarn. By default this runs in client mode, which redirects all output of
>>>>> your application to your console. This is failing because it is trying to
>>>>> connect to a standalone master that you probably did not start. I am
>>>>> somewhat puzzled as to how you ran into an OOM from this configuration,
>>>>> however. Does this problem still occur if you set the correct master?
>>>>>
>>>>> -Andrew
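
A quick sanity check suggested by the above (a hedged sketch, not something from the thread): the master string the application actually resolved is exposed on the SparkContext, so printing it shows whether the job really went to YARN or is still pointing at a standalone spark:// master:

from pyspark import SparkContext

# Sketch: report the resolved master. "yarn-client" / "yarn-cluster" means the
# job went through YARN; "spark://host:7077" means the standalone master (which
# must actually be running); "local[*]" means no cluster manager at all.
sc = SparkContext(appName="master-check")  # master comes from spark-submit or defaults
print(sc.master)
sc.stop()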
>>>>>
>>>>>
>>>>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>>>>
>>>>>> Hi,
>>>>>>    I've installed PySpark on an HDP (Hortonworks) cluster.
>>>>>>   Executing the pi example:
>>>>>>
>>>>>> command:
>>>>>>        spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>>>>> ./bin/spark-submit --master spark://10.193.1.71:7077
>>>>>> examples/src/main/python/pi.py   1000
>>>>>>
>>>>>> exception:
>>>>>>
>>>>>>     14/09/02 17:34:02 INFO SecurityManager: Using Spark's default
>>>>>> log4j profile: org/apache/spark/log4j-defaults.properties
>>>>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to: root
>>>>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager:
>>>>>> authentication disabled; ui acls disabled; users with view permissions:
>>>>>> Set(root)
>>>>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>>>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>>>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening on
>>>>>> addresses :[akka.tcp://spark@HDOP-M.AGT:41059]
>>>>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on addresses:
>>>>>> [akka.tcp://spark@HDOP-M.AGT:41059]
>>>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering BlockManagerMaster
>>>>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local directory at
>>>>>> /tmp/spark-local-20140902173403-cda8
>>>>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with capacity
>>>>>> 294.9 MB.
>>>>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port 34931
>>>>>> with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register
>>>>>> BlockManager
>>>>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block manager
>>>>>> HDOP-M.AGT:34931 with 294.9 MB RAM
>>>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered BlockManager
>>>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server started at
>>>>>> http://10.193.1.71:54341
>>>>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server directory is
>>>>>> /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at
>>>>>> http://HDOP-M.AGT:4040
>>>>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load native-hadoop
>>>>>> library for your platform... using builtin-java classes where applicable
>>>>>> 14/09/02 17:34:04 INFO Utils: Copying
>>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>>> to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>>>>> 14/09/02 17:34:04 INFO SparkContext: Added file
>>>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>>> at http://10.193.1.71:52938/files/pi.py with timestamp 1409650444941
>>>>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to master
>>>>>> spark://10.193.1.71:7077...
>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to master
>>>>>> spark://10.193.1.71:7077...
>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>> Traceback (most recent call last):
>>>>>>   File
>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>>> line 38, in <module>
>>>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>>>>   File
>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>>>>> line 271, in parallelize
>>>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>>>>   File
>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>>>> line 537, in __call__
>>>>>>   File
>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>>>> line 300, in get_return_value
>>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>>>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>>>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>> at
>>>>>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>>>>> at
>>>>>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>> at
>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>  at
>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>>>>  at py4j.Gateway.invoke(Gateway.java:259)
>>>>>> at
>>>>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>>>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Question:
>>>>>>     How can I find out the Spark master host and port? Where are they defined?
>>>>>>
>>>>>> Thanks
>>>>>> Oleg.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
