spark-user mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: pyspark yarn got exception
Date Thu, 04 Sep 2014 16:52:14 GMT
You can use the PYSPARK_PYTHON environment variable to choose which Python
is used by pyspark, for example:

PYSPARK_PYTHON=/anaconda/bin/python  bin/pyspark
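
The same interpreter also has to be visible to the executors, since the
Python workers are forked with whatever PYSPARK_PYTHON they see. A sketch
of both pieces (the Anaconda path is from the example above; on YARN, if I
recall the 1.0.x docs correctly, SPARK_YARN_USER_ENV forwards environment
variables to the containers):

    # conf/spark-env.sh on every node
    export PYSPARK_PYTHON=/anaconda/bin/python

    # additionally, when submitting to YARN
    export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/anaconda/bin/python"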

On Thu, Sep 4, 2014 at 1:30 AM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
> Hi,
>
>     I found out what the cause of the problem is.
> HDP (Hortonworks) uses Python 2.6.6 for Ambari installations and the rest
> of its stack.
> I can run PySpark and it works fine, but I need to use the Anaconda
> distribution (for Spark). When I installed Anaconda (Python 2.7.7), I got
> the problem.
>
> Question: how can this be resolved? Is there a way to have two Python
> versions installed on one machine?
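>
> For context, this is what the two installs look like side by side (the
> Anaconda path assumes the default install location):
>
>     /usr/bin/python --version      # the system Python that Ambari uses
>     /anaconda/bin/python --version # Anaconda's Python 2.7.7
>     which python                   # whichever comes first on PATH wins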
>
>
> Thanks
> Oleg.
>
>
> On Thu, Sep 4, 2014 at 1:15 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>>
>> Hi Andrew.
>>
>> The problem still occurs:
>>
>> All machines are using Python 2.7:
>>
>> [root@HDOP-N2 conf]# python --version
>> Python 2.7.7 :: Anaconda 2.0.1 (64-bit)
>>
>> Running bin/pyspark:
>>            [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>> bin/pyspark    --driver-memory 4g --executor-memory 2g --executor-cores 1
>> examples/src/main/python/pi.py   1000
>>
>>
>> Python 2.7.7 |Anaconda 2.0.1 (64-bit)| (default, Jun  2 2014, 12:34:02)
>> [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>> Anaconda is brought to you by Continuum Analytics.
>> Please check out: http://continuum.io/thanks and https://binstar.org
>> Traceback (most recent call last):
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/shell.py",
>> line 43, in <module>
>>     sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>> line 94, in __init__
>>     SparkContext._ensure_initialized(self, gateway=gateway)
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>> line 190, in _ensure_initialized
>>     SparkContext._gateway = gateway or launch_gateway()
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/java_gateway.py",
>> line 51, in launch_gateway
>>     gateway_port = int(proc.stdout.readline())
>> ValueError: invalid literal for int() with base 10:
>> '/usr/jdk64/jdk1.7.0_45/bin/java\n'
>> >>>
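>>
>> If I read the traceback right, java_gateway.py expects the first line the
>> launcher subprocess writes to stdout to be the gateway port number
>> (gateway_port = int(proc.stdout.readline())), so any stray output on
>> stdout (here the java path) breaks the int() parse. A minimal
>> illustration of the failing call (the port value below is made up):
>>
>>     int('/usr/jdk64/jdk1.7.0_45/bin/java\n')  # ValueError, as above
>>     int('51717\n')                            # what it expects: a bare port number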
>>
>>
>>
>> This log is from the Spark-on-YARN execution:
>>
>>
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in
>> [jar:file:/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in
>> [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>> explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 14/09/04 12:53:19 INFO SecurityManager: Changing view acls to: yarn,root
>> 14/09/04 12:53:19 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users with view permissions: Set(yarn, root)
>> 14/09/04 12:53:20 INFO Slf4jLogger: Slf4jLogger started
>> 14/09/04 12:53:20 INFO Remoting: Starting remoting
>> 14/09/04 12:53:20 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>> 14/09/04 12:53:20 INFO Remoting: Remoting now listens on addresses:
>> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>> 14/09/04 12:53:20 INFO RMProxy: Connecting to ResourceManager at
>> HDOP-N1.AGT/10.193.1.72:8030
>> 14/09/04 12:53:21 INFO ExecutorLauncher: ApplicationAttemptId:
>> appattempt_1409805761292_0005_000001
>> 14/09/04 12:53:21 INFO ExecutorLauncher: Registering the ApplicationMaster
>> 14/09/04 12:53:21 INFO ExecutorLauncher: Waiting for Spark driver to be
>> reachable.
>> 14/09/04 12:53:21 INFO ExecutorLauncher: Driver now available:
>> HDOP-B.AGT:45747
>> 14/09/04 12:53:21 INFO ExecutorLauncher: Listen to driver:
>> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler
>> 14/09/04 12:53:21 INFO ExecutorLauncher: Allocating 3 executors.
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Will Allocate 3 executor
>> containers, each with 2432 memory
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>> Any, priority: 1, capability: <memory:2432, vCores:1>
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>> Any, priority: 1, capability: <memory:2432, vCores:1>
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>> Any, priority: 1, capability: <memory:2432, vCores:1>
>> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for :
>> HDOP-M.AGT:45454
>> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for :
>> HDOP-N1.AGT:45454
>> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-N1.AGT to /default-rack
>> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-M.AGT to /default-rack
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container
>> container_1409805761292_0005_01_000003 for on host HDOP-N1.AGT
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching ExecutorRunnable.
>> driverUrl: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler,
>> executorHostname: HDOP-N1.AGT
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container
>> container_1409805761292_0005_01_000002 for on host HDOP-M.AGT
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching ExecutorRunnable.
>> driverUrl: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler,
>> executorHostname: HDOP-M.AGT
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy:
>> yarn.client.max-nodemanagers-proxies : 500
>> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy:
>> yarn.client.max-nodemanagers-proxies : 500
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up ContainerLaunchContext
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up ContainerLaunchContext
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources
>> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" } size: 1317
>> timestamp: 1409806397200 type: FILE visibility: PRIVATE, __spark__.jar ->
>> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>> } size: 121759562 timestamp: 1409806397057 type: FILE visibility: PRIVATE)
>> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources
>> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" } size: 1317
>> timestamp: 1409806397200 type: FILE visibility: PRIVATE, __spark__.jar ->
>> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>> } size: 121759562 timestamp: 1409806397057 type: FILE visibility: PRIVATE)
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>> commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill
>> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>> -Dlog4j.configuration=log4j-spark-container.properties,
>> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 1,
>> HDOP-N1.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>> commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill
>> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>> -Dlog4j.configuration=log4j-spark-container.properties,
>> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 2,
>> HDOP-M.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening proxy :
>> HDOP-N1.AGT:45454
>> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening proxy :
>> HDOP-M.AGT:45454
>> 14/09/04 12:53:22 INFO AMRMClientImpl: Received new token for :
>> HDOP-N4.AGT:45454
>> 14/09/04 12:53:22 INFO RackResolver: Resolved HDOP-N4.AGT to /default-rack
>> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching container
>> container_1409805761292_0005_01_000004 for on host HDOP-N4.AGT
>> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching ExecutorRunnable.
>> driverUrl: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler,
>> executorHostname: HDOP-N4.AGT
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Starting Executor Container
>> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy:
>> yarn.client.max-nodemanagers-proxies : 500
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up ContainerLaunchContext
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Preparing Local resources
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Prepared Local resources
>> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" } size: 1317
>> timestamp: 1409806397200 type: FILE visibility: PRIVATE, __spark__.jar ->
>> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>> } size: 121759562 timestamp: 1409806397057 type: FILE visibility: PRIVATE)
>> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>> commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill
>> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>> -Dlog4j.configuration=log4j-spark-container.properties,
>> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 3,
>> HDOP-N4.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening proxy :
>> HDOP-N4.AGT:45454
>> 14/09/04 12:53:22 INFO ExecutorLauncher: All executors have launched.
>> 14/09/04 12:53:22 INFO ExecutorLauncher: Started progress reporter thread
>> - sleep time : 5000
>> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>> disconnected! Shutting down. Disassociated
>> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>> [akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>> disconnected! Shutting down. Disassociated
>> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>> [akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>> disconnected! Shutting down. Disassociated
>> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>> [akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>> disconnected! Shutting down. Disassociated
>> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>> [akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>> disconnected! Shutting down. Disassociated
>> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>> [akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:54:02 INFO ExecutorLauncher: finish ApplicationMaster with
>> SUCCEEDED
>> 14/09/04 12:54:02 INFO AMRMClientImpl: Waiting for application to be
>> successfully unregistered.
>> 14/09/04 12:54:02 INFO ExecutorLauncher: Exited
>>
>>
>>
>>
>>
>> The exception still occurs:
>>
>>
>>
>>   [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>> ./bin/spark-submit --master yarn  --num-executors 3  --driver-memory 4g
>> --executor-memory 2g --executor-cores 1   examples/src/main/python/pi.py
>> 1000
>> /usr/jdk64/jdk1.7.0_45/bin/java
>>
>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>> 14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to: root
>> 14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager:
>> authentication disabled; ui acls disabled; users with view permissions:
>> Set(root)
>> 14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> 14/09/04 12:53:12 INFO Remoting: Starting remoting
>> 14/09/04 12:53:12 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses:
>> [akka.tcp://spark@HDOP-B.AGT:45747]
>> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
>> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
>> 14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local directory
>> at /tmp/spark-local-20140904125312-c7ea
>> 14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with
>> capacity 2.3 GB.
>> 14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port
>> 37363 with id = ConnectionManagerId(HDOP-B.AGT,37363)
>> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register
>> BlockManager
>> 14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block manager
>> HDOP-B.AGT:37363 with 2.3 GB RAM
>> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered BlockManager
>> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
>> SocketConnector@0.0.0.0:33547
>> 14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server started
>> at http://10.193.1.76:33547
>> 14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server directory is
>> /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
>> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
>> SocketConnector@0.0.0.0:54594
>> 14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> 14/09/04 12:53:13 INFO server.AbstractConnector: Started
>> SelectChannelConnector@0.0.0.0:4040
>> 14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at
>> http://HDOP-B.AGT:4040
>> 14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> --args is deprecated. Use --arg instead.
>> 14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager at
>> HDOP-N1.AGT/10.193.1.72:8050
>> 14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from
>> ApplicationsManager (ASM), number of NodeManagers: 6
>> 14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default,
>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>       queueApplicationCount = 0, queueChildQueueCount = 0
>> 14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single
>> resource in this cluster 13824
>> 14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
>> 14/09/04 12:53:15 INFO yarn.Client: Uploading
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> to
>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> 14/09/04 12:53:17 INFO yarn.Client: Uploading
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>> to
>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
>> 14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
>> 14/09/04 12:53:17 INFO yarn.Client: Setting up container launch context
>> 14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark
>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>> -Djava.io.tmpdir=$PWD/tmp,
>> -Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\",
>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>> -Dspark.fileserver.uri=\"http://10.193.1.76:54594\",
>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\",
>> -Dspark.executor.cores=\"1\",
>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\",
>> -Dlog4j.configuration=log4j-spark-container.properties,
>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar ,
>> null,  --args  'HDOP-B.AGT:45747' , --executor-memory, 2048,
>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>> <LOG_DIR>/stderr)
>> 14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
>> 14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application
>> application_1409805761292_0005
>> 14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: -1
>> appStartTime: 1409806397305
>> yarnAppState: ACCEPTED
>>
>> 14/09/04 12:53:18 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: -1
>> appStartTime: 1409806397305
>> yarnAppState: ACCEPTED
>>
>> 14/09/04 12:53:19 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: -1
>> appStartTime: 1409806397305
>> yarnAppState: ACCEPTED
>>
>> 14/09/04 12:53:20 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: -1
>> appStartTime: 1409806397305
>> yarnAppState: ACCEPTED
>>
>> 14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application
>> report from ASM:
>> appMasterRpcPort: 0
>> appStartTime: 1409806397305
>> yarnAppState: RUNNING
>>
>> 14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler:
>> YarnClientClusterScheduler.postStartHook done
>> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
>> executor:
>> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:40024/user/Executor#2065794895]
>> with ID 1
>> 14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block manager
>> HDOP-N1.AGT:34857 with 1178.1 MB RAM
>> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
>> executor:
>> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT:49234/user/Executor#820272849]
>> with ID 3
>> 14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered
>> executor:
>> Actor[akka.tcp://sparkExecutor@HDOP-M.AGT:38124/user/Executor#715249825]
>> with ID 2
>> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager
>> HDOP-N4.AGT:43365 with 1178.1 MB RAM
>> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager
>> HDOP-M.AGT:45711 with 1178.1 MB RAM
>> 14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>> with 1000 output partitions (allowLocal=false)
>> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage 0(reduce
>> at
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage:
>> List()
>> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
>> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0
>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
>> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing
>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>> 14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding task set
>> 0.0 with 1000 tasks
>> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>> TID 0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
>> 369810 bytes in 5 ms
>> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>> TID 1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
>> 506275 bytes in 2 ms
>> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>> TID 2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
>> 501135 bytes in 2 ms
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>> TID 3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
>> 506275 bytes in 5 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>
>> at
>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>> at
>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>> TID 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
>> 506275 bytes in 5 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 1]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>> TID 5 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
>> 501135 bytes in 5 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 2]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>> TID 6 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
>> 506275 bytes in 5 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 3]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>> TID 7 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
>> 369810 bytes in 4 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:2)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 4]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>> TID 8 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
>> 501135 bytes in 3 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:1)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 5]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>> TID 9 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
>> 506275 bytes in 4 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:3)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 6]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>> TID 10 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
>> 506275 bytes in 3 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:0)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 7]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>> TID 11 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
>> 369810 bytes in 3 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:2)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 8]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>> TID 12 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as
>> 501135 bytes in 4 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:3)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 9]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>> TID 13 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as
>> 506275 bytes in 3 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 10]
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>> TID 14 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as
>> 506275 bytes in 4 ms
>> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:0)
>> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 11]
>> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>> TID 15 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as
>> 369810 bytes in 4 ms
>> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Lost TID 12 (task 0.0:2)
>> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 12]
>> 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
>> times; aborting job
>> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 13]
>> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling
>> stage 0
>> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was
>> cancelled
>> 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> Traceback (most recent call last):
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 38, in <module>
>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 619, in reduce
>>     vals = self.mapPartitions(func).collect()
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 583, in collect
>>     bytesInJava = self._jrdd.collect().iterator()
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>> line 537, in __call__
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> py4j.protocol.Py4JJavaError14/09/04 12:53:57 INFO
>> scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>  [duplicate 14]
>> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
>> org.apache.spark.TaskKilledException
>> org.apache.spark.TaskKilledException
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> : An error occurred while calling o24.collect.
>> : org.apache.spark.SparkException: Job aborted due to stage failure: Task
>> 0.0:2 failed 4 times, most recent failure: Exception failure in TID 12 on
>> host HDOP-M.AGT: org.apache.spark.api.python.PythonException: Traceback
>> (most recent call last):
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> line 77, in main
>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 191, in dump_stream
>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 123, in dump_stream
>>     for obj in iterator:
>>   File
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> line 180, in _batched
>>     for item in iterator:
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> line 612, in func
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 36, in f
>> SystemError: unknown opcode
>>
>>
>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>
>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>
>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>         java.lang.Thread.run(Thread.java:744)
>> Driver stacktrace:
>> at
>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>> at
>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> at
>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>> at
>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>> at scala.Option.foreach(Option.scala:236)
>> at
>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>> at
>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> at
>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> at
>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> at
>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>
>> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed TaskSet
>> 0.0, whose tasks have all completed, from pool
>>
>>
>>
>>
>> What else can be done to fix this problem?
>>
>>
>> Thanks
>>
>> Oleg.
>>
>>
>>
>>
>>
>> On Thu, Sep 4, 2014 at 5:36 AM, Andrew Or <andrew@databricks.com> wrote:
>>>
>>> Hi Oleg,
>>>
>>> Your configuration looks alright to me. I haven't seen an "unknown
>>> opcode" SystemError in PySpark before. This usually means you have
>>> corrupted .pyc files lying around (ones that belonged to an old Python
>>> version, perhaps). What Python version are you using? Are all your nodes
>>> running the same version of Python? What happens if you just run
>>> bin/pyspark with the same command line arguments, and then do
>>> "sc.parallelize(range(10)).count()", does it still fail?
>>>
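>>> For reference, a minimal version of that check, which also reports which
>>> interpreter each executor picks up (a sketch; the partition count is
>>> arbitrary):
>>>
>>>     sc.parallelize(range(10)).count()
>>>
>>>     def pyver(_):
>>>         import sys
>>>         return sys.version
>>>     sc.parallelize(range(4), 4).map(pyver).distinct().collect()
>>>
>>> If the versions that come back differ from the driver's sys.version,
>>> that mismatch would line up with the "unknown opcode" failures.
>>>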
>>> Andrew
>>>
>>>
>>> 2014-09-02 23:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>>>
>>>> Hi, I changed the master to yarn, but execution failed with an
>>>> exception again. I am using PySpark.
>>>>
>>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>>> ./bin/spark-submit --master yarn  --num-executors 3  --driver-memory 4g
>>>> --executor-memory 2g --executor-cores 1   examples/src/main/python/pi.py
>>>> 1000
>>>> /usr/jdk64/jdk1.7.0_45/bin/java
>>>>
>>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>>> 14/09/03 14:35:11 INFO spark.SecurityManager: Changing view acls to:
>>>> root
>>>> 14/09/03 14:35:11 INFO spark.SecurityManager: SecurityManager:
>>>> authentication disabled; ui acls disabled; users with view permissions:
>>>> Set(root)
>>>> 14/09/03 14:35:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>> 14/09/03 14:35:11 INFO Remoting: Starting remoting
>>>> 14/09/03 14:35:12 INFO Remoting: Remoting started; listening on
>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:51707]
>>>> 14/09/03 14:35:12 INFO Remoting: Remoting now listens on addresses:
>>>> [akka.tcp://spark@HDOP-B.AGT:51707]
>>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering MapOutputTracker
>>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>> 14/09/03 14:35:12 INFO storage.DiskBlockManager: Created local directory
>>>> at /tmp/spark-local-20140903143512-5aab
>>>> 14/09/03 14:35:12 INFO storage.MemoryStore: MemoryStore started with
>>>> capacity 2.3 GB.
>>>> 14/09/03 14:35:12 INFO network.ConnectionManager: Bound socket to port
>>>> 53216 with id = ConnectionManagerId(HDOP-B.AGT,53216)
>>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Trying to register
>>>> BlockManager
>>>> 14/09/03 14:35:12 INFO storage.BlockManagerInfo: Registering block
>>>> manager HDOP-B.AGT:53216 with 2.3 GB RAM
>>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Registered
>>>> BlockManager
>>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>>>> SocketConnector@0.0.0.0:50624
>>>> 14/09/03 14:35:12 INFO broadcast.HttpBroadcast: Broadcast server started
>>>> at http://10.193.1.76:50624
>>>> 14/09/03 14:35:12 INFO spark.HttpFileServer: HTTP File server directory
>>>> is /tmp/spark-fd7fdcb2-f45d-430f-95fa-afbc4f329b43
>>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>>>> SocketConnector@0.0.0.0:41773
>>>> 14/09/03 14:35:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>> 14/09/03 14:35:13 INFO server.AbstractConnector: Started
>>>> SelectChannelConnector@0.0.0.0:4040
>>>> 14/09/03 14:35:13 INFO ui.SparkUI: Started SparkUI at
>>>> http://HDOP-B.AGT:4040
>>>> 14/09/03 14:35:13 WARN util.NativeCodeLoader: Unable to load
>>>> native-hadoop library for your platform... using builtin-java classes where
>>>> applicable
>>>> --args is deprecated. Use --arg instead.
>>>> 14/09/03 14:35:14 INFO client.RMProxy: Connecting to ResourceManager at
>>>> HDOP-N1.AGT/10.193.1.72:8050
>>>> 14/09/03 14:35:14 INFO yarn.Client: Got Cluster metric info from
>>>> ApplicationsManager (ASM), number of NodeManagers: 6
>>>> 14/09/03 14:35:14 INFO yarn.Client: Queue info ... queueName: default,
>>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>>>> 14/09/03 14:35:14 INFO yarn.Client: Max mem capabililty of a single
>>>> resource in this cluster 13824
>>>> 14/09/03 14:35:14 INFO yarn.Client: Preparing Local resources
>>>> 14/09/03 14:35:14 INFO yarn.Client: Uploading
>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>>> to
>>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>>> 14/09/03 14:35:16 INFO yarn.Client: Uploading
>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>> to
>>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/pi.py
>>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up the launch environment
>>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up container launch context
>>>> 14/09/03 14:35:16 INFO yarn.Client: Command for starting the Spark
>>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>>> -Djava.io.tmpdir=$PWD/tmp,
>>>> -Dspark.tachyonStore.folderName=\"spark-98b7d323-2faf-419a-a88d-1a0c549dc5d4\",
>>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>>> -Dspark.fileserver.uri=\"http://10.193.1.76:41773\",
>>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"51707\",
>>>> -Dspark.executor.cores=\"1\",
>>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:50624\",
>>>> -Dlog4j.configuration=log4j-spark-container.properties,
>>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar ,
>>>> null,  --args  'HDOP-B.AGT:51707' , --executor-memory, 2048,
>>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>>> <LOG_DIR>/stderr)
>>>> 14/09/03 14:35:16 INFO yarn.Client: Submitting application to ASM
>>>> 14/09/03 14:35:16 INFO impl.YarnClientImpl: Submitted application
>>>> application_1409559972905_0036
>>>> 14/09/03 14:35:16 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: -1
>>>> appStartTime: 1409726116517
>>>> yarnAppState: ACCEPTED
>>>>
>>>> 14/09/03 14:35:17 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: -1
>>>> appStartTime: 1409726116517
>>>> yarnAppState: ACCEPTED
>>>>
>>>> 14/09/03 14:35:18 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: -1
>>>> appStartTime: 1409726116517
>>>> yarnAppState: ACCEPTED
>>>>
>>>> 14/09/03 14:35:19 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: -1
>>>> appStartTime: 1409726116517
>>>> yarnAppState: ACCEPTED
>>>>
>>>> 14/09/03 14:35:20 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: -1
>>>> appStartTime: 1409726116517
>>>> yarnAppState: ACCEPTED
>>>>
>>>> 14/09/03 14:35:21 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: -1
>>>> appStartTime: 1409726116517
>>>> yarnAppState: ACCEPTED
>>>>
>>>> 14/09/03 14:35:22 INFO cluster.YarnClientSchedulerBackend: Application
>>>> report from ASM:
>>>> appMasterRpcPort: 0
>>>> appStartTime: 1409726116517
>>>> yarnAppState: RUNNING
>>>>
>>>> 14/09/03 14:35:24 INFO cluster.YarnClientClusterScheduler:
>>>> YarnClientClusterScheduler.postStartHook done
>>>> 14/09/03 14:35:25 INFO cluster.YarnClientSchedulerBackend: Registered
>>>> executor:
>>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:58976/user/Executor#-1831707618]
>>>> with ID 1
>>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>>>> manager HDOP-B.AGT:44142 with 1178.1 MB RAM
>>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend: Registered
>>>> executor:
>>>> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:45140/user/Executor#875812337]
>>>> with ID 2
>>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>>>> manager HDOP-N1.AGT:48513 with 1178.1 MB RAM
>>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend: Registered
>>>> executor:
>>>> Actor[akka.tcp://sparkExecutor@HDOP-N3.AGT:45380/user/Executor#1559437246]
>>>> with ID 3
>>>> 14/09/03 14:35:27 INFO storage.BlockManagerInfo: Registering block
>>>> manager HDOP-N3.AGT:46616 with 1178.1 MB RAM
>>>> 14/09/03 14:35:56 INFO spark.SparkContext: Starting job: reduce at
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>>> with 1000 output partitions (allowLocal=false)
>>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Final stage: Stage
>>>> 0(reduce at
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Parents of final stage:
>>>> List()
>>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Missing parents: List()
>>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting Stage 0
>>>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
>>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting 1000 missing
>>>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>>> 14/09/03 14:35:56 INFO cluster.YarnClientClusterScheduler: Adding task
>>>> set 0.0 with 1000 tasks
>>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>> TID 0 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>> as 369811 bytes in 9 ms
>>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>> TID 1 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>> as 506276 bytes in 5 ms
>>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>> as 501136 bytes in 5 ms
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>> as 506276 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>
>>>> at
>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>> at
>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>> TID 4 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>> as 501136 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>
>>>> at
>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>> at
>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>> TID 5 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>> as 369811 bytes in 3 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 1]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>> TID 6 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>> as 506276 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:2)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 1]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>> TID 7 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>> as 501136 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 2]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>> TID 8 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>> as 506276 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:0)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 3]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>> TID 9 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>> as 369811 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:3)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 2]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>> TID 10 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>> as 506276 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:2)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 4]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>> TID 11 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>> as 501136 bytes in 3 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:0)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 3]
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>> TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>> as 369811 bytes in 4 ms
>>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:1)
>>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 5]
>>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>> TID 13 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>> as 506276 bytes in 3 ms
>>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Lost TID 11 (task
>>>> 0.0:2)
>>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 4]
>>>> 14/09/03 14:35:58 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
>>>> times; aborting job
>>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Cancelling
>>>> stage 0
>>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Stage 0 was
>>>> cancelled
>>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 6]
>>>> 14/09/03 14:35:58 INFO scheduler.DAGScheduler: Failed to run reduce at
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>>> Traceback (most recent call last):
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 38, in <module>
>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 619, in reduce
>>>>     vals = self.mapPartitions(func).collect()
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 583, in collect
>>>>     bytesInJava = self._jrdd.collect().iterator()
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>> line 537, in __call__
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>> line 300, in get_return_value
>>>> py4j.protocol.Py4JJavaError
>>>> 14/09/03 14:35:58 INFO
>>>> scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>> last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>  [duplicate 7]
>>>> : An error occurred while calling o24.collect.
>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>> Task 0.0:2 failed 4 times, most recent failure: Exception failure in TID 11
>>>> on host HDOP-N1.AGT: org.apache.spark.api.python.PythonException: Traceback
>>>> (most recent call last):
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>> line 77, in main
>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 191, in dump_stream
>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 123, in dump_stream
>>>>     for obj in iterator:
>>>>   File
>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>> line 180, in _batched
>>>>     for item in iterator:
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>> line 612, in func
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 36, in f
>>>> SystemError: unknown opcode
>>>>
>>>>
>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>>
>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>>
>>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>
>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>
>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>         java.lang.Thread.run(Thread.java:744)
>>>> Driver stacktrace:
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>> at
>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>> at scala.Option.foreach(Option.scala:236)
>>>> at
>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>> at
>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>> at
>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> at
>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>> at
>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> at
>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>
>>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Loss was due to
>>>> org.apache.spark.TaskKilledException
>>>> org.apache.spark.TaskKilledException
>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> at java.lang.Thread.run(Thread.java:744)
>>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Removed
>>>> TaskSet 0.0, whose tasks have all completed, from pool
>>>>
>>>>
>>>>
>>>>
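A note on the error itself: "SystemError: unknown opcode" is what CPython raises when it is handed bytecode compiled by a different interpreter version, so the repeated task failures above are consistent with the driver and the YARN workers running different Pythons. Below is a minimal diagnostic sketch (not run as part of this thread; the app name and partition count are arbitrary) that can be submitted the same way as pi.py to print the interpreter version seen by the driver and by each executor:

    import sys
    from pyspark import SparkContext

    sc = SparkContext(appName="PythonVersionCheck")
    # Each task reports the interpreter it actually runs under.
    worker_versions = (sc.parallelize(range(8), 8)
                         .map(lambda _: sys.version)
                         .distinct()
                         .collect())
    print("driver: %s" % sys.version)
    for v in worker_versions:
        print("worker: %s" % v)
    sc.stop()

If the printed versions differ, or the tasks die with the same SystemError before anything prints, the executors are not using the same Python as the driver.
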
>>>> On Wed, Sep 3, 2014 at 1:53 PM, Oleg Ruchovets <oruchovets@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hello Sandy, I changed to using the YARN master but still got the
>>>>> exceptions:
>>>>>
>>>>> What is the procedure for executing PySpark on YARN? Is it enough to
>>>>> run the command below, or do I also need to start Spark processes first?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>>>> ./bin/spark-submit --master yarn://HDOP-N1.AGT:8032 --num-executors 3
>>>>> --driver-memory 4g --executor-memory 2g --executor-cores 1
>>>>> examples/src/main/python/pi.py   1000
>>>>> /usr/jdk64/jdk1.7.0_45/bin/java
>>>>>
>>>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: Changing view acls to:
>>>>> root
>>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: SecurityManager:
>>>>> authentication disabled; ui acls disabled; users with view permissions:
>>>>> Set(root)
>>>>> 14/09/03 13:48:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>>>> 14/09/03 13:48:49 INFO Remoting: Starting remoting
>>>>> 14/09/03 13:48:49 INFO Remoting: Remoting started; listening on
>>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:34424]
>>>>> 14/09/03 13:48:49 INFO Remoting: Remoting now listens on addresses:
>>>>> [akka.tcp://spark@HDOP-B.AGT:34424]
>>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering MapOutputTracker
>>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering BlockManagerMaster
>>>>> 14/09/03 13:48:49 INFO storage.DiskBlockManager: Created local
>>>>> directory at /tmp/spark-local-20140903134849-231c
>>>>> 14/09/03 13:48:49 INFO storage.MemoryStore: MemoryStore started with
>>>>> capacity 2.3 GB.
>>>>> 14/09/03 13:48:49 INFO network.ConnectionManager: Bound socket to port
>>>>> 60647 with id = ConnectionManagerId(HDOP-B.AGT,60647)
>>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Trying to register
>>>>> BlockManager
>>>>> 14/09/03 13:48:49 INFO storage.BlockManagerInfo: Registering block
>>>>> manager HDOP-B.AGT:60647 with 2.3 GB RAM
>>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Registered
>>>>> BlockManager
>>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>>>>> SocketConnector@0.0.0.0:56549
>>>>> 14/09/03 13:48:49 INFO broadcast.HttpBroadcast: Broadcast server
>>>>> started at http://10.193.1.76:56549
>>>>> 14/09/03 13:48:49 INFO spark.HttpFileServer: HTTP File server directory
>>>>> is /tmp/spark-90af1222-9ea8-4dd8-887a-343d09d44333
>>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>>>>> SocketConnector@0.0.0.0:36512
>>>>> 14/09/03 13:48:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>>>> 14/09/03 13:48:50 INFO server.AbstractConnector: Started
>>>>> SelectChannelConnector@0.0.0.0:4040
>>>>> 14/09/03 13:48:50 INFO ui.SparkUI: Started SparkUI at
>>>>> http://HDOP-B.AGT:4040
>>>>> 14/09/03 13:48:50 WARN util.NativeCodeLoader: Unable to load
>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>> applicable
>>>>> --args is deprecated. Use --arg instead.
>>>>> 14/09/03 13:48:51 INFO client.RMProxy: Connecting to ResourceManager at
>>>>> HDOP-N1.AGT/10.193.1.72:8050
>>>>> 14/09/03 13:48:51 INFO yarn.Client: Got Cluster metric info from
>>>>> ApplicationsManager (ASM), number of NodeManagers: 6
>>>>> 14/09/03 13:48:51 INFO yarn.Client: Queue info ... queueName: default,
>>>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>>>>> 14/09/03 13:48:51 INFO yarn.Client: Max mem capability of a single
>>>>> resource in this cluster 13824
>>>>> 14/09/03 13:48:51 INFO yarn.Client: Preparing Local resources
>>>>> 14/09/03 13:48:51 INFO yarn.Client: Uploading
>>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>>>> to
>>>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>>>> 14/09/03 13:48:53 INFO yarn.Client: Uploading
>>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>> to
>>>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/pi.py
>>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up the launch environment
>>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up container launch context
>>>>> 14/09/03 13:48:53 INFO yarn.Client: Command for starting the Spark
>>>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>>>> -Djava.io.tmpdir=$PWD/tmp,
>>>>> -Dspark.tachyonStore.folderName=\"spark-bdabb882-a2e0-46b6-8e87-90cc6e359d84\",
>>>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>>>> -Dspark.fileserver.uri=\"http://10.193.1.76:36512\",
>>>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"34424\",
>>>>> -Dspark.executor.cores=\"1\",
>>>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:56549\",
>>>>> -Dlog4j.configuration=log4j-spark-container.properties,
>>>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar ,
>>>>> null,  --args  'HDOP-B.AGT:34424' , --executor-memory, 2048,
>>>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>>>> <LOG_DIR>/stderr)
>>>>> 14/09/03 13:48:53 INFO yarn.Client: Submitting application to ASM
>>>>> 14/09/03 13:48:53 INFO impl.YarnClientImpl: Submitted application
>>>>> application_1409559972905_0033
>>>>> 14/09/03 13:48:53 INFO cluster.YarnClientSchedulerBackend: Application
>>>>> report from ASM:
>>>>> appMasterRpcPort: -1
>>>>> appStartTime: 1409723333584
>>>>> yarnAppState: ACCEPTED
>>>>>
>>>>> 14/09/03 13:48:54 INFO cluster.YarnClientSchedulerBackend: Application
>>>>> report from ASM:
>>>>> appMasterRpcPort: -1
>>>>> appStartTime: 1409723333584
>>>>> yarnAppState: ACCEPTED
>>>>>
>>>>> 14/09/03 13:48:55 INFO cluster.YarnClientSchedulerBackend: Application
>>>>> report from ASM:
>>>>> appMasterRpcPort: -1
>>>>> appStartTime: 1409723333584
>>>>> yarnAppState: ACCEPTED
>>>>>
>>>>> 14/09/03 13:48:56 INFO cluster.YarnClientSchedulerBackend: Application
>>>>> report from ASM:
>>>>> appMasterRpcPort: -1
>>>>> appStartTime: 1409723333584
>>>>> yarnAppState: ACCEPTED
>>>>>
>>>>> 14/09/03 13:48:57 INFO cluster.YarnClientSchedulerBackend: Application
>>>>> report from ASM:
>>>>> appMasterRpcPort: -1
>>>>> appStartTime: 1409723333584
>>>>> yarnAppState: ACCEPTED
>>>>>
>>>>> 14/09/03 13:48:58 INFO cluster.YarnClientSchedulerBackend: Application
>>>>> report from ASM:
>>>>> appMasterRpcPort: 0
>>>>> appStartTime: 1409723333584
>>>>> yarnAppState: RUNNING
>>>>>
>>>>> 14/09/03 13:49:00 INFO cluster.YarnClientClusterScheduler:
>>>>> YarnClientClusterScheduler.postStartHook done
>>>>> 14/09/03 13:49:01 INFO cluster.YarnClientSchedulerBackend: Registered
>>>>> executor:
>>>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:57078/user/Executor#1595833626]
>>>>> with ID 1
>>>>> 14/09/03 13:49:02 INFO storage.BlockManagerInfo: Registering block
>>>>> manager HDOP-B.AGT:54579 with 1178.1 MB RAM
>>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend: Registered
>>>>> executor:
>>>>> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT:43121/user/Executor#-1266627304]
>>>>> with ID 2
>>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend: Registered
>>>>> executor:
>>>>> Actor[akka.tcp://sparkExecutor@HDOP-N2.AGT:36952/user/Executor#1003961369]
>>>>> with ID 3
>>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>>>>> manager HDOP-N4.AGT:56891 with 1178.1 MB RAM
>>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>>>>> manager HDOP-N2.AGT:42381 with 1178.1 MB RAM
>>>>> 14/09/03 13:49:33 INFO spark.SparkContext: Starting job: reduce at
>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>>>> with 1000 output partitions (allowLocal=false)
>>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Final stage: Stage
>>>>> 0(reduce at
>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Parents of final stage:
>>>>> List()
>>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Missing parents: List()
>>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Submitting Stage 0
>>>>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
>>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Submitting 1000 missing
>>>>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>>>> 14/09/03 13:49:33 INFO cluster.YarnClientClusterScheduler: Adding task
>>>>> set 0.0 with 1000 tasks
>>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>>> TID 0 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>>> as 369811 bytes in 4 ms
>>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>>> TID 1 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>>> as 506276 bytes in 5 ms
>>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>>> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>>> as 501136 bytes in 5 ms
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>>> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>>> as 506276 bytes in 5 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 2 (task
>>>>> 0.0:2)
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
>>>>> org.apache.spark.api.python.PythonException
>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>>>> call last):
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>>> line 77, in main
>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 191, in dump_stream
>>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 123, in dump_stream
>>>>>     for obj in iterator:
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 180, in _batched
>>>>>     for item in iterator:
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 612, in func
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>> line 36, in f
>>>>> SystemError: unknown opcode
>>>>>
>>>>> at
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>>> at
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>> at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>>> TID 4 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>>> as 501136 bytes in 4 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 1 (task
>>>>> 0.0:1)
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
>>>>> org.apache.spark.api.python.PythonException
>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>>>> call last):
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>>> line 77, in main
>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 191, in dump_stream
>>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 123, in dump_stream
>>>>>     for obj in iterator:
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 180, in _batched
>>>>>     for item in iterator:
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 612, in func
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>> line 36, in f
>>>>> SystemError: unknown opcode
>>>>>
>>>>> at
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>>> at
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>> at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>>> TID 5 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>>> as 506276 bytes in 4 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 0 (task
>>>>> 0.0:0)
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
>>>>> org.apache.spark.api.python.PythonException
>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>>>> call last):
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>>> line 77, in main
>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 191, in dump_stream
>>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 123, in dump_stream
>>>>>     for obj in iterator:
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 180, in _batched
>>>>>     for item in iterator:
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 612, in func
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>> line 36, in f
>>>>> SystemError: unknown opcode
>>>>>
>>>>> at
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>>> at
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>> at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>>> TID 6 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>>> as 369811 bytes in 4 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 3 (task
>>>>> 0.0:3)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
>>>>> org.apache.spark.api.python.PythonException: Traceback (most recent call
>>>>> last):
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>>> line 77, in main
>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 191, in dump_stream
>>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 123, in dump_stream
>>>>>     for obj in iterator:
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 180, in _batched
>>>>>     for item in iterator:
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 612, in func
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>> line 36, in f
>>>>> SystemError: unknown opcode
>>>>>  [duplicate 1]
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>>> TID 7 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>>> as 506276 bytes in 4 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:2)
>>>>>  [same "SystemError: unknown opcode" traceback as above, duplicate 1;
>>>>>  the paths differ only in the per-executor filecache entry]
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>>> TID 8 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>>> as 501136 bytes in 3 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:1)
>>>>>  [same traceback, duplicate 1]
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>>> TID 9 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>>> as 506276 bytes in 4 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:0)
>>>>>  [same traceback, duplicate 2]
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>>> TID 10 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>>> as 369811 bytes in 3 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:3)
>>>>>  [same traceback, duplicate 2]
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>>> TID 11 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>>> as 506276 bytes in 4 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:2)
>>>>>  [same traceback, duplicate 2]
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as
>>>>> TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2
>>>>> as 501136 bytes in 3 ms
>>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
>>>>>  [same traceback, duplicate 3]
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:1 as
>>>>> TID 13 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:1
>>>>> as 506276 bytes in 4 ms
>>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:0)
>>>>>  [same traceback, duplicate 3]
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:0 as
>>>>> TID 14 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:0
>>>>> as 369811 bytes in 4 ms
>>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:3)
>>>>>  [same traceback, duplicate 3]
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:3 as
>>>>> TID 15 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:3
>>>>> as 506276 bytes in 3 ms
>>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 13 (task 0.0:1)
>>>>>  [same traceback, duplicate 4]
>>>>> 14/09/03 13:49:35 ERROR scheduler.TaskSetManager: Task 0.0:1 failed 4
>>>>> times; aborting job
>>>>> 14/09/03 13:49:35 INFO cluster.YarnClientClusterScheduler: Cancelling
>>>>> stage 0
>>>>> 14/09/03 13:49:35 INFO cluster.YarnClientClusterScheduler: Stage 0 was
>>>>> cancelled
>>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Loss was due to
>>>>> org.apache.spark.api.python.PythonException
>>>>>  [same "SystemError: unknown opcode" traceback, duplicate 4]
>>>>> 14/09/03 13:49:35 INFO scheduler.DAGScheduler: Failed to run reduce at
>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>>>> Traceback (most recent call last):
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>> line 38, in <module>
>>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 619, in reduce
>>>>>     vals = self.mapPartitions(func).collect()
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 583, in collect
>>>>>     bytesInJava = self._jrdd.collect().iterator()
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>>> line 537, in __call__
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>>> line 300, in get_return_value
>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>>> o24.collect.
>>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>>> Task 0.0:1 failed 4 times, most recent failure: Exception failure in TID 13
>>>>> on host HDOP-N2.AGT: org.apache.spark.api.python.PythonException: Traceback
>>>>> (most recent call last):
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>>>> line 77, in main
>>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 191, in dump_stream
>>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 123, in dump_stream
>>>>>     for obj in iterator:
>>>>>   File
>>>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>>>> line 180, in _batched
>>>>>     for item in iterator:
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>>>> line 612, in func
>>>>>   File
>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>> line 36, in f
>>>>> SystemError: unknown opcode
>>>>>
>>>>>
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>>>>
>>>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>>>>
>>>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>>>
>>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>>>>
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>>>>
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>         java.lang.Thread.run(Thread.java:744)
>>>>> Driver stacktrace:
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>> at
>>>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>> at scala.Option.foreach(Option.scala:236)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>> at
>>>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>> at
>>>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>> at
>>>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>
>>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Loss was due to
>>>>> org.apache.spark.TaskKilledException
>>>>> org.apache.spark.TaskKilledException
>>>>> at
>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>
>>>>>
>>>>> On Wed, Sep 3, 2014 at 1:40 PM, Sandy Ryza <sandy.ryza@cloudera.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi Oleg. To run on YARN, simply set master to "yarn".  The YARN
>>>>>> configuration, located in a yarn-site.xml, determines where to look for the
>>>>>> YARN ResourceManager.
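>>>>>>
>>>>>> For example, keeping the flags from your earlier command (just a
>>>>>> sketch: it assumes HADOOP_CONF_DIR points at the directory holding
>>>>>> your yarn-site.xml, /etc/hadoop/conf being the usual HDP location,
>>>>>> and uses yarn-client, which is how the client-mode YARN master is
>>>>>> spelled on this Spark release):
>>>>>>
>>>>>>     HADOOP_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit \
>>>>>>         --master yarn-client \
>>>>>>         --num-executors 3 --driver-memory 4g \
>>>>>>         --executor-memory 2g --executor-cores 1 \
>>>>>>         examples/src/main/python/pi.py 1000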
>>>>>>
>>>>>> PROCESS_LOCAL is orthogonal to the choice of cluster resource manager.
>>>>>> A task is considered PROCESS_LOCAL when the executor it's running in happens
>>>>>> to have the data it's processing cached.
>>>>>>
>>>>>> If you're looking to get familiar with the (admittedly confusing) web
>>>>>> of terminology, this blog post might be helpful:
>>>>>>
>>>>>> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 2, 2014 at 9:51 PM, Oleg Ruchovets <oruchovets@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>   I changed my command to:
>>>>>>>   ./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors
>>>>>>> 3 --driver-memory 4g --executor-memory 2g --executor-cores 1
>>>>>>> examples/src/main/python/pi.py 1000
>>>>>>> and it fixed the problem (the default 512m driver heap was too small
>>>>>>> for the large range that pi.py parallelizes; --driver-memory raises it).
>>>>>>>
>>>>>>> I still have a couple of questions:
>>>>>>>    PROCESS_LOCAL is not YARN execution, right? How should I configure
>>>>>>> running on YARN? Should I execute the start-all script on all machines
>>>>>>> or only on one? And where are the UI / logs of the Spark execution?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Index  ID   Status   Locality       Executor    Launch Time          Duration  GC Time / Ser Time
>>>>>>>   152  152  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:14  0.2 s
>>>>>>>     0    0  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
>>>>>>>     2    2  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
>>>>>>>     3    3  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms / 1 ms
>>>>>>>     4    4  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms / 2 ms
>>>>>>>     5    5  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms / 1 ms
>>>>>>>     6    6  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     1 ms
>>>>>>>     7    7  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s
>>>>>>>     8    8  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
>>>>>>>     9    9  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.4 s
>>>>>>>    10   10  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s     1 ms
>>>>>>>    11   11  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets
>>>>>>> <oruchovets@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Andrew.
>>>>>>>>    What should I do to set the master to YARN? Can you please point
>>>>>>>> me to the command or documentation for how to do it?
>>>>>>>>
>>>>>>>>
>>>>>>>> I am doing the following:
>>>>>>>>    I executed start-all.sh:
>>>>>>>>    [root@HDOP-B sbin]# ./start-all.sh
>>>>>>>> starting org.apache.spark.deploy.master.Master, logging to
>>>>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
>>>>>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list
>>>>>>>> of known hosts.
>>>>>>>> localhost: starting org.apache.spark.deploy.worker.Worker, logging
>>>>>>>> to
>>>>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>>>>>>>>
>>>>>>>>
>>>>>>>> After that I executed the command:
>>>>>>>>     ./bin/spark-submit --master spark://HDOP-B.AGT:7077
>>>>>>>> examples/src/main/python/pi.py 1000
>>>>>>>>
>>>>>>>>
>>>>>>>> The result was the following:
>>>>>>>>
>>>>>>>>    /usr/jdk64/jdk1.7.0_45/bin/java
>>>>>>>>
>>>>>>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>>>>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>>>>>>> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j
>>>>>>>> profile: org/apache/spark/log4j-defaults.properties
>>>>>>>> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
>>>>>>>> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager:
>>>>>>>> authentication disabled; ui acls disabled; users with view permissions:
>>>>>>>> Set(root)
>>>>>>>> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
>>>>>>>> 14/09/03 12:10:07 INFO Remoting: Starting remoting
>>>>>>>> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on
>>>>>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:38944]
>>>>>>>> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses:
>>>>>>>> [akka.tcp://spark@HDOP-B.AGT:38944]
>>>>>>>> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
>>>>>>>> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
>>>>>>>> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at
>>>>>>>> /tmp/spark-local-20140903121008-cf09
>>>>>>>> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with
>>>>>>>> capacity 294.9 MB.
>>>>>>>> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041
>>>>>>>> with id = ConnectionManagerId(HDOP-B.AGT,45041)
>>>>>>>> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register
>>>>>>>> BlockManager
>>>>>>>> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager
>>>>>>>> HDOP-B.AGT:45041 with 294.9 MB RAM
>>>>>>>> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>>>>>>>> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at
>>>>>>>> http://10.193.1.76:59336
>>>>>>>> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is
>>>>>>>> /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
>>>>>>>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>>>>>>>> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at
>>>>>>>> http://HDOP-B.AGT:4040
>>>>>>>> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load
>>>>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>>>>> applicable
>>>>>>>> 14/09/03 12:10:09 INFO Utils: Copying
>>>>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>>>>> to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
>>>>>>>> 14/09/03 12:10:09 INFO SparkContext: Added file
>>>>>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>>>>> at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
>>>>>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master
>>>>>>>> spark://HDOP-B.AGT:7077...
>>>>>>>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to
>>>>>>>> Spark cluster with app ID app-20140903121009-0000
>>>>>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added:
>>>>>>>> app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161
>>>>>>>> (HDOP-B.AGT:51161) with 8 cores
>>>>>>>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor
>>>>>>>> ID app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores,
>>>>>>>> 512.0 MB RAM
>>>>>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated:
>>>>>>>> app-20140903121009-0000/0 is now RUNNING
>>>>>>>> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered
>>>>>>>> executor:
>>>>>>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:38143/user/Executor#1295757828]
>>>>>>>> with ID 0
>>>>>>>> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager
>>>>>>>> HDOP-B.AGT:38670 with 294.9 MB RAM
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File
>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>>>>> line 38, in <module>
>>>>>>>>     count = sc.parallelize(xrange(1, n+1),
>>>>>>>> slices).map(f).reduce(add)
>>>>>>>>   File
>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>>>>>>> line 271, in parallelize
>>>>>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>>>>>>   File
>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>>>>>> line 537, in __call__
>>>>>>>>   File
>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>>>>>> line 300, in get_return_value
>>>>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>>>>>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>>>>>>> : java.lang.OutOfMemoryError: Java heap space
>>>>>>>> at
>>>>>>>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>>>>>>> at
>>>>>>>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>> at
>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>> at
>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>>>>>> at
>>>>>>>> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>>>>>> at py4j.Gateway.invoke(Gateway.java:259)
>>>>>>>> at
>>>>>>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>>>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> What should I do to fix this issue?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Oleg.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <andrew@databricks.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi Oleg,
>>>>>>>>>
>>>>>>>>> If you are running Spark on a yarn cluster, you should set --master
>>>>>>>>> to yarn. By default this runs in client mode, which redirects all output of
>>>>>>>>> your application to your console. This is failing because it is trying to
>>>>>>>>> connect to a standalone master that you probably did not start. I am
>>>>>>>>> somewhat puzzled as to how you ran into an OOM from this configuration,
>>>>>>>>> however. Does this problem still occur if you set the correct master?
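>>>>>>>>>
>>>>>>>>> (If you do want a standalone master instead: its URL and port are
>>>>>>>>> printed in the master's log when it starts, and shown at the top of
>>>>>>>>> the master web UI, which listens on port 8080 by default; it looks
>>>>>>>>> like spark://<master-host>:7077.)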
>>>>>>>>>
>>>>>>>>> -Andrew
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>    I've installed PySpark on an HDP (Hortonworks) cluster.
>>>>>>>>>>   I'm executing the pi example:
>>>>>>>>>>
>>>>>>>>>> command:
>>>>>>>>>>        spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>>>>>>>>> ./bin/spark-submit --master spark://10.193.1.71:7077
>>>>>>>>>> examples/src/main/python/pi.py   1000
>>>>>>>>>>
>>>>>>>>>> exception:
>>>>>>>>>>
>>>>>>>>>>     14/09/02 17:34:02 INFO SecurityManager: Using Spark's default
>>>>>>>>>> log4j profile: org/apache/spark/log4j-defaults.properties
>>>>>>>>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to:
>>>>>>>>>> root
>>>>>>>>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager:
>>>>>>>>>> authentication disabled; ui acls disabled; users with view permissions:
>>>>>>>>>> Set(root)
>>>>>>>>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>>>>>>>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>>>>>>>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening on
>>>>>>>>>> addresses :[akka.tcp://spark@HDOP-M.AGT:41059]
>>>>>>>>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on
>>>>>>>>>> addresses: [akka.tcp://spark@HDOP-M.AGT:41059]
>>>>>>>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>>>>>>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering BlockManagerMaster
>>>>>>>>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local directory
>>>>>>>>>> at /tmp/spark-local-20140902173403-cda8
>>>>>>>>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with
>>>>>>>>>> capacity 294.9 MB.
>>>>>>>>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port
>>>>>>>>>> 34931 with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>>>>>>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register
>>>>>>>>>> BlockManager
>>>>>>>>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block manager
>>>>>>>>>> HDOP-M.AGT:34931 with 294.9 MB RAM
>>>>>>>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered BlockManager
>>>>>>>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>>>>>>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server started at
>>>>>>>>>> http://10.193.1.71:54341
>>>>>>>>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server directory
>>>>>>>>>> is /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>>>>>>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>>>>>>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at
>>>>>>>>>> http://HDOP-M.AGT:4040
>>>>>>>>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load
>>>>>>>>>> native-hadoop library for your platform... using builtin-java classes where
>>>>>>>>>> applicable
>>>>>>>>>> 14/09/02 17:34:04 INFO Utils: Copying
>>>>>>>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>>>>>>> to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>>>>>>>>> 14/09/02 17:34:04 INFO SparkContext: Added file
>>>>>>>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>>>>>>>> at http://10.193.1.71:52938/files/pi.py with timestamp 1409650444941
>>>>>>>>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to master
>>>>>>>>>> spark://10.193.1.71:7077...
>>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to master
>>>>>>>>>> spark://10.193.1.71:7077...
>>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>>>>>>>> akka.remote.EndpointAssociationException: Association failed with
>>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>   File
>>>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>>>>>>>> line 38, in <module>
>>>>>>>>>>     count = sc.parallelize(xrange(1, n+1),
>>>>>>>>>> slices).map(f).reduce(add)
>>>>>>>>>>   File
>>>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>>>>>>>>> line 271, in parallelize
>>>>>>>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>>>>>>>>   File
>>>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>>>>>>>> line 537, in __call__
>>>>>>>>>>   File
>>>>>>>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>>>>>>>> line 300, in get_return_value
>>>>>>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>>>>>>>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>>>>>>>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>>>>> at
>>>>>>>>>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>>>>>>>>> at
>>>>>>>>>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>> at
>>>>>>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>>>>>>> at
>>>>>>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>>>>>>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>>>>>>>> at
>>>>>>>>>> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>>>>>>>> at py4j.Gateway.invoke(Gateway.java:259)
>>>>>>>>>> at
>>>>>>>>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>>>>>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>>>>>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>>>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Question:
>>>>>>>>>>     How can I find the Spark master host and port? Where are they defined?
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Oleg.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

