spark-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: pyspark yarn got exception
Date Fri, 05 Sep 2014 04:50:37 GMT
export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark

Does the path /anaconda/bin/pyspark mean that pyspark should be copied into
the Anaconda distribution?
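[Editor's note: for reference, a minimal conf/spark-env.sh along the lines of the advice quoted below might look like this. It assumes /anaconda/bin/python is installed at the same path on every node; note that PYSPARK_PYTHON conventionally points at a Python interpreter binary, not the pyspark launcher script.]

```shell
# conf/spark-env.sh -- a minimal sketch, assuming /anaconda/bin/python
# exists at the same path on every node of the cluster.

# Python used by the driver (bin/pyspark, spark-submit):
export PYSPARK_PYTHON=/anaconda/bin/python

# The same setting, propagated to the YARN executors:
export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/python
```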





On Fri, Sep 5, 2014 at 2:21 AM, Andrew Or <andrew@databricks.com> wrote:

> You need to set both; one is for the driver (first line) and the other is
> for the executors (second line):
>
> export PYSPARK_PYTHON=/anaconda/bin/python
> export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark
>
> These should both go into conf/spark-env.sh.
>
>
> 2014-09-04 11:19 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>
>> Where should I make the change? Should I use PYSPARK_PYTHON to choose
>> which version of Python is used by pyspark, such as:
>>
>> PYSPARK_PYTHON=/anaconda/bin/python  bin/pyspark
>>
>> And should I also set:
>>
>> export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark
>>
>> Or only one of them?
>>
>> Thanks
>> Oleg.
>>
>>
>> On Fri, Sep 5, 2014 at 12:52 AM, Davies Liu <davies@databricks.com>
>> wrote:
>>
>>> You can use PYSPARK_PYTHON to choose which version of python will be
>>> used in pyspark, such as:
>>>
>>> PYSPARK_PYTHON=/anaconda/bin/python  bin/pyspark
>>>
>>> On Thu, Sep 4, 2014 at 1:30 AM, Oleg Ruchovets <oruchovets@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > I found the reason for the problem.
>>> > HDP (Hortonworks) uses Python 2.6.6 for the Ambari installation and the
>>> > rest of the stack.
>>> > I can run PySpark and it works fine, but I need to use the Anaconda
>>> > distribution (for Spark). When I installed Anaconda (Python 2.7.7), I
>>> > got the problem.
>>> >
>>> > Question: how can this be resolved? Is there a way to have two Python
>>> > versions installed on one machine?
>>> >
>>> >
>>> > Thanks
>>> > Oleg.
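[Editor's note: the "SystemError: unknown opcode" further down in this thread is the classic symptom of exactly this mismatch. PySpark serializes the driver's compiled function bytecode (via a cloudpickle-style mechanism) and rebuilds it on each worker; bytecode from one CPython version is not guaranteed to run on another. A small sketch of the mechanism, using marshal as a stand-in for the real serializer: the same-version round-trip works, while a 2.6 worker executing 2.7 bytecode is what fails.]

```python
import marshal
import types

def square(x):
    return x * x

# Serialize the function's code object, roughly as PySpark's pickler does
# on the driver side.
blob = marshal.dumps(square.__code__)

# Rebuilding under the SAME interpreter version round-trips fine:
rebuilt = types.FunctionType(marshal.loads(blob), globals())
assert rebuilt(7) == 49

# A worker running a different CPython version (e.g. 2.6 executing bytecode
# compiled by 2.7) can encounter opcodes its VM does not know, which raises
# "SystemError: unknown opcode" -- the error seen in the tracebacks below.
```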
>>> >
>>> >
>>> > On Thu, Sep 4, 2014 at 1:15 PM, Oleg Ruchovets <oruchovets@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi Andrew.
>>> >>
>>> >> The problem still occurs.
>>> >>
>>> >> All machines are using Python 2.7:
>>> >>
>>> >> [root@HDOP-N2 conf]# python --version
>>> >> Python 2.7.7 :: Anaconda 2.0.1 (64-bit)
>>> >>
>>> >> Executing command from bin/pyspark:
>>> >>            [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563
>>> ]#
>>> >> bin/pyspark    --driver-memory 4g --executor-memory 2g
>>> --executor-cores 1
>>> >> examples/src/main/python/pi.py   1000
>>> >>
>>> >>
>>> >> Python 2.7.7 |Anaconda 2.0.1 (64-bit)| (default, Jun  2 2014,
>>> 12:34:02)
>>> >> [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
>>> >> Type "help", "copyright", "credits" or "license" for more information.
>>> >> Anaconda is brought to you by Continuum Analytics.
>>> >> Please check out: http://continuum.io/thanks and https://binstar.org
>>> >> Traceback (most recent call last):
>>> >>   File
>>> >> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563
>>> /python/pyspark/shell.py",
>>> >> line 43, in <module>
>>> >>     sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>> >> line 94, in __init__
>>> >>     SparkContext._ensure_initialized(self, gateway=gateway)
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>> >> line 190, in _ensure_initialized
>>> >>     SparkContext._gateway = gateway or launch_gateway()
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/java_gateway.py",
>>> >> line 51, in launch_gateway
>>> >>     gateway_port = int(proc.stdout.readline())
>>> >> ValueError: invalid literal for int() with base 10:
>>> >> '/usr/jdk64/jdk1.7.0_45/bin/java\n'
>>> >> >>>
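[Editor's note: the ValueError above comes from the line shown in the traceback itself: java_gateway.py reads the first line of the launcher subprocess's stdout and expects it to be the Py4J gateway port. A minimal sketch of that parsing step shows why any stray output on stdout (here, a java path, presumably echoed by one of the startup scripts) breaks it; read_gateway_port is a hypothetical name for illustration.]

```python
def read_gateway_port(first_line):
    """Mimics gateway_port = int(proc.stdout.readline()) from the traceback:
    the first stdout line from the launcher must be a bare port number."""
    return int(first_line)  # raises ValueError on any non-numeric output

# A clean launcher prints only the port, and parsing succeeds:
assert read_gateway_port("51234\n") == 51234

# If a startup script echoes anything first, parsing fails exactly as above:
try:
    read_gateway_port("/usr/jdk64/jdk1.7.0_45/bin/java\n")
except ValueError as e:
    assert "invalid literal for int()" in str(e)
```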
>>> >>
>>> >>
>>> >>
>>> >> This log is from Yarn Spark execution:
>>> >>
>>> >>
>>> >> SLF4J: Class path contains multiple SLF4J bindings.
>>> >> SLF4J: Found binding in
>>> >>
>>> [jar:file:/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> >> SLF4J: Found binding in
>>> >>
>>> [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>> >> explanation.
>>> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> >> 14/09/04 12:53:19 INFO SecurityManager: Changing view acls to:
>>> yarn,root
>>> >> 14/09/04 12:53:19 INFO SecurityManager: SecurityManager:
>>> authentication
>>> >> disabled; ui acls disabled; users with view permissions: Set(yarn,
>>> root)
>>> >> 14/09/04 12:53:20 INFO Slf4jLogger: Slf4jLogger started
>>> >> 14/09/04 12:53:20 INFO Remoting: Starting remoting
>>> >> 14/09/04 12:53:20 INFO Remoting: Remoting started; listening on
>>> addresses
>>> >> :[akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>>> >> 14/09/04 12:53:20 INFO Remoting: Remoting now listens on addresses:
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>>> >> 14/09/04 12:53:20 INFO RMProxy: Connecting to ResourceManager at
>>> >> HDOP-N1.AGT/10.193.1.72:8030
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: ApplicationAttemptId:
>>> >> appattempt_1409805761292_0005_000001
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Registering the
>>> ApplicationMaster
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Waiting for Spark driver to
>>> be
>>> >> reachable.
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Driver now available:
>>> >> HDOP-B.AGT:45747
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Listen to driver:
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Allocating 3 executors.
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Will Allocate 3 executor
>>> >> containers, each with 2432 memory
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>>> >> Any, priority: 1, capability: <memory:2432, vCores:1>
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>>> >> Any, priority: 1, capability: <memory:2432, vCores:1>
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>>> >> Any, priority: 1, capability: <memory:2432, vCores:1>
>>> >> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for :
>>> >> HDOP-M.AGT:45454
>>> >> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for :
>>> >> HDOP-N1.AGT:45454
>>> >> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-N1.AGT to
>>> /default-rack
>>> >> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-M.AGT to
>>> /default-rack
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container
>>> >> container_1409805761292_0005_01_000003 for on host HDOP-N1.AGT
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching
>>> ExecutorRunnable.
>>> >> driverUrl: akka.tcp://spark@HDOP-B.AGT
>>> :45747/user/CoarseGrainedScheduler,
>>> >> executorHostname: HDOP-N1.AGT
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container
>>> >> container_1409805761292_0005_01_000002 for on host HDOP-M.AGT
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching
>>> ExecutorRunnable.
>>> >> driverUrl: akka.tcp://spark@HDOP-B.AGT
>>> :45747/user/CoarseGrainedScheduler,
>>> >> executorHostname: HDOP-M.AGT
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>>> >> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy:
>>> >> yarn.client.max-nodemanagers-proxies : 500
>>> >> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy:
>>> >> yarn.client.max-nodemanagers-proxies : 500
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up
>>> ContainerLaunchContext
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up
>>> ContainerLaunchContext
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources
>>> >> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020
>>> file:
>>> >> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" }
>>> size: 1317
>>> >> timestamp: 1409806397200 type: FILE visibility: PRIVATE,
>>> __spark__.jar ->
>>> >> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>>> >>
>>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>>> >> } size: 121759562 timestamp: 1409806397057 type: FILE visibility:
>>> PRIVATE)
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources
>>> >> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020
>>> file:
>>> >> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" }
>>> size: 1317
>>> >> timestamp: 1409806397200 type: FILE visibility: PRIVATE,
>>> __spark__.jar ->
>>> >> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>>> >>
>>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>>> >> } size: 121759562 timestamp: 1409806397057 type: FILE visibility:
>>> PRIVATE)
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>>> >> commands: List($JAVA_HOME/bin/java, -server,
>>> -XX:OnOutOfMemoryError='kill
>>> >> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 1,
>>> >> HDOP-N1.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>>> >> commands: List($JAVA_HOME/bin/java, -server,
>>> -XX:OnOutOfMemoryError='kill
>>> >> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 2,
>>> >> HDOP-M.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening
>>> proxy :
>>> >> HDOP-N1.AGT:45454
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening
>>> proxy :
>>> >> HDOP-M.AGT:45454
>>> >> 14/09/04 12:53:22 INFO AMRMClientImpl: Received new token for :
>>> >> HDOP-N4.AGT:45454
>>> >> 14/09/04 12:53:22 INFO RackResolver: Resolved HDOP-N4.AGT to
>>> /default-rack
>>> >> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching container
>>> >> container_1409805761292_0005_01_000004 for on host HDOP-N4.AGT
>>> >> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching
>>> ExecutorRunnable.
>>> >> driverUrl: akka.tcp://spark@HDOP-B.AGT
>>> :45747/user/CoarseGrainedScheduler,
>>> >> executorHostname: HDOP-N4.AGT
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Starting Executor Container
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy:
>>> >> yarn.client.max-nodemanagers-proxies : 500
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up
>>> ContainerLaunchContext
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Preparing Local resources
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Prepared Local resources
>>> >> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020
>>> file:
>>> >> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" }
>>> size: 1317
>>> >> timestamp: 1409806397200 type: FILE visibility: PRIVATE,
>>> __spark__.jar ->
>>> >> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>>> >>
>>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>>> >> } size: 121759562 timestamp: 1409806397057 type: FILE visibility:
>>> PRIVATE)
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>>> >> commands: List($JAVA_HOME/bin/java, -server,
>>> -XX:OnOutOfMemoryError='kill
>>> >> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 3,
>>> >> HDOP-N4.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening
>>> proxy :
>>> >> HDOP-N4.AGT:45454
>>> >> 14/09/04 12:53:22 INFO ExecutorLauncher: All executors have launched.
>>> >> 14/09/04 12:53:22 INFO ExecutorLauncher: Started progress reporter
>>> thread
>>> >> - sleep time : 5000
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:54:02 INFO ExecutorLauncher: finish ApplicationMaster with
>>> >> SUCCEEDED
>>> >> 14/09/04 12:54:02 INFO AMRMClientImpl: Waiting for application to be
>>> >> successfully unregistered.
>>> >> 14/09/04 12:54:02 INFO ExecutorLauncher: Exited
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> The exception still occurs:
>>> >>
>>> >>
>>> >>
>>> >>   [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>> >> ./bin/spark-submit --master yarn  --num-executors 3  --driver-memory
>>> 4g
>>> >> --executor-memory 2g --executor-cores 1
>>>  examples/src/main/python/pi.py
>>> >> 1000
>>> >> /usr/jdk64/jdk1.7.0_45/bin/java
>>> >>
>>> >>
>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>> >> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>> >> 14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to:
>>> root
>>> >> 14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager:
>>> >> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >> Set(root)
>>> >> 14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> >> 14/09/04 12:53:12 INFO Remoting: Starting remoting
>>> >> 14/09/04 12:53:12 INFO Remoting: Remoting started; listening on
>>> addresses
>>> >> :[akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses:
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
>>> >> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> >> 14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local
>>> directory
>>> >> at /tmp/spark-local-20140904125312-c7ea
>>> >> 14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with
>>> >> capacity 2.3 GB.
>>> >> 14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port
>>> >> 37363 with id = ConnectionManagerId(HDOP-B.AGT,37363)
>>> >> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register
>>> >> BlockManager
>>> >> 14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-B.AGT:37363 with 2.3 GB RAM
>>> >> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered
>>> BlockManager
>>> >> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>>> >> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
>>> >> SocketConnector@0.0.0.0:33547
>>> >> 14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server
>>> started
>>> >> at http://10.193.1.76:33547
>>> >> 14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server
>>> directory is
>>> >> /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
>>> >> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>>> >> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
>>> >> SocketConnector@0.0.0.0:54594
>>> >> 14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >> 14/09/04 12:53:13 INFO server.AbstractConnector: Started
>>> >> SelectChannelConnector@0.0.0.0:4040
>>> >> 14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at
>>> >> http://HDOP-B.AGT:4040
>>> >> 14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop
>>> >> library for your platform... using builtin-java classes where
>>> applicable
>>> >> --args is deprecated. Use --arg instead.
>>> >> 14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager
>>> at
>>> >> HDOP-N1.AGT/10.193.1.72:8050
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from
>>> >> ApplicationsManager (ASM), number of NodeManagers: 6
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default,
>>> >> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>> >>       queueApplicationCount = 0, queueChildQueueCount = 0
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single
>>> >> resource in this cluster 13824
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
>>> >> 14/09/04 12:53:15 INFO yarn.Client: Uploading
>>> >>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >> to
>>> >>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Uploading
>>> >>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >> to
>>> >>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Setting up container launch
>>> context
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark
>>> >> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>> >> -Djava.io.tmpdir=$PWD/tmp,
>>> >>
>>> -Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\",
>>> >> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>> >>
>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>> >> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>> >> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>> >> -Dspark.fileserver.uri=\"http://10.193.1.76:54594\",
>>> >> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\",
>>> >> -Dspark.executor.cores=\"1\",
>>> >> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\",
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>>> --jar ,
>>> >> null,  --args  'HDOP-B.AGT:45747' , --executor-memory, 2048,
>>> >> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>> >> <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
>>> >> 14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application
>>> >> application_1409805761292_0005
>>> >> 14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:18 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:19 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:20 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: 0
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: RUNNING
>>> >>
>>> >> 14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler:
>>> >> YarnClientClusterScheduler.postStartHook done
>>> >> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
>>> >> executor:
>>> >> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:40024/user/Executor#
>>> 2065794895]
>>> >> with ID 1
>>> >> 14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-N1.AGT:34857 with 1178.1 MB RAM
>>> >> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
>>> >> executor:
>>> >> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT
>>> :49234/user/Executor#820272849]
>>> >> with ID 3
>>> >> 14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered
>>> >> executor:
>>> >> Actor[akka.tcp://sparkExecutor@HDOP-M.AGT
>>> :38124/user/Executor#715249825]
>>> >> with ID 2
>>> >> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-N4.AGT:43365 with 1178.1 MB RAM
>>> >> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-M.AGT:45711 with 1178.1 MB RAM
>>> >> 14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >> with 1000 output partitions (allowLocal=false)
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage
>>> 0(reduce
>>> >> at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage:
>>> >> List()
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0
>>> >> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
>>> parents
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing
>>> >> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>> >> 14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding
>>> task set
>>> >> 0.0 with 1000 tasks
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0
>>> as
>>> >> TID 0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0 as
>>> >> 369810 bytes in 5 ms
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1
>>> as
>>> >> TID 1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1 as
>>> >> 506275 bytes in 2 ms
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2
>>> as
>>> >> TID 2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2 as
>>> >> 501135 bytes in 2 ms
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
>>> as
>>> >> TID 3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3 as
>>> >> 506275 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task
>>> 0.0:1)
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>
>>> >> at
>>> >>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >> at
>>> >>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >> at
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >> at
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >> at java.lang.Thread.run(Thread.java:744)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
>>> as
>>> >> TID 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1 as
>>> >> 506275 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 2 (task
>>> 0.0:2)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 1]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
>>> as
>>> >> TID 5 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2 as
>>> >> 501135 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 3 (task
>>> 0.0:3)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 2]
>>> >> [14/09/04 12:53:56-12:53:57: TaskSetManager keeps retrying tasks 0.0:0
>>> >> through 0.0:3 as TIDs 6 through 15 on executors 1-3 (HDOP-N1.AGT,
>>> >> HDOP-M.AGT, HDOP-N4.AGT); every attempt fails with the identical
>>> >> "SystemError: unknown opcode" traceback, logged as duplicates 3-12.
>>> >> Identical repeats elided.]
>>> >> 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
>>> >> times; aborting job
>>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: SystemError: unknown opcode
>>> >>  [duplicate 13; identical traceback elided]
>>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
>>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
>>> >> 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
>>> >> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >> Traceback (most recent call last):
>>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>> >>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 619, in reduce
>>> >>     vals = self.mapPartitions(func).collect()
>>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 583, in collect
>>> >>     bytesInJava = self._jrdd.collect().iterator()
>>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>> >> py4j.protocol.Py4JJavaError
>>> >> [the exception print is interleaved here with one more 14/09/04 12:53:57
>>> >> INFO scheduler.TaskSetManager "Loss was due to ... SystemError: unknown
>>> >> opcode" entry, duplicate 14; identical traceback elided]
>>> >> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.TaskKilledException
>>> >> org.apache.spark.TaskKilledException
>>> >>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>         at java.lang.Thread.run(Thread.java:744)
>>> >> : An error occurred while calling o24.collect.
>>> >> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> >> Task 0.0:2 failed 4 times, most recent failure: Exception failure in
>>> >> TID 12 on host HDOP-M.AGT: org.apache.spark.api.python.PythonException:
>>> >> Traceback (most recent call last):
>>> >>   [same worker traceback as above, ending in "SystemError: unknown
>>> >> opcode"; elided]
>>> >>
>>> >>         org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>         org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>         java.lang.Thread.run(Thread.java:744)
>>> >> Driver stacktrace:
>>> >> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>> >> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> >> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> >> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >> at scala.Option.foreach(Option.scala:236)
>>> >> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>> >> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>> >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> >> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> >> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> >> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> >> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> >> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >>
>>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed
>>> TaskSet
>>> >> 0.0, whose tasks have all completed, from pool
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> What else can be done to fix this problem?
>>> >>
>>> >>
>>> >> Thanks
>>> >>
>>> >> Oleg.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
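[Editor's note on the repeated failure above: "SystemError: unknown opcode" is what CPython raises when it executes bytecode produced by a different interpreter version, so mixing the system Python 2.6 with Anaconda's 2.7.7 across driver and executors can trigger it. Each CPython version tags its compiled bytecode (.pyc files) with a distinct magic number; in modern Python 3 that tag is exposed as `importlib.util.MAGIC_NUMBER` (Python 2 used `imp.get_magic()`). A minimal illustration of the mechanism:]

```python
import importlib.util

# Each CPython release stamps .pyc files with its own 4-byte magic number;
# bytecode compiled under one interpreter version is not valid for another,
# which is what surfaces on the executors as "SystemError: unknown opcode".
print(importlib.util.MAGIC_NUMBER)
```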
>>> >> On Thu, Sep 4, 2014 at 5:36 AM, Andrew Or <andrew@databricks.com>
>>> wrote:
>>> >>>
>>> >>> Hi Oleg,
>>> >>>
>>> >>> Your configuration looks alright to me. I haven't seen an "unknown
>>> >>> opcode" SystemError in PySpark before. It usually means there are
>>> >>> corrupted .pyc files lying around (ones compiled by an older Python
>>> >>> version, perhaps). Which Python version are you using, and are all your
>>> >>> nodes running the same version? What happens if you just run bin/pyspark
>>> >>> with the same command line arguments and then do
>>> >>> "sc.parallelize(range(10)).count()": does it still fail?
>>> >>>
>>> >>> Andrew
>>> >>>
>>> >>>
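[Editor's note: building on Andrew's suggestion, one way to confirm every node runs the same interpreter is to collect the Python version from each partition. A sketch, not the thread's own code: it assumes a live `pyspark.SparkContext` when `sc` is passed, and falls back to the local interpreter otherwise so the helper can be exercised without a cluster.]

```python
import platform

def python_versions(sc=None, partitions=8):
    """Return the set of Python versions seen across workers.

    With a live pyspark.SparkContext this runs one lambda per partition
    on the executors; with sc=None it reports only the local interpreter,
    so the logic can be tested without a cluster.
    """
    if sc is None:
        return {platform.python_version()}
    return set(
        sc.parallelize(range(partitions), partitions)
          .map(lambda _: platform.python_version())
          .collect()
    )

# A healthy cluster reports exactly one version.
versions = python_versions()
print(versions)
```

If the returned set contains more than one version string, the executors and the driver are running mismatched interpreters, which is exactly the condition that produces the "unknown opcode" failures above.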
>>> >>> 2014-09-02 23:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>> >>>>
>>> >>>> Hi, I changed the master to yarn but execution failed with an
>>> >>>> exception again. I am using PySpark.
>>> >>>>
>>> >>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>> >>>> ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g \
>>> >>>>     --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000
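[Editor's note: the fix this thread converges on is pinning the same interpreter for the driver and the YARN executors in conf/spark-env.sh before submitting. A sketch, assuming Anaconda is installed at /anaconda on every node; note that the executor-side value should also point at the python binary, not at a pyspark script.]

```shell
# Both settings belong in conf/spark-env.sh (or the submitting shell).
# /anaconda is an assumed install prefix; adjust per cluster, and make
# sure the same path exists on every NodeManager host.
export PYSPARK_PYTHON=/anaconda/bin/python                        # interpreter for the driver
export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/anaconda/bin/python"  # forwarded to YARN executors
echo "$PYSPARK_PYTHON"
echo "$SPARK_YARN_USER_ENV"
```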
>>> >>>> /usr/jdk64/jdk1.7.0_45/bin/java
>>> >>>>
>>> >>>>
>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>> >>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>> >>>> 14/09/03 14:35:11 INFO spark.SecurityManager: Changing view acls to:
>>> >>>> root
>>> >>>> 14/09/03 14:35:11 INFO spark.SecurityManager: SecurityManager:
>>> >>>> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >>>> Set(root)
>>> >>>> 14/09/03 14:35:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> >>>> 14/09/03 14:35:11 INFO Remoting: Starting remoting
>>> >>>> 14/09/03 14:35:12 INFO Remoting: Remoting started; listening on
>>> >>>> addresses :[akka.tcp://spark@HDOP-B.AGT:51707]
>>> >>>> 14/09/03 14:35:12 INFO Remoting: Remoting now listens on addresses:
>>> >>>> [akka.tcp://spark@HDOP-B.AGT:51707]
>>> >>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering MapOutputTracker
>>> >>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering
>>> BlockManagerMaster
>>> >>>> 14/09/03 14:35:12 INFO storage.DiskBlockManager: Created local
>>> directory
>>> >>>> at /tmp/spark-local-20140903143512-5aab
>>> >>>> 14/09/03 14:35:12 INFO storage.MemoryStore: MemoryStore started with
>>> >>>> capacity 2.3 GB.
>>> >>>> 14/09/03 14:35:12 INFO network.ConnectionManager: Bound socket to
>>> port
>>> >>>> 53216 with id = ConnectionManagerId(HDOP-B.AGT,53216)
>>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Trying to
>>> register
>>> >>>> BlockManager
>>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-B.AGT:53216 with 2.3 GB RAM
>>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Registered
>>> >>>> BlockManager
>>> >>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>>> >>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>>> >>>> SocketConnector@0.0.0.0:50624
>>> >>>> 14/09/03 14:35:12 INFO broadcast.HttpBroadcast: Broadcast server
>>> started
>>> >>>> at http://10.193.1.76:50624
>>> >>>> 14/09/03 14:35:12 INFO spark.HttpFileServer: HTTP File server
>>> directory
>>> >>>> is /tmp/spark-fd7fdcb2-f45d-430f-95fa-afbc4f329b43
>>> >>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>>> >>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>>> >>>> SocketConnector@0.0.0.0:41773
>>> >>>> 14/09/03 14:35:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>> 14/09/03 14:35:13 INFO server.AbstractConnector: Started
>>> >>>> SelectChannelConnector@0.0.0.0:4040
>>> >>>> 14/09/03 14:35:13 INFO ui.SparkUI: Started SparkUI at
>>> >>>> http://HDOP-B.AGT:4040
>>> >>>> 14/09/03 14:35:13 WARN util.NativeCodeLoader: Unable to load
>>> >>>> native-hadoop library for your platform... using builtin-java
>>> classes where
>>> >>>> applicable
>>> >>>> --args is deprecated. Use --arg instead.
>>> >>>> 14/09/03 14:35:14 INFO client.RMProxy: Connecting to
>>> ResourceManager at
>>> >>>> HDOP-N1.AGT/10.193.1.72:8050
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Got Cluster metric info from
>>> >>>> ApplicationsManager (ASM), number of NodeManagers: 6
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Queue info ... queueName:
>>> default,
>>> >>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>> >>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Max mem capabililty of a single
>>> >>>> resource in this cluster 13824
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Preparing Local resources
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Uploading file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Uploading file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/pi.py
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up the launch
>>> environment
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up container launch
>>> context
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Command for starting the Spark
>>> >>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>> >>>> -Djava.io.tmpdir=$PWD/tmp,
>>> >>>>
>>> -Dspark.tachyonStore.folderName=\"spark-98b7d323-2faf-419a-a88d-1a0c549dc5d4\",
>>> >>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>> >>>>
>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>> >>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>> >>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>> >>>> -Dspark.fileserver.uri=\"http://10.193.1.76:41773\",
>>> >>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"51707\",
>>> >>>> -Dspark.executor.cores=\"1\",
>>> >>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:50624\",
>>> >>>> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>>> --jar ,
>>> >>>> null,  --args  'HDOP-B.AGT:51707' , --executor-memory, 2048,
>>> >>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>> >>>> <LOG_DIR>/stderr)
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Submitting application to ASM
>>> >>>> 14/09/03 14:35:16 INFO impl.YarnClientImpl: Submitted application
>>> >>>> application_1409559972905_0036
>>> >>>> 14/09/03 14:35:16 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:17 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:18 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:19 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:20 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:21 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:22 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: 0
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: RUNNING
>>> >>>>
>>> >>>> 14/09/03 14:35:24 INFO cluster.YarnClientClusterScheduler:
>>> >>>> YarnClientClusterScheduler.postStartHook done
>>> >>>> 14/09/03 14:35:25 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>> executor:
>>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT
>>> :58976/user/Executor#-1831707618]
>>> >>>> with ID 1
>>> >>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-B.AGT:44142 with 1178.1 MB RAM
>>> >>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>> executor:
>>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT
>>> :45140/user/Executor#875812337]
>>> >>>> with ID 2
>>> >>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-N1.AGT:48513 with 1178.1 MB RAM
>>> >>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>> executor:
>>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-N3.AGT
>>> :45380/user/Executor#1559437246]
>>> >>>> with ID 3
>>> >>>> 14/09/03 14:35:27 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-N3.AGT:46616 with 1178.1 MB RAM
>>> >>>> 14/09/03 14:35:56 INFO spark.SparkContext: Starting job: reduce at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >>>> with 1000 output partitions (allowLocal=false)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Final stage: Stage
>>> >>>> 0(reduce at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Parents of final
>>> stage:
>>> >>>> List()
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Missing parents:
>>> List()
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting Stage 0
>>> >>>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
>>> parents
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting 1000
>>> missing
>>> >>>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>> >>>> 14/09/03 14:35:56 INFO cluster.YarnClientClusterScheduler: Adding
>>> task
>>> >>>> set 0.0 with 1000 tasks
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 0 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 9 ms
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>> TID 1 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>> as 506276 bytes in 5 ms
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 5 ms
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 2 (task
>>> 0.0:2)
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>
>>> >>>> at
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>> at java.lang.Thread.run(Thread.java:744)
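
The `SystemError: unknown opcode` in the traceback above is the classic symptom of running bytecode compiled by one CPython major version under another interpreter — here, a 2.7 Anaconda driver shipping code to the cluster's Python 2.6 workers. A minimal sketch of the `conf/spark-env.sh` fix discussed earlier in this thread (the `/anaconda/bin/python` path is assumed from the thread; note the thread quotes `/anaconda/bin/pyspark` for the executor setting, but `PYSPARK_PYTHON` should point at a python binary, and both settings should name the same interpreter):

```shell
# conf/spark-env.sh -- make driver and YARN executors use the same Python.
# Path below is the Anaconda install assumed from this thread; adjust as needed.
export PYSPARK_PYTHON=/anaconda/bin/python                       # interpreter for the driver
export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/python   # same interpreter for YARN executors
```

With both variables set, the worker processes unpickle and execute functions with the same bytecode format the driver produced, and the `unknown opcode` failures go away.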
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 4 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 0 (task
>>> 0.0:0)
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... Python stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>
>>> >>>> [... Java stack trace matching the first traceback above, snipped ...]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 5 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 3 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 3 (task
>>> 0.0:3)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 1]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>> TID 6 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 4 (task
>>> 0.0:2)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 1]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 7 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 1 (task
>>> 0.0:1)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 2]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>> TID 8 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 5 (task
>>> 0.0:0)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 3]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 9 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 6 (task
>>> 0.0:3)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 2]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>> TID 10 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 7 (task
>>> 0.0:2)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 4]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 11 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 3 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 9 (task
>>> 0.0:0)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 3]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 8 (task
>>> 0.0:1)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 5]
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>> TID 13 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>> as 506276 bytes in 3 ms
>>> >>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Lost TID 11 (task
>>> >>>> 0.0:2)
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 4]
>>> >>>> 14/09/03 14:35:58 ERROR scheduler.TaskSetManager: Task 0.0:2 failed
>>> 4
>>> >>>> times; aborting job
>>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler:
>>> Cancelling
>>> >>>> stage 0
>>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Stage 0
>>> was
>>> >>>> cancelled
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 6]
>>> >>>> 14/09/03 14:35:58 INFO scheduler.DAGScheduler: Failed to run reduce
>>> at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>> Traceback (most recent call last):
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 38, in <module>
>>> >>>>     count = sc.parallelize(xrange(1, n+1),
>>> slices).map(f).reduce(add)
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 619, in reduce
>>> >>>>     vals = self.mapPartitions(func).collect()
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 583, in collect
>>> >>>>     bytesInJava = self._jrdd.collect().iterator()
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>> >>>> line 537, in __call__
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>> >>>> line 300, in get_return_value
>>> >>>> py4j.protocol.Py4JJavaError14/09/03 14:35:58 INFO
>>> >>>> scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>> [... stack frames matching the first traceback above, snipped ...]
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 7]
>>> >>>> : An error occurred while calling o24.collect.
>>> >>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> >>>> Task 0.0:2 failed 4 times, most recent failure: Exception failure
>>> in TID 11
>>> >>>> on host HDOP-N1.AGT: org.apache.spark.api.python.PythonException:
>>> Traceback
>>> >>>> (most recent call last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>
>>> >>>>
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>>
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>>
>>> >>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>>
>>>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>>
>>> >>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>>
>>> >>>>
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>>
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>         java.lang.Thread.run(Thread.java:744)
>>> >>>> Driver stacktrace:
>>> >>>> at
>>> >>>> org.apache.spark.scheduler.DAGScheduler.org
>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>> >>>> at
>>> >>>>
>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> >>>> at
>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >>>> at scala.Option.foreach(Option.scala:236)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>> >>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> >>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> >>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> >>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> >>>> at
>>> >>>>
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> >>>> at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >>>> at
>>> >>>>
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> >>>> at
>>> >>>>
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >>>> at
>>> >>>>
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >>>>
>>> >>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.TaskKilledException
>>> >>>> org.apache.spark.TaskKilledException
>>> >>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Removed
>>> >>>> TaskSet 0.0, whose tasks have all completed, from pool
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Sep 3, 2014 at 1:53 PM, Oleg Ruchovets <
>>> oruchovets@gmail.com>
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Hello Sandy, I changed to using the yarn master but still got the
>>> >>>>> exceptions:
>>> >>>>>
>>> >>>>> What is the procedure for executing pyspark on yarn? Is it enough
>>> >>>>> to run the command below, or is it also required to start Spark
>>> >>>>> processes separately?
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>> >>>>> ./bin/spark-submit --master yarn://HDOP-N1.AGT:8032
>>> --num-executors 3
>>> >>>>> --driver-memory 4g --executor-memory 2g --executor-cores 1
>>> >>>>> examples/src/main/python/pi.py   1000
>>> >>>>> /usr/jdk64/jdk1.7.0_45/bin/java
>>> >>>>>
>>> >>>>>
>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>> >>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>> >>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: Changing view acls
>>> to:
>>> >>>>> root
>>> >>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: SecurityManager:
>>> >>>>> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >>>>> Set(root)
>>> >>>>> 14/09/03 13:48:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> >>>>> 14/09/03 13:48:49 INFO Remoting: Starting remoting
>>> >>>>> 14/09/03 13:48:49 INFO Remoting: Remoting started; listening on
>>> >>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:34424]
>>> >>>>> 14/09/03 13:48:49 INFO Remoting: Remoting now listens on addresses:
>>> >>>>> [akka.tcp://spark@HDOP-B.AGT:34424]
>>> >>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering MapOutputTracker
>>> >>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering
>>> BlockManagerMaster
>>> >>>>> 14/09/03 13:48:49 INFO storage.DiskBlockManager: Created local
>>> >>>>> directory at /tmp/spark-local-20140903134849-231c
>>> >>>>> 14/09/03 13:48:49 INFO storage.MemoryStore: MemoryStore started
>>> with
>>> >>>>> capacity 2.3 GB.
>>> >>>>> 14/09/03 13:48:49 INFO network.ConnectionManager: Bound socket to
>>> port
>>> >>>>> 60647 with id = ConnectionManagerId(HDOP-B.AGT,60647)
>>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Trying to
>>> register
>>> >>>>> BlockManager
>>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-B.AGT:60647 with 2.3 GB RAM
>>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Registered
>>> >>>>> BlockManager
>>> >>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>>> >>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>>> >>>>> SocketConnector@0.0.0.0:56549
>>> >>>>> 14/09/03 13:48:49 INFO broadcast.HttpBroadcast: Broadcast server
>>> >>>>> started at http://10.193.1.76:56549
>>> >>>>> 14/09/03 13:48:49 INFO spark.HttpFileServer: HTTP File server
>>> directory
>>> >>>>> is /tmp/spark-90af1222-9ea8-4dd8-887a-343d09d44333
>>> >>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>>> >>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>>> >>>>> SocketConnector@0.0.0.0:36512
>>> >>>>> 14/09/03 13:48:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>>> 14/09/03 13:48:50 INFO server.AbstractConnector: Started
>>> >>>>> SelectChannelConnector@0.0.0.0:4040
>>> >>>>> 14/09/03 13:48:50 INFO ui.SparkUI: Started SparkUI at
>>> >>>>> http://HDOP-B.AGT:4040
>>> >>>>> 14/09/03 13:48:50 WARN util.NativeCodeLoader: Unable to load
>>> >>>>> native-hadoop library for your platform... using builtin-java
>>> classes where
>>> >>>>> applicable
>>> >>>>> --args is deprecated. Use --arg instead.
>>> >>>>> 14/09/03 13:48:51 INFO client.RMProxy: Connecting to
>>> ResourceManager at
>>> >>>>> HDOP-N1.AGT/10.193.1.72:8050
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Got Cluster metric info from
>>> >>>>> ApplicationsManager (ASM), number of NodeManagers: 6
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Queue info ... queueName:
>>> default,
>>> >>>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>> >>>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Max mem capabililty of a single
>>> >>>>> resource in this cluster 13824
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Preparing Local resources
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Uploading
>>> >>>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>>> to
>>> >>>>>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Uploading
>>> >>>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >>>>> to
>>> >>>>>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/pi.py
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up the launch
>>> environment
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up container launch
>>> context
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Command for starting the Spark
>>> >>>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>> >>>>> -Djava.io.tmpdir=$PWD/tmp,
>>> >>>>>
>>> -Dspark.tachyonStore.folderName=\"spark-bdabb882-a2e0-46b6-8e87-90cc6e359d84\",
>>> >>>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>> >>>>>
>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>> >>>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>> >>>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>> >>>>> -Dspark.fileserver.uri=\"http://10.193.1.76:36512\",
>>> >>>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"34424\",
>>> >>>>> -Dspark.executor.cores=\"1\",
>>> >>>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:56549\",
>>> >>>>> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >>>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>>> --jar ,
>>> >>>>> null,  --args  'HDOP-B.AGT:34424' , --executor-memory, 2048,
>>> >>>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>> >>>>> <LOG_DIR>/stderr)
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Submitting application to ASM
>>> >>>>> 14/09/03 13:48:53 INFO impl.YarnClientImpl: Submitted application
>>> >>>>> application_1409559972905_0033
>>> >>>>> 14/09/03 13:48:53 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:54 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:55 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:56 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:57 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:58 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: 0
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: RUNNING
>>> >>>>>
>>> >>>>> 14/09/03 13:49:00 INFO cluster.YarnClientClusterScheduler:
>>> >>>>> YarnClientClusterScheduler.postStartHook done
>>> >>>>> 14/09/03 13:49:01 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>>> executor:
>>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT
>>> :57078/user/Executor#1595833626]
>>> >>>>> with ID 1
>>> >>>>> 14/09/03 13:49:02 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-B.AGT:54579 with 1178.1 MB RAM
>>> >>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>>> executor:
>>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT
>>> :43121/user/Executor#-1266627304]
>>> >>>>> with ID 2
>>> >>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>>> executor:
>>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-N2.AGT
>>> :36952/user/Executor#1003961369]
>>> >>>>> with ID 3
>>> >>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-N4.AGT:56891 with 1178.1 MB RAM
>>> >>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-N2.AGT:42381 with 1178.1 MB RAM
>>> >>>>> 14/09/03 13:49:33 INFO spark.SparkContext: Starting job: reduce at
>>> >>>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>> >>>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >>>>> with 1000 output partitions (allowLocal=false)
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Final stage: Stage
>>> >>>>> 0(reduce at
>>> >>>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Parents of final
>>> stage:
>>> >>>>> List()
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Missing parents:
>>> List()
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Submitting Stage 0
>>> >>>>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
>>> parents
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Submitting 1000
>>> missing
>>> >>>>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>> >>>>> 14/09/03 13:49:33 INFO cluster.YarnClientClusterScheduler: Adding
>>> task
>>> >>>>> set 0.0 with 1000 tasks
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>>> TID 0 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>>> as 369811 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>>> TID 1 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>>> as 506276 bytes in 5 ms
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>>> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>>> as 501136 bytes in 5 ms
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>>> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>>> as 506276 bytes in 5 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 2 (task
>>> >>>>> 0.0:2)
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException
>>> >>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> >>>>> call last):
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>>> line 77, in main
>>> >>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 191, in dump_stream
>>> >>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 123, in dump_stream
>>> >>>>>     for obj in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 180, in _batched
>>> >>>>>     for item in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>>> line 612, in func
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>> line 36, in f
>>> >>>>> SystemError: unknown opcode
>>> >>>>>
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>>> at
>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>>> at
>>> >>>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>> at
>>> >>>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>>> TID 4 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>>> as 501136 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 1 (task
>>> >>>>> 0.0:1)
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException
>>> >>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> >>>>> call last):
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>>> line 77, in main
>>> >>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 191, in dump_stream
>>> >>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 123, in dump_stream
>>> >>>>>     for obj in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 180, in _batched
>>> >>>>>     for item in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>>> line 612, in func
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>> line 36, in f
>>> >>>>> SystemError: unknown opcode
>>> >>>>>
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>>> at
>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>>> at
>>> >>>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>> at
>>> >>>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>>> TID 5 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>>> as 506276 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 0 (task
>>> >>>>> 0.0:0)
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException
>>> >>>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> >>>>> call last):
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>>> line 77, in main
>>> >>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 191, in dump_stream
>>> >>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 123, in dump_stream
>>> >>>>>     for obj in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 180, in _batched
>>> >>>>>     for item in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>>> line 612, in func
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>> line 36, in f
>>> >>>>> SystemError: unknown opcode
>>> >>>>>
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>>> at
>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>>> at
>>> >>>>>
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>>> at
>>> >>>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>> at
>>> >>>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>>> TID 6 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>>> as 369811 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 3 (task
>>> >>>>> 0.0:3)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException: Traceback (most
>>> recent call
>>> >>>>> last):
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>>> line 77, in main
>>> >>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 191, in dump_stream
>>> >>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 123, in dump_stream
>>> >>>>>     for obj in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/15/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 180, in _batched
>>> >>>>>     for item in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>>> line 612, in func
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>> line 36, in f
>>> >>>>> SystemError: unknown opcode
>>> >>>>>  [duplicate 1]
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>>> TID 7 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>>> as 506276 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 4 (task
>>> >>>>> 0.0:2)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException: Traceback (most
>>> recent call
>>> >>>>> last):
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>>> line 77, in main
>>> >>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 191, in dump_stream
>>> >>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 123, in dump_stream
>>> >>>>>     for obj in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/23/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 180, in _batched
>>> >>>>>     for item in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>>> line 612, in func
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>> line 36, in f
>>> >>>>> SystemError: unknown opcode
>>> >>>>>  [duplicate 1]
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>>> TID 8 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>>> as 501136 bytes in 3 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 5 (task
>>> >>>>> 0.0:1)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException: Traceback (most
>>> recent call
>>> >>>>> last):
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>>> line 77, in main
>>> >>>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 191, in dump_stream
>>> >>>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 123, in dump_stream
>>> >>>>>     for obj in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>>> line 180, in _batched
>>> >>>>>     for item in iterator:
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>>> line 612, in func
>>> >>>>>   File
>>> >>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>> line 36, in f
>>> >>>>> SystemError: unknown opcode
>>> >>>>>  [duplicate 1]
>>> >>>>> [each WARN "Lost TID" line below was followed by the identical
>>> >>>>> "SystemError: unknown opcode" PythonException traceback shown above;
>>> >>>>> those repeats are elided]
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 9 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506276 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:0)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 10 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369811 bytes in 3 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:3)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 11 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506276 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:2)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:34 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 501136 bytes in 3 ms
>>> >>>>> 14/09/03 13:49:34 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 13 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506276 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:0)
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 14 on executor 2: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369811 bytes in 4 ms
>>> >>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:3)
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 15 on executor 3: HDOP-N2.AGT (PROCESS_LOCAL)
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506276 bytes in 3 ms
>>> >>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Lost TID 13 (task 0.0:1)
>>> >>>>> 14/09/03 13:49:35 ERROR scheduler.TaskSetManager: Task 0.0:1 failed 4 times; aborting job
>>> >>>>> 14/09/03 13:49:35 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
>>> >>>>> 14/09/03 13:49:35 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>>> org.apache.spark.api.python.PythonException [the identical "SystemError:
>>> >>>>> unknown opcode" traceback, duplicate 4, elided]
>>> >>>>> 14/09/03 13:49:35 INFO scheduler.DAGScheduler: Failed to run reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>>> Traceback (most recent call last):
>>> >>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>> >>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>> >>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 619, in reduce
>>> >>>>>     vals = self.mapPartitions(func).collect()
>>> >>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 583, in collect
>>> >>>>>     bytesInJava = self._jrdd.collect().iterator()
>>> >>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>> >>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>> >>>>> py4j.protocol.Py4JJavaError: An error occurred while calling o24.collect.
>>> >>>>> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 4 times, most recent failure: Exception failure in TID 13 on host HDOP-N2.AGT: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>>> >>>>>   [the same worker-side traceback as above, ending in "SystemError: unknown opcode"; elided]
>>> >>>>>         org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>>>         org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>>>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>>         java.lang.Thread.run(Thread.java:744)
>>> >>>>> Driver stacktrace:
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>> >>>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> >>>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >>>>> at scala.Option.foreach(Option.scala:236)
>>> >>>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>> >>>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>> >>>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> >>>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> >>>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> >>>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> >>>>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> >>>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >>>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> >>>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >>>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >>>>>
>>> >>>>> 14/09/03 13:49:35 WARN scheduler.TaskSetManager: Loss was due to org.apache.spark.TaskKilledException
>>> >>>>> org.apache.spark.TaskKilledException
>>> >>>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>> >>>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>>
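A note on the failure above: "SystemError: unknown opcode" almost always means the Python that compiled the function's bytecode and the Python executing it on the workers are different versions (here, per the thread's resolution, HDP's system Python 2.6.6 versus Anaconda's 2.7.7). A quick, purely illustrative sketch for comparing the interpreters present on a node (both paths are assumptions about this cluster's layout, not guaranteed locations):

```shell
# Print the version of each candidate interpreter so a driver/worker
# mismatch is visible at a glance. Paths are illustrative for this cluster.
for py in /usr/bin/python /anaconda/bin/python; do
  if [ -x "$py" ]; then
    printf '%s -> ' "$py"
    "$py" -c 'import sys; print(sys.version.split()[0])'
  else
    echo "$py -> not installed on this node"
  fi
done
```

Running this on the driver host and on each NodeManager host should show whether PySpark workers could end up on a different interpreter than the driver.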
>>> >>>>>
>>> >>>>> On Wed, Sep 3, 2014 at 1:40 PM, Sandy Ryza <sandy.ryza@cloudera.com> wrote:
>>> >>>>>>
>>> >>>>>> Hi Oleg. To run on YARN, simply set master to "yarn". The YARN configuration, located in yarn-site.xml, determines where to look for the YARN ResourceManager.
>>> >>>>>>
>>> >>>>>> PROCESS_LOCAL is orthogonal to the choice of cluster resource manager. A task is considered PROCESS_LOCAL when the executor it's running in happens to have the data it's processing cached.
>>> >>>>>>
>>> >>>>>> If you're looking to get familiar with this admittedly confusing web of terminology, this blog post might be helpful:
>>> >>>>>>
>>> >>>>>> http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
>>> >>>>>>
>>> >>>>>> -Sandy
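A minimal sketch of the submission Sandy describes. It assumes HADOOP_CONF_DIR points at the directory holding yarn-site.xml (the path below is illustrative) so Spark can locate the ResourceManager; on the Spark 1.0.x builds in this thread the client-mode master string is spelled "yarn-client":

```shell
# Submit the pi example to YARN instead of a standalone master.
# /etc/hadoop/conf is an assumed location for this cluster's Hadoop config.
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/spark-submit --master yarn-client \
  examples/src/main/python/pi.py 1000
```

With this, no start-all.sh (standalone master/worker) is needed at all; YARN launches the executors.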
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On Tue, Sep 2, 2014 at 9:51 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>>> >>>>>>>
>>> >>>>>>> Hi,
>>> >>>>>>>   I changed my command to:
>>> >>>>>>>   ./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000
>>> >>>>>>> and that fixed the problem.
>>> >>>>>>>
>>> >>>>>>> I still have a couple of questions:
>>> >>>>>>>    PROCESS_LOCAL is not YARN execution, right? How should I configure running on YARN? Should I execute the start-all script on all machines or only one? Where are the UI / logs of the Spark execution?
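As a rough sketch of where to look for the UI and logs asked about above (host names and the application id below are illustrative placeholders, not values from this cluster):

```shell
# Standalone master web UI (default port 8080):   http://HDOP-B.AGT:8080
# Per-application UI (the driver log above says
# "Started SparkUI at http://HDOP-B.AGT:4040"):   http://HDOP-B.AGT:4040
# Standalone executor logs live on each worker under:
#   $SPARK_HOME/work/<app-id>/<executor-id>/stdout and stderr
# On YARN, aggregated container logs can be pulled with (id is a placeholder):
yarn logs -applicationId application_1409712345678_0001
```

The 4040 UI only exists while the application is running; the master UI and YARN logs persist afterwards.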
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> 152 152 SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:14 0.2 s
>>> >>>>>>> 0   0   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s 39 ms
>>> >>>>>>> 2   2   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s 39 ms
>>> >>>>>>> 3   3   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s 39 ms 1 ms
>>> >>>>>>> 4   4   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.8 s 39 ms 2 ms
>>> >>>>>>> 5   5   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.8 s 39 ms 1 ms
>>> >>>>>>> 6   6   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.8 s 1 ms
>>> >>>>>>> 7   7   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:09 0.9 s
>>> >>>>>>> 8   8   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.3 s
>>> >>>>>>> 9   9   SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.4 s
>>> >>>>>>> 10  10  SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.3 s 1 ms
>>> >>>>>>> 11  11  SUCCESS PROCESS_LOCAL HDOP-B.AGT 2014/09/03 12:35:10 0.3 s
>>> >>>>>>>
>>> >>>>>>>
>>> >>>>>>> On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>>> >>>>>>>>
>>> >>>>>>>> Hi Andrew,
>>> >>>>>>>>    what should I do to set the master to yarn? Can you please point me to the command or documentation for how to do it?
>>> >>>>>>>>
>>> >>>>>>>> I am doing the following:
>>> >>>>>>>>    executed start-all.sh:
>>> >>>>>>>>    [root@HDOP-B sbin]# ./start-all.sh
>>> >>>>>>>> starting org.apache.spark.deploy.master.Master, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
>>> >>>>>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> >>>>>>>> localhost: starting org.apache.spark.deploy.worker.Worker, logging to /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>>> >>>>>>>>
>>> >>>>>>>> then executed the command:
>>> >>>>>>>>     ./bin/spark-submit --master spark://HDOP-B.AGT:7077 examples/src/main/python/pi.py 1000
>>> >>>>>>>>
>>> >>>>>>>> the result is the following:
>>> >>>>>>>>
>>> >>>>>>>>    /usr/jdk64/jdk1.7.0_45/bin/java ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>>> >>>>>>>> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>>> >>>>>>>> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
>>> >>>>>>>> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>>> >>>>>>>> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
>>> >>>>>>>> 14/09/03 12:10:07 INFO Remoting: Starting remoting
>>> >>>>>>>> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@HDOP-B.AGT:38944]
>>> >>>>>>>> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@HDOP-B.AGT:38944]
>>> >>>>>>>> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
>>> >>>>>>>> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
>>> >>>>>>>> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20140903121008-cf09
>>> >>>>>>>> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
>>> >>>>>>>> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041 with id = ConnectionManagerId(HDOP-B.AGT,45041)
>>> >>>>>>>> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register BlockManager
>>> >>>>>>>> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:45041 with 294.9 MB RAM
>>> >>>>>>>> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
>>> >>>>>>>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>>> >>>>>>>> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at http://10.193.1.76:59336
>>> >>>>>>>> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
>>> >>>>>>>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>>> >>>>>>>> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
>>> >>>>>>>> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>>>>>> 14/09/03 12:10:09 INFO Utils: Copying /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
>>> >>>>>>>> 14/09/03 12:10:09 INFO SparkContext: Added file file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
>>> >>>>>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master spark://HDOP-B.AGT:7077...
>>> >>>>>>>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20140903121009-0000
>>> >>>>>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added: app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161 (HDOP-B.AGT:51161) with 8 cores
>>> >>>>>>>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores, 512.0 MB RAM
>>> >>>>>>>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated: app-20140903121009-0000/0 is now RUNNING
>>> >>>>>>>> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:38143/user/Executor#1295757828] with ID 0
>>> >>>>>>>> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager HDOP-B.AGT:38670 with 294.9 MB RAM
>>> >>>>>>>> Traceback (most recent call last):
>>> >>>>>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>>> >>>>>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>> >>>>>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 271, in parallelize
>>> >>>>>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>> >>>>>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>>> >>>>>>>>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>>> >>>>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>> >>>>>>>> : java.lang.OutOfMemoryError: Java heap space
>>> >>>>>>>> at org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>> >>>>>>>> at org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>> >>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> >>>>>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >>>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> >>>>>>>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>> >>>>>>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>> >>>>>>>> at py4j.Gateway.invoke(Gateway.java:259)
>>> >>>>>>>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>> >>>>>>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> >>>>>>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>> >>>>>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> What should I do to fix the issue?
>>> >>>>>>>>
>>> >>>>>>>> Thanks
>>> >>>>>>>> Oleg.
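For context on the OutOfMemoryError above: it is thrown in the driver JVM (note the `-Xmx512m` in the launch command) while readRDDFromFile materializes the parallelized data, so increasing driver heap is the usual first step. A sketch that mirrors the flags which resolved it earlier in the thread (the memory values are illustrative, not tuned):

```shell
# Resubmit with a larger driver heap; executor settings shown for completeness.
./bin/spark-submit --master spark://HDOP-B.AGT:7077 \
  --driver-memory 4g --executor-memory 2g --executor-cores 1 \
  examples/src/main/python/pi.py 1000
```

Reducing the slices argument (here 1000) also shrinks the serialized task data the driver must hold at once.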
>>> >>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>> On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <
>>> andrew@databricks.com>
>>> >>>>>>>> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> Hi Oleg,
>>> >>>>>>>>>
>>> >>>>>>>>> If you are running Spark on a yarn cluster, you should set
>>> >>>>>>>>> --master to yarn. By default this runs in client mode, which
>>> >>>>>>>>> redirects all output of your application to your console. This
>>> >>>>>>>>> is failing because it is trying to connect to a standalone
>>> >>>>>>>>> master that you probably did not start. I am somewhat puzzled
>>> >>>>>>>>> as to how you ran into an OOM from this configuration,
>>> >>>>>>>>> however. Does this problem still occur if you set the correct
>>> >>>>>>>>> master?
>>> >>>>>>>>>
>>> >>>>>>>>> -Andrew
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>> >>>>>>>>>
>>> >>>>>>>>>> Hi,
>>> >>>>>>>>>>    I've installed PySpark on an HDP Hortonworks cluster.
>>> >>>>>>>>>>   Executing the pi example:
>>> >>>>>>>>>>
>>> >>>>>>>>>> command:
>>> >>>>>>>>>>        spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>> >>>>>>>>>> ./bin/spark-submit --master spark://10.193.1.71:7077
>>> >>>>>>>>>> examples/src/main/python/pi.py   1000
>>> >>>>>>>>>>
>>> >>>>>>>>>> exception:
>>> >>>>>>>>>>
>>> >>>>>>>>>>     14/09/02 17:34:02 INFO SecurityManager: Using Spark's
>>> default
>>> >>>>>>>>>> log4j profile: org/apache/spark/log4j-defaults.properties
>>> >>>>>>>>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to:
>>> >>>>>>>>>> root
>>> >>>>>>>>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager:
>>> >>>>>>>>>> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >>>>>>>>>> Set(root)
>>> >>>>>>>>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>> >>>>>>>>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening
>>> on
>>> >>>>>>>>>> addresses :[akka.tcp://spark@HDOP-M.AGT:41059]
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on
>>> >>>>>>>>>> addresses: [akka.tcp://spark@HDOP-M.AGT:41059]
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering
>>> BlockManagerMaster
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local
>>> directory
>>> >>>>>>>>>> at /tmp/spark-local-20140902173403-cda8
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with
>>> >>>>>>>>>> capacity 294.9 MB.
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port
>>> >>>>>>>>>> 34931 with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register
>>> >>>>>>>>>> BlockManager
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block
>>> manager
>>> >>>>>>>>>> HDOP-M.AGT:34931 with 294.9 MB RAM
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered
>>> BlockManager
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server
>>> started at
>>> >>>>>>>>>> http://10.193.1.71:54341
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server
>>> directory
>>> >>>>>>>>>> is /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>> >>>>>>>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>> >>>>>>>>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at
>>> >>>>>>>>>> http://HDOP-M.AGT:4040
>>> >>>>>>>>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load
>>> >>>>>>>>>> native-hadoop library for your platform... using builtin-java
>>> classes where
>>> >>>>>>>>>> applicable
>>> >>>>>>>>>> 14/09/02 17:34:04 INFO Utils: Copying
>>> >>>>>>>>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >>>>>>>>>> to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>> >>>>>>>>>> 14/09/02 17:34:04 INFO SparkContext: Added file
>>> >>>>>>>>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >>>>>>>>>> at http://10.193.1.71:52938/files/pi.py with timestamp
>>> 1409650444941
>>> >>>>>>>>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to
>>> master
>>> >>>>>>>>>> spark://10.193.1.71:7077...
>>> >>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to
>>> master
>>> >>>>>>>>>> spark://10.193.1.71:7077...
>>> >>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not
>>> connect to
>>> >>>>>>>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>> >>>>>>>>>> akka.remote.EndpointAssociationException: Association failed
>>> with
>>> >>>>>>>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>> >>>>>>>>>> Traceback (most recent call last):
>>> >>>>>>>>>>   File
>>> >>>>>>>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>>>>>>>> line 38, in <module>
>>> >>>>>>>>>>     count = sc.parallelize(xrange(1, n+1),
>>> >>>>>>>>>> slices).map(f).reduce(add)
>>> >>>>>>>>>>   File
>>> >>>>>>>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>> >>>>>>>>>> line 271, in parallelize
>>> >>>>>>>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name,
>>> numSlices)
>>> >>>>>>>>>>   File
>>> >>>>>>>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>> >>>>>>>>>> line 537, in __call__
>>> >>>>>>>>>>   File
>>> >>>>>>>>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>> >>>>>>>>>> line 300, in get_return_value
>>> >>>>>>>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>> >>>>>>>>>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>> >>>>>>>>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>> >>>>>>>>>> at
>>> >>>>>>>>>>
>>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>> >>>>>>>>>> at
>>> >>>>>>>>>>
>>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>> >>>>>>>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> >>>>>>>>>> at
>>> >>>>>>>>>>
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> >>>>>>>>>> at
>>> >>>>>>>>>>
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> >>>>>>>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> >>>>>>>>>> at
>>> py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>> >>>>>>>>>> at
>>> >>>>>>>>>>
>>> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>> >>>>>>>>>> at py4j.Gateway.invoke(Gateway.java:259)
>>> >>>>>>>>>> at
>>> >>>>>>>>>>
>>> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>> >>>>>>>>>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>> >>>>>>>>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>> >>>>>>>>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>> Question:
>>> >>>>>>>>>>     how can I find the Spark master host and port? Where is it defined?
>>> >>>>>>>>>>
>>> >>>>>>>>>> Thanks
>>> >>>>>>>>>> Oleg.
>>> >>>>>>>>>
>>> >>>>>>>>>
>>> >>>>>>>>
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

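[Editor's note: a sketch of Andrew's suggestion as a concrete command.
This assumes SPARK_HOME points at the install directory quoted in the
thread and that HADOOP_CONF_DIR is set so Spark can locate the YARN
ResourceManager; the 2g driver-memory value is illustrative, not taken
from the thread. The standalone master URL Oleg asks about would
otherwise be configured via SPARK_MASTER_IP/SPARK_MASTER_PORT in
conf/spark-env.sh.]

```shell
# Sketch: submit the pi example to YARN (client mode by default)
# instead of a standalone master that was never started.
SPARK_HOME=${SPARK_HOME:-/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563}
CMD="$SPARK_HOME/bin/spark-submit --master yarn --driver-memory 2g \
examples/src/main/python/pi.py 100"
echo "$CMD"
```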