spark-user mailing list archives

From: Andrew Or <and...@databricks.com>
Subject: Re: pyspark yarn got exception
Date: Thu, 04 Sep 2014 18:12:23 GMT
That should go into `conf/spark-env.sh`. Let me know if that works.
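
For reference, a minimal sketch of what conf/spark-env.sh could contain, assuming
Anaconda is installed at /anaconda on the driver and on every YARN node (note that
PYSPARK_PYTHON is expected to point at the Python interpreter itself, not at the
pyspark launcher script):

    # conf/spark-env.sh
    export PYSPARK_PYTHON=/anaconda/bin/python
    # propagate the same interpreter to the YARN containers
    export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/anaconda/bin/python"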


2014-09-04 11:10 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:

> OK, I'll do that.
>
>    Which script or configuration file should I use for this change:
>
> export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark
> to make sure that the slaves also get the same PYSPARK_PYTHON?
>
> Thanks
> Oleg.
>
>
> On Fri, Sep 5, 2014 at 1:21 AM, Andrew Or <andrew@databricks.com> wrote:
>
>> You may also need to
>>
>> export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark
>>
>> to make sure that the slaves also get the same PYSPARK_PYTHON.
>>
>>
>> 2014-09-04 9:52 GMT-07:00 Davies Liu <davies@databricks.com>:
>>
>>> You can use PYSPARK_PYTHON to choose which version of python will be
>>> used in pyspark, such as:
>>>
>>> PYSPARK_PYTHON=/anaconda/bin/python  bin/pyspark
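
(A quick way to confirm which interpreter the executors actually pick up, sketched
here under the assumption that Anaconda lives at /anaconda as above, is to ask each
task for sys.executable:

    PYSPARK_PYTHON=/anaconda/bin/python bin/pyspark

and then, at the PySpark prompt:

    import sys
    sc.parallelize(range(20), 4).map(lambda _: sys.executable).distinct().collect()

Every path in the result should be the Anaconda interpreter.)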
>>>
>>> On Thu, Sep 4, 2014 at 1:30 AM, Oleg Ruchovets <oruchovets@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> >     I found the reason for the problem.
>>> > HDP (Hortonworks) uses Python 2.6.6 for the Ambari installation and the
>>> > rest of the stack. I can run PySpark and it works fine, but I need to
>>> > use the Anaconda distribution (for Spark). When I installed Anaconda
>>> > (Python 2.7.7), I got the problem.
>>> >
>>> > Question: how can this be resolved? Is there a way to have two Python
>>> > versions installed on one machine?
>>> >
>>> >
>>> > Thanks
>>> > Oleg.
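
(Two Python installations can coexist on one machine as long as the system default
is left untouched. A minimal sketch, assuming Anaconda is installed under /anaconda
and is not placed first on the global PATH:

    /usr/bin/python --version        # system Python 2.6.6, still used by Ambari/HDP
    /anaconda/bin/python --version   # Anaconda Python 2.7.7, used only where requested
    export PYSPARK_PYTHON=/anaconda/bin/python   # select Anaconda for PySpark only

The idea is to point Spark at Anaconda explicitly rather than replacing the system
Python that HDP depends on.)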
>>> >
>>> >
>>> > On Thu, Sep 4, 2014 at 1:15 PM, Oleg Ruchovets <oruchovets@gmail.com>
>>> wrote:
>>> >>
>>> >> Hi Andrew.
>>> >>
>>> >> The problem still occurs:
>>> >>
>>> >> All machines are using Python 2.7:
>>> >>
>>> >> [root@HDOP-N2 conf]# python --version
>>> >> Python 2.7.7 :: Anaconda 2.0.1 (64-bit)
>>> >>
>>> >> Executing the bin/pyspark command:
>>> >>
>>> >> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# bin/pyspark \
>>> >>     --driver-memory 4g --executor-memory 2g --executor-cores 1 \
>>> >>     examples/src/main/python/pi.py 1000
>>> >>
>>> >>
>>> >> Python 2.7.7 |Anaconda 2.0.1 (64-bit)| (default, Jun  2 2014,
>>> 12:34:02)
>>> >> [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
>>> >> Type "help", "copyright", "credits" or "license" for more information.
>>> >> Anaconda is brought to you by Continuum Analytics.
>>> >> Please check out: http://continuum.io/thanks and https://binstar.org
>>> >> Traceback (most recent call last):
>>> >>   File
>>> >> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563
>>> /python/pyspark/shell.py",
>>> >> line 43, in <module>
>>> >>     sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>> >> line 94, in __init__
>>> >>     SparkContext._ensure_initialized(self, gateway=gateway)
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>> >> line 190, in _ensure_initialized
>>> >>     SparkContext._gateway = gateway or launch_gateway()
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/java_gateway.py",
>>> >> line 51, in launch_gateway
>>> >>     gateway_port = int(proc.stdout.readline())
>>> >> ValueError: invalid literal for int() with base 10:
>>> >> '/usr/jdk64/jdk1.7.0_45/bin/java\n'
>>> >> >>>
>>> >>
>>> >>
>>> >>
>>> >> This is the log from the Spark on YARN execution:
>>> >>
>>> >>
>>> >> SLF4J: Class path contains multiple SLF4J bindings.
>>> >> SLF4J: Found binding in
>>> >>
>>> [jar:file:/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> >> SLF4J: Found binding in
>>> >>
>>> [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>>> >> explanation.
>>> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> >> 14/09/04 12:53:19 INFO SecurityManager: Changing view acls to:
>>> yarn,root
>>> >> 14/09/04 12:53:19 INFO SecurityManager: SecurityManager:
>>> authentication
>>> >> disabled; ui acls disabled; users with view permissions: Set(yarn,
>>> root)
>>> >> 14/09/04 12:53:20 INFO Slf4jLogger: Slf4jLogger started
>>> >> 14/09/04 12:53:20 INFO Remoting: Starting remoting
>>> >> 14/09/04 12:53:20 INFO Remoting: Remoting started; listening on
>>> addresses
>>> >> :[akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>>> >> 14/09/04 12:53:20 INFO Remoting: Remoting now listens on addresses:
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>>> >> 14/09/04 12:53:20 INFO RMProxy: Connecting to ResourceManager at
>>> >> HDOP-N1.AGT/10.193.1.72:8030
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: ApplicationAttemptId:
>>> >> appattempt_1409805761292_0005_000001
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Registering the
>>> ApplicationMaster
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Waiting for Spark driver to
>>> be
>>> >> reachable.
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Driver now available:
>>> >> HDOP-B.AGT:45747
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Listen to driver:
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler
>>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Allocating 3 executors.
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Will Allocate 3 executor
>>> >> containers, each with 2432 memory
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>>> >> Any, priority: 1, capability: <memory:2432, vCores:1>
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>>> >> Any, priority: 1, capability: <memory:2432, vCores:1>
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host:
>>> >> Any, priority: 1, capability: <memory:2432, vCores:1>
>>> >> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for :
>>> >> HDOP-M.AGT:45454
>>> >> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for :
>>> >> HDOP-N1.AGT:45454
>>> >> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-N1.AGT to
>>> /default-rack
>>> >> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-M.AGT to
>>> /default-rack
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container
>>> >> container_1409805761292_0005_01_000003 for on host HDOP-N1.AGT
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching
>>> ExecutorRunnable.
>>> >> driverUrl: akka.tcp://spark@HDOP-B.AGT
>>> :45747/user/CoarseGrainedScheduler,
>>> >> executorHostname: HDOP-N1.AGT
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container
>>> >> container_1409805761292_0005_01_000002 for on host HDOP-M.AGT
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching
>>> ExecutorRunnable.
>>> >> driverUrl: akka.tcp://spark@HDOP-B.AGT
>>> :45747/user/CoarseGrainedScheduler,
>>> >> executorHostname: HDOP-M.AGT
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>>> >> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy:
>>> >> yarn.client.max-nodemanagers-proxies : 500
>>> >> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy:
>>> >> yarn.client.max-nodemanagers-proxies : 500
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up
>>> ContainerLaunchContext
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up
>>> ContainerLaunchContext
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources
>>> >> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020
>>> file:
>>> >> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" }
>>> size: 1317
>>> >> timestamp: 1409806397200 type: FILE visibility: PRIVATE,
>>> __spark__.jar ->
>>> >> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>>> >>
>>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>>> >> } size: 121759562 timestamp: 1409806397057 type: FILE visibility:
>>> PRIVATE)
>>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources
>>> >> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020
>>> file:
>>> >> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" }
>>> size: 1317
>>> >> timestamp: 1409806397200 type: FILE visibility: PRIVATE,
>>> __spark__.jar ->
>>> >> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>>> >>
>>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>>> >> } size: 121759562 timestamp: 1409806397057 type: FILE visibility:
>>> PRIVATE)
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>>> >> commands: List($JAVA_HOME/bin/java, -server,
>>> -XX:OnOutOfMemoryError='kill
>>> >> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 1,
>>> >> HDOP-N1.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>>> >> commands: List($JAVA_HOME/bin/java, -server,
>>> -XX:OnOutOfMemoryError='kill
>>> >> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 2,
>>> >> HDOP-M.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening
>>> proxy :
>>> >> HDOP-N1.AGT:45454
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening
>>> proxy :
>>> >> HDOP-M.AGT:45454
>>> >> 14/09/04 12:53:22 INFO AMRMClientImpl: Received new token for :
>>> >> HDOP-N4.AGT:45454
>>> >> 14/09/04 12:53:22 INFO RackResolver: Resolved HDOP-N4.AGT to
>>> /default-rack
>>> >> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching container
>>> >> container_1409805761292_0005_01_000004 for on host HDOP-N4.AGT
>>> >> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching
>>> ExecutorRunnable.
>>> >> driverUrl: akka.tcp://spark@HDOP-B.AGT
>>> :45747/user/CoarseGrainedScheduler,
>>> >> executorHostname: HDOP-N4.AGT
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Starting Executor Container
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy:
>>> >> yarn.client.max-nodemanagers-proxies : 500
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up
>>> ContainerLaunchContext
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Preparing Local resources
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Prepared Local resources
>>> >> Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020
>>> file:
>>> >> "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" }
>>> size: 1317
>>> >> timestamp: 1409806397200 type: FILE visibility: PRIVATE,
>>> __spark__.jar ->
>>> >> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file:
>>> >>
>>> "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar"
>>> >> } size: 121759562 timestamp: 1409806397057 type: FILE visibility:
>>> PRIVATE)
>>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with
>>> >> commands: List($JAVA_HOME/bin/java, -server,
>>> -XX:OnOutOfMemoryError='kill
>>> >> %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp,
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.executor.CoarseGrainedExecutorBackend,
>>> >> akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 3,
>>> >> HDOP-N4.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening
>>> proxy :
>>> >> HDOP-N4.AGT:45454
>>> >> 14/09/04 12:53:22 INFO ExecutorLauncher: All executors have launched.
>>> >> 14/09/04 12:53:22 INFO ExecutorLauncher: Started progress reporter
>>> thread
>>> >> - sleep time : 5000
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or
>>> >> disconnected! Shutting down. Disassociated
>>> >> [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] ->
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:54:02 INFO ExecutorLauncher: finish ApplicationMaster with
>>> >> SUCCEEDED
>>> >> 14/09/04 12:54:02 INFO AMRMClientImpl: Waiting for application to be
>>> >> successfully unregistered.
>>> >> 14/09/04 12:54:02 INFO ExecutorLauncher: Exited
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> The exception still occurs:
>>> >>
>>> >>
>>> >> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# ./bin/spark-submit \
>>> >>     --master yarn --num-executors 3 --driver-memory 4g \
>>> >>     --executor-memory 2g --executor-cores 1 \
>>> >>     examples/src/main/python/pi.py 1000
>>> >> /usr/jdk64/jdk1.7.0_45/bin/java
>>> >>
>>> >>
>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>> >> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>> >> 14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to:
>>> root
>>> >> 14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager:
>>> >> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >> Set(root)
>>> >> 14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> >> 14/09/04 12:53:12 INFO Remoting: Starting remoting
>>> >> 14/09/04 12:53:12 INFO Remoting: Remoting started; listening on
>>> addresses
>>> >> :[akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses:
>>> >> [akka.tcp://spark@HDOP-B.AGT:45747]
>>> >> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
>>> >> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> >> 14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local
>>> directory
>>> >> at /tmp/spark-local-20140904125312-c7ea
>>> >> 14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with
>>> >> capacity 2.3 GB.
>>> >> 14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port
>>> >> 37363 with id = ConnectionManagerId(HDOP-B.AGT,37363)
>>> >> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register
>>> >> BlockManager
>>> >> 14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-B.AGT:37363 with 2.3 GB RAM
>>> >> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered
>>> BlockManager
>>> >> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>>> >> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
>>> >> SocketConnector@0.0.0.0:33547
>>> >> 14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server
>>> started
>>> >> at http://10.193.1.76:33547
>>> >> 14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server
>>> directory is
>>> >> /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
>>> >> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>>> >> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >> 14/09/04 12:53:12 INFO server.AbstractConnector: Started
>>> >> SocketConnector@0.0.0.0:54594
>>> >> 14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >> 14/09/04 12:53:13 INFO server.AbstractConnector: Started
>>> >> SelectChannelConnector@0.0.0.0:4040
>>> >> 14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at
>>> >> http://HDOP-B.AGT:4040
>>> >> 14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop
>>> >> library for your platform... using builtin-java classes where
>>> applicable
>>> >> --args is deprecated. Use --arg instead.
>>> >> 14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager
>>> at
>>> >> HDOP-N1.AGT/10.193.1.72:8050
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from
>>> >> ApplicationsManager (ASM), number of NodeManagers: 6
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default,
>>> >> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>> >>       queueApplicationCount = 0, queueChildQueueCount = 0
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single
>>> >> resource in this cluster 13824
>>> >> 14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
>>> >> 14/09/04 12:53:15 INFO yarn.Client: Uploading
>>> >>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >> to
>>> >>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Uploading
>>> >>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >> to
>>> >>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Setting up container launch
>>> context
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark
>>> >> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>> >> -Djava.io.tmpdir=$PWD/tmp,
>>> >>
>>> -Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\",
>>> >> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>> >>
>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>> >> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>> >> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>> >> -Dspark.fileserver.uri=\"http://10.193.1.76:54594\",
>>> >> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\",
>>> >> -Dspark.executor.cores=\"1\",
>>> >> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\",
>>> >> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>>> --jar ,
>>> >> null,  --args  'HDOP-B.AGT:45747' , --executor-memory, 2048,
>>> >> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>> >> <LOG_DIR>/stderr)
>>> >> 14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
>>> >> 14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application
>>> >> application_1409805761292_0005
>>> >> 14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:18 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:19 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:20 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: -1
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: ACCEPTED
>>> >>
>>> >> 14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application
>>> >> report from ASM:
>>> >> appMasterRpcPort: 0
>>> >> appStartTime: 1409806397305
>>> >> yarnAppState: RUNNING
>>> >>
>>> >> 14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler:
>>> >> YarnClientClusterScheduler.postStartHook done
>>> >> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
>>> >> executor:
>>> >> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:40024/user/Executor#
>>> 2065794895]
>>> >> with ID 1
>>> >> 14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-N1.AGT:34857 with 1178.1 MB RAM
>>> >> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered
>>> >> executor:
>>> >> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT
>>> :49234/user/Executor#820272849]
>>> >> with ID 3
>>> >> 14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered
>>> >> executor:
>>> >> Actor[akka.tcp://sparkExecutor@HDOP-M.AGT
>>> :38124/user/Executor#715249825]
>>> >> with ID 2
>>> >> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-N4.AGT:43365 with 1178.1 MB RAM
>>> >> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block
>>> manager
>>> >> HDOP-M.AGT:45711 with 1178.1 MB RAM
>>> >> 14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >> with 1000 output partitions (allowLocal=false)
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage
>>> 0(reduce
>>> >> at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage:
>>> >> List()
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0
>>> >> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
>>> parents
>>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing
>>> >> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>> >> 14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding
>>> task set
>>> >> 0.0 with 1000 tasks
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0
>>> as
>>> >> TID 0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0 as
>>> >> 369810 bytes in 5 ms
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1
>>> as
>>> >> TID 1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1 as
>>> >> 506275 bytes in 2 ms
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2
>>> as
>>> >> TID 2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2 as
>>> >> 501135 bytes in 2 ms
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
>>> as
>>> >> TID 3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3 as
>>> >> 506275 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task
>>> 0.0:1)
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>
>>> >> at
>>> >>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >> at
>>> >>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >> at
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >> at
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >> at java.lang.Thread.run(Thread.java:744)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
>>> as
>>> >> TID 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1 as
>>> >> 506275 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 2 (task
>>> 0.0:2)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 1]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
>>> as
>>> >> TID 5 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2 as
>>> >> 501135 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 3 (task
>>> 0.0:3)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 2]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
>>> as
>>> >> TID 6 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3 as
>>> >> 506275 bytes in 5 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 0 (task
>>> 0.0:0)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 3]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0
>>> as
>>> >> TID 7 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0 as
>>> >> 369810 bytes in 4 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 5 (task
>>> 0.0:2)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 4]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
>>> as
>>> >> TID 8 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2 as
>>> >> 501135 bytes in 3 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 4 (task
>>> 0.0:1)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 5]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
>>> as
>>> >> TID 9 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1 as
>>> >> 506275 bytes in 4 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 6 (task
>>> 0.0:3)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 6]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
>>> as
>>> >> TID 10 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3 as
>>> >> 506275 bytes in 3 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 7 (task
>>> 0.0:0)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 7]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0
>>> as
>>> >> TID 11 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0 as
>>> >> 369810 bytes in 3 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 8 (task
>>> 0.0:2)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 8]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
>>> as
>>> >> TID 12 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2 as
>>> >> 501135 bytes in 4 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 10 (task
>>> 0.0:3)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 9]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3
>>> as
>>> >> TID 13 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3 as
>>> >> 506275 bytes in 3 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 9 (task
>>> 0.0:1)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 10]
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
>>> as
>>> >> TID 14 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1 as
>>> >> 506275 bytes in 4 ms
>>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 11 (task
>>> 0.0:0)
>>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 11]
>>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Starting task 0.0:0
>>> as
>>> >> TID 15 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0 as
>>> >> 369810 bytes in 4 ms
>>> >> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Lost TID 12 (task
>>> 0.0:2)
>>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 12]
>>> >> 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
>>> >> times; aborting job
>>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 13]
>>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling
>>> >> stage 0
>>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was
>>> >> cancelled
>>> >> 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at
>>> >>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >> Traceback (most recent call last):
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 38, in <module>
>>> >>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 619, in reduce
>>> >>     vals = self.mapPartitions(func).collect()
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 583, in collect
>>> >>     bytesInJava = self._jrdd.collect().iterator()
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>> >> line 537, in __call__
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>> >> line 300, in get_return_value
>>> >> py4j.protocol.Py4JJavaError14/09/04 12:53:57 INFO
>>> >> scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >> last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>  [duplicate 14]
>>> >> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to
>>> >> org.apache.spark.TaskKilledException
>>> >> org.apache.spark.TaskKilledException
>>> >> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>> >> at
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >> at
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >> at java.lang.Thread.run(Thread.java:744)
>>> >> : An error occurred while calling o24.collect.
>>> >> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> Task
>>> >> 0.0:2 failed 4 times, most recent failure: Exception failure in TID
>>> 12 on
>>> >> host HDOP-M.AGT: org.apache.spark.api.python.PythonException:
>>> Traceback
>>> >> (most recent call last):
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >> line 77, in main
>>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 191, in dump_stream
>>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 123, in dump_stream
>>> >>     for obj in iterator:
>>> >>   File
>>> >>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >> line 180, in _batched
>>> >>     for item in iterator:
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >> line 612, in func
>>> >>   File
>>> >>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >> line 36, in f
>>> >> SystemError: unknown opcode
>>> >>
>>> >>
>>> >>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>
>>> >>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>
>>>  org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>
>>>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>
>>> >> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>
>>> >> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>
>>> >>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>         java.lang.Thread.run(Thread.java:744)
>>> >> Driver stacktrace:
>>> >> at
>>> >> org.apache.spark.scheduler.DAGScheduler.org
>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>> >> at
>>> >>
>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> >> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >> at scala.Option.foreach(Option.scala:236)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>> >> at
>>> >>
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>> >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> >> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> >> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> >> at
>>> >>
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> >> at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >> at
>>> >>
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> >> at
>>> >>
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >> at
>>> >>
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >>
>>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed
>>> TaskSet
>>> >> 0.0, whose tasks have all completed, from pool
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> What other steps can I take to fix the problem?
>>> >>
>>> >>
>>> >> Thanks
>>> >>
>>> >> Oleg.
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
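A "SystemError: unknown opcode" raised inside a PySpark worker usually points at a CPython bytecode mismatch rather than a Spark bug: the driver pickles each task function together with its raw bytecode, so an executor running a different interpreter (for example, a system /usr/bin/python 2.6 picked up by the YARN containers while the driver runs Anaconda 2.7) can hit opcode values it does not define. A minimal local sketch of why the bytecode is interpreter-specific, with the body of f only approximated from pi.py:

    import sys, dis

    def f(_):
        # Body approximated from examples/src/main/python/pi.py; details may differ.
        from random import random
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 < 1 else 0

    # f.__code__ holds raw bytecode compiled by *this* interpreter. PySpark pickles
    # that bytecode and ships it to the workers, so an executor running a different
    # CPython version can encounter opcode values it does not know, which surfaces
    # as "SystemError: unknown opcode".
    print(sys.version)
    dis.dis(f)
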
>>> >>>> On Thu, Sep 4, 2014 at 5:36 AM, Andrew Or <andrew@databricks.com> wrote:
>>> >>>
>>> >>> Hi Oleg,
>>> >>>
>>> >>> Your configuration looks alright to me. I haven't seen an "unknown
>>> >>> opcode" SystemError before in PySpark. This usually means you have
>>> >>> corrupted .pyc files lying around (ones that belonged to an old Python
>>> >>> version, perhaps). What Python version are you using? Are all your
>>> >>> nodes running the same version of Python? What happens if you just run
>>> >>> bin/pyspark with the same command line arguments and then run
>>> >>> "sc.parallelize(range(10)).count()", does it still fail?
>>> >>>
>>> >>> Andrew
>>> >>>
>>> >>>
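A minimal sketch along the lines Andrew suggests, pasted into the bin/pyspark shell (the probe body is an illustration, not from the thread; sc is the context the shell creates): if driver and executors agree, it prints each worker's Python executable and version; if they do not, it fails with the same "unknown opcode" error as pi.py, which already confirms the mismatch.

    import sys, socket

    def probe(_):
        # Runs on the executor: report which interpreter executed this task.
        return (socket.gethostname(), sys.executable, sys.version.split()[0])

    # 10 partitions so every executor should get at least one task.
    print(sorted(set(sc.parallelize(range(10), 10).map(probe).collect())))
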
>>> >>> 2014-09-02 23:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>> >>>>
>>> >>>> Hi, I changed the master to yarn, but execution failed with an
>>> >>>> exception again. I am using PySpark.
>>> >>>>
>>> >>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>> >>>> ./bin/spark-submit --master yarn  --num-executors 3
>>> --driver-memory 4g
>>> >>>> --executor-memory 2g --executor-cores 1
>>>  examples/src/main/python/pi.py
>>> >>>> 1000
>>> >>>> /usr/jdk64/jdk1.7.0_45/bin/java
>>> >>>>
>>> >>>>
>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>> >>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>> >>>> 14/09/03 14:35:11 INFO spark.SecurityManager: Changing view acls to:
>>> >>>> root
>>> >>>> 14/09/03 14:35:11 INFO spark.SecurityManager: SecurityManager:
>>> >>>> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >>>> Set(root)
>>> >>>> 14/09/03 14:35:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> >>>> 14/09/03 14:35:11 INFO Remoting: Starting remoting
>>> >>>> 14/09/03 14:35:12 INFO Remoting: Remoting started; listening on
>>> >>>> addresses :[akka.tcp://spark@HDOP-B.AGT:51707]
>>> >>>> 14/09/03 14:35:12 INFO Remoting: Remoting now listens on addresses:
>>> >>>> [akka.tcp://spark@HDOP-B.AGT:51707]
>>> >>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering MapOutputTracker
>>> >>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering
>>> BlockManagerMaster
>>> >>>> 14/09/03 14:35:12 INFO storage.DiskBlockManager: Created local
>>> directory
>>> >>>> at /tmp/spark-local-20140903143512-5aab
>>> >>>> 14/09/03 14:35:12 INFO storage.MemoryStore: MemoryStore started with
>>> >>>> capacity 2.3 GB.
>>> >>>> 14/09/03 14:35:12 INFO network.ConnectionManager: Bound socket to
>>> port
>>> >>>> 53216 with id = ConnectionManagerId(HDOP-B.AGT,53216)
>>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Trying to
>>> register
>>> >>>> BlockManager
>>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-B.AGT:53216 with 2.3 GB RAM
>>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Registered
>>> >>>> BlockManager
>>> >>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>>> >>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>>> >>>> SocketConnector@0.0.0.0:50624
>>> >>>> 14/09/03 14:35:12 INFO broadcast.HttpBroadcast: Broadcast server
>>> started
>>> >>>> at http://10.193.1.76:50624
>>> >>>> 14/09/03 14:35:12 INFO spark.HttpFileServer: HTTP File server
>>> directory
>>> >>>> is /tmp/spark-fd7fdcb2-f45d-430f-95fa-afbc4f329b43
>>> >>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>>> >>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>>> >>>> SocketConnector@0.0.0.0:41773
>>> >>>> 14/09/03 14:35:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>> 14/09/03 14:35:13 INFO server.AbstractConnector: Started
>>> >>>> SelectChannelConnector@0.0.0.0:4040
>>> >>>> 14/09/03 14:35:13 INFO ui.SparkUI: Started SparkUI at
>>> >>>> http://HDOP-B.AGT:4040
>>> >>>> 14/09/03 14:35:13 WARN util.NativeCodeLoader: Unable to load
>>> >>>> native-hadoop library for your platform... using builtin-java
>>> classes where
>>> >>>> applicable
>>> >>>> --args is deprecated. Use --arg instead.
>>> >>>> 14/09/03 14:35:14 INFO client.RMProxy: Connecting to
>>> ResourceManager at
>>> >>>> HDOP-N1.AGT/10.193.1.72:8050
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Got Cluster metric info from
>>> >>>> ApplicationsManager (ASM), number of NodeManagers: 6
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Queue info ... queueName:
>>> default,
>>> >>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>> >>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Max mem capabililty of a single
>>> >>>> resource in this cluster 13824
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Preparing Local resources
>>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Uploading
>>> >>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>> to
>>> >>>>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Uploading
>>> >>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >>>> to
>>> >>>>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/pi.py
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up the launch
>>> environment
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up container launch
>>> context
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Command for starting the Spark
>>> >>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>> >>>> -Djava.io.tmpdir=$PWD/tmp,
>>> >>>>
>>> -Dspark.tachyonStore.folderName=\"spark-98b7d323-2faf-419a-a88d-1a0c549dc5d4\",
>>> >>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>> >>>>
>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>> >>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>> >>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>> >>>> -Dspark.fileserver.uri=\"http://10.193.1.76:41773\",
>>> >>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"51707\",
>>> >>>> -Dspark.executor.cores=\"1\",
>>> >>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:50624\",
>>> >>>> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>>> --jar ,
>>> >>>> null,  --args  'HDOP-B.AGT:51707' , --executor-memory, 2048,
>>> >>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>> >>>> <LOG_DIR>/stderr)
>>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Submitting application to ASM
>>> >>>> 14/09/03 14:35:16 INFO impl.YarnClientImpl: Submitted application
>>> >>>> application_1409559972905_0036
>>> >>>> 14/09/03 14:35:16 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:17 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:18 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:19 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:20 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:21 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: -1
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: ACCEPTED
>>> >>>>
>>> >>>> 14/09/03 14:35:22 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>> report from ASM:
>>> >>>> appMasterRpcPort: 0
>>> >>>> appStartTime: 1409726116517
>>> >>>> yarnAppState: RUNNING
>>> >>>>
>>> >>>> 14/09/03 14:35:24 INFO cluster.YarnClientClusterScheduler:
>>> >>>> YarnClientClusterScheduler.postStartHook done
>>> >>>> 14/09/03 14:35:25 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>> executor:
>>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT
>>> :58976/user/Executor#-1831707618]
>>> >>>> with ID 1
>>> >>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-B.AGT:44142 with 1178.1 MB RAM
>>> >>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>> executor:
>>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT
>>> :45140/user/Executor#875812337]
>>> >>>> with ID 2
>>> >>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-N1.AGT:48513 with 1178.1 MB RAM
>>> >>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>> executor:
>>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-N3.AGT
>>> :45380/user/Executor#1559437246]
>>> >>>> with ID 3
>>> >>>> 14/09/03 14:35:27 INFO storage.BlockManagerInfo: Registering block
>>> >>>> manager HDOP-N3.AGT:46616 with 1178.1 MB RAM
>>> >>>> 14/09/03 14:35:56 INFO spark.SparkContext: Starting job: reduce at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >>>> with 1000 output partitions (allowLocal=false)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Final stage: Stage
>>> >>>> 0(reduce at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Parents of final
>>> stage:
>>> >>>> List()
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Missing parents:
>>> List()
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting Stage 0
>>> >>>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
>>> parents
>>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting 1000
>>> missing
>>> >>>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>>> >>>> 14/09/03 14:35:56 INFO cluster.YarnClientClusterScheduler: Adding
>>> task
>>> >>>> set 0.0 with 1000 tasks
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 0 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 9 ms
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>> TID 1 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>> as 506276 bytes in 5 ms
>>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 5 ms
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 2 (task
>>> 0.0:2)
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>
>>> >>>> at
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 4 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 0 (task
>>> 0.0:0)
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>
>>> >>>> at
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>> at
>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 5 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 3 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 3 (task
>>> 0.0:3)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 1]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>> TID 6 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 4 (task
>>> 0.0:2)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 1]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 7 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 1 (task
>>> 0.0:1)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 2]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>> TID 8 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 5 (task
>>> 0.0:0)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 3]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 9 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 6 (task
>>> 0.0:3)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 2]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:3 as
>>> >>>> TID 10 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:3
>>> >>>> as 506276 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 7 (task
>>> 0.0:2)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 4]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:2 as
>>> >>>> TID 11 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:2
>>> >>>> as 501136 bytes in 3 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 9 (task
>>> 0.0:0)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 3]
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:0 as
>>> >>>> TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:0
>>> >>>> as 369811 bytes in 4 ms
>>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 8 (task
>>> 0.0:1)
>>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 5]
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Starting task
>>> 0.0:1 as
>>> >>>> TID 13 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Serialized task
>>> 0.0:1
>>> >>>> as 506276 bytes in 3 ms
>>> >>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Lost TID 11 (task
>>> >>>> 0.0:2)
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 4]
>>> >>>> 14/09/03 14:35:58 ERROR scheduler.TaskSetManager: Task 0.0:2 failed
>>> 4
>>> >>>> times; aborting job
>>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler:
>>> Cancelling
>>> >>>> stage 0
>>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Stage 0
>>> was
>>> >>>> cancelled
>>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 6]
>>> >>>> 14/09/03 14:35:58 INFO scheduler.DAGScheduler: Failed to run reduce
>>> at
>>> >>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>> Traceback (most recent call last):
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 38, in <module>
>>> >>>>     count = sc.parallelize(xrange(1, n+1),
>>> slices).map(f).reduce(add)
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 619, in reduce
>>> >>>>     vals = self.mapPartitions(func).collect()
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 583, in collect
>>> >>>>     bytesInJava = self._jrdd.collect().iterator()
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>> >>>> line 537, in __call__
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>> >>>> line 300, in get_return_value
>>> >>>> py4j.protocol.Py4JJavaError14/09/03 14:35:58 INFO
>>> >>>> scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>>> call
>>> >>>> last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>  [duplicate 7]
>>> >>>> : An error occurred while calling o24.collect.
>>> >>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>> >>>> Task 0.0:2 failed 4 times, most recent failure: Exception failure
>>> in TID 11
>>> >>>> on host HDOP-N1.AGT: org.apache.spark.api.python.PythonException:
>>> Traceback
>>> >>>> (most recent call last):
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>>> >>>> line 77, in main
>>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 191, in dump_stream
>>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 123, in dump_stream
>>> >>>>     for obj in iterator:
>>> >>>>   File
>>> >>>>
>>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>>> >>>> line 180, in _batched
>>> >>>>     for item in iterator:
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>>> >>>> line 612, in func
>>> >>>>   File
>>> >>>>
>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>> >>>> line 36, in f
>>> >>>> SystemError: unknown opcode
>>> >>>>
>>> >>>>
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>>> >>>>
>>> >>>>
>>> org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>>> >>>>
>>> >>>> org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>>> >>>>
>>>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>> >>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>> >>>>
>>> >>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>>> >>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>>> >>>>
>>> >>>>
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>> >>>>
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>>
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>>         java.lang.Thread.run(Thread.java:744)
>>> >>>> Driver stacktrace:
>>> >>>> at
>>> >>>> org.apache.spark.scheduler.DAGScheduler.org
>>> $apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>> >>>> at
>>> >>>>
>>> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>> >>>> at
>>> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>> >>>> at scala.Option.foreach(Option.scala:236)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>> >>>> at
>>> >>>>
>>> org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>> >>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>> >>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>> >>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>> >>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>> >>>> at
>>> >>>>
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>> >>>> at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> >>>> at
>>> >>>>
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> >>>> at
>>> >>>>
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> >>>> at
>>> >>>>
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> >>>>
>>> >>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Loss was due to
>>> >>>> org.apache.spark.TaskKilledException
>>> >>>> org.apache.spark.TaskKilledException
>>> >>>> at
>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>> >>>> at
>>> >>>>
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>> >>>> at java.lang.Thread.run(Thread.java:744)
>>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Removed
>>> >>>> TaskSet 0.0, whose tasks have all completed, from pool
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Wed, Sep 3, 2014 at 1:53 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hello Sandy, I changed to using the yarn master but still got the
>>> >>>>> exceptions:
>>> >>>>>
>>> >>>>> What is the procedure to run pyspark on yarn? Is it enough to just
>>> >>>>> submit the command, or do I also need to start Spark processes?
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>> >>>>> ./bin/spark-submit --master yarn://HDOP-N1.AGT:8032
>>> --num-executors 3
>>> >>>>> --driver-memory 4g --executor-memory 2g --executor-cores 1
>>> >>>>> examples/src/main/python/pi.py   1000
>>> >>>>> /usr/jdk64/jdk1.7.0_45/bin/java
>>> >>>>>
>>> >>>>>
>>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>>> >>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>>> >>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: Changing view acls
>>> to:
>>> >>>>> root
>>> >>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: SecurityManager:
>>> >>>>> authentication disabled; ui acls disabled; users with view
>>> permissions:
>>> >>>>> Set(root)
>>> >>>>> 14/09/03 13:48:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> >>>>> 14/09/03 13:48:49 INFO Remoting: Starting remoting
>>> >>>>> 14/09/03 13:48:49 INFO Remoting: Remoting started; listening on
>>> >>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:34424]
>>> >>>>> 14/09/03 13:48:49 INFO Remoting: Remoting now listens on addresses:
>>> >>>>> [akka.tcp://spark@HDOP-B.AGT:34424]
>>> >>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering MapOutputTracker
>>> >>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering
>>> BlockManagerMaster
>>> >>>>> 14/09/03 13:48:49 INFO storage.DiskBlockManager: Created local
>>> >>>>> directory at /tmp/spark-local-20140903134849-231c
>>> >>>>> 14/09/03 13:48:49 INFO storage.MemoryStore: MemoryStore started
>>> with
>>> >>>>> capacity 2.3 GB.
>>> >>>>> 14/09/03 13:48:49 INFO network.ConnectionManager: Bound socket to
>>> port
>>> >>>>> 60647 with id = ConnectionManagerId(HDOP-B.AGT,60647)
>>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Trying to
>>> register
>>> >>>>> BlockManager
>>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-B.AGT:60647 with 2.3 GB RAM
>>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Registered
>>> >>>>> BlockManager
>>> >>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>>> >>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>>> >>>>> SocketConnector@0.0.0.0:56549
>>> >>>>> 14/09/03 13:48:49 INFO broadcast.HttpBroadcast: Broadcast server
>>> >>>>> started at http://10.193.1.76:56549
>>> >>>>> 14/09/03 13:48:49 INFO spark.HttpFileServer: HTTP File server
>>> directory
>>> >>>>> is /tmp/spark-90af1222-9ea8-4dd8-887a-343d09d44333
>>> >>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>>> >>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>>> >>>>> SocketConnector@0.0.0.0:36512
>>> >>>>> 14/09/03 13:48:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> >>>>> 14/09/03 13:48:50 INFO server.AbstractConnector: Started
>>> >>>>> SelectChannelConnector@0.0.0.0:4040
>>> >>>>> 14/09/03 13:48:50 INFO ui.SparkUI: Started SparkUI at
>>> >>>>> http://HDOP-B.AGT:4040
>>> >>>>> 14/09/03 13:48:50 WARN util.NativeCodeLoader: Unable to load
>>> >>>>> native-hadoop library for your platform... using builtin-java
>>> classes where
>>> >>>>> applicable
>>> >>>>> --args is deprecated. Use --arg instead.
>>> >>>>> 14/09/03 13:48:51 INFO client.RMProxy: Connecting to
>>> ResourceManager at
>>> >>>>> HDOP-N1.AGT/10.193.1.72:8050
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Got Cluster metric info from
>>> >>>>> ApplicationsManager (ASM), number of NodeManagers: 6
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Queue info ... queueName:
>>> default,
>>> >>>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>>> >>>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Max mem capabililty of a single
>>> >>>>> resource in this cluster 13824
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Preparing Local resources
>>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Uploading
>>> >>>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>>> to
>>> >>>>>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Uploading
>>> >>>>>
>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>> >>>>> to
>>> >>>>>
>>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/pi.py
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up the launch
>>> environment
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up container launch
>>> context
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Command for starting the Spark
>>> >>>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>>> >>>>> -Djava.io.tmpdir=$PWD/tmp,
>>> >>>>>
>>> -Dspark.tachyonStore.folderName=\"spark-bdabb882-a2e0-46b6-8e87-90cc6e359d84\",
>>> >>>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>>> >>>>>
>>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>>> >>>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>>> >>>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>>> >>>>> -Dspark.fileserver.uri=\"http://10.193.1.76:36512\",
>>> >>>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"34424\",
>>> >>>>> -Dspark.executor.cores=\"1\",
>>> >>>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:56549\",
>>> >>>>> -Dlog4j.configuration=log4j-spark-container.properties,
>>> >>>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>>> --jar ,
>>> >>>>> null,  --args  'HDOP-B.AGT:34424' , --executor-memory, 2048,
>>> >>>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>>> >>>>> <LOG_DIR>/stderr)
>>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Submitting application to ASM
>>> >>>>> 14/09/03 13:48:53 INFO impl.YarnClientImpl: Submitted application
>>> >>>>> application_1409559972905_0033
>>> >>>>> 14/09/03 13:48:53 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:54 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:55 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:56 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:57 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: -1
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: ACCEPTED
>>> >>>>>
>>> >>>>> 14/09/03 13:48:58 INFO cluster.YarnClientSchedulerBackend:
>>> Application
>>> >>>>> report from ASM:
>>> >>>>> appMasterRpcPort: 0
>>> >>>>> appStartTime: 1409723333584
>>> >>>>> yarnAppState: RUNNING
>>> >>>>>
>>> >>>>> 14/09/03 13:49:00 INFO cluster.YarnClientClusterScheduler:
>>> >>>>> YarnClientClusterScheduler.postStartHook done
>>> >>>>> 14/09/03 13:49:01 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>>> executor:
>>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT
>>> :57078/user/Executor#1595833626]
>>> >>>>> with ID 1
>>> >>>>> 14/09/03 13:49:02 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-B.AGT:54579 with 1178.1 MB RAM
>>> >>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>>> executor:
>>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT
>>> :43121/user/Executor#-1266627304]
>>> >>>>> with ID 2
>>> >>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend:
>>> Registered
>>> >>>>> executor:
>>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-N2.AGT
>>> :36952/user/Executor#1003961369]
>>> >>>>> with ID 3
>>> >>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-N4.AGT:56891 with 1178.1 MB RAM
>>> >>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>>> >>>>> manager HDOP-N2.AGT:42381 with 1178.1 MB RAM
>>> >>>>> 14/09/03 13:49:33 INFO spark.SparkContext: Starting job: reduce at
>>> >>>>>
>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>>> >>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2....
>>>
>>> [Message truncated]
>>
>>
>>
>
