spark-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: pyspark yarn got exception
Date Thu, 04 Sep 2014 18:10:30 GMT
Ok, I'll do that.

   Which script or configuration file should I put this change in:

export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark

to make sure that the slaves also get the same PYSPARK_PYTHON?

Thanks
Oleg.
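
For what it's worth, a minimal sketch of one place to put it, assuming the standard Spark layout where conf/spark-env.sh is sourced by the launch scripts; note that Davies' example below points PYSPARK_PYTHON at the python binary itself rather than at a pyspark script, which is presumably what is wanted here too:

    # conf/spark-env.sh (sketch, not from this thread)
    # PYSPARK_PYTHON selects the interpreter used on the driver side;
    # SPARK_YARN_USER_ENV forwards KEY=VALUE pairs to the YARN containers.
    export PYSPARK_PYTHON=/anaconda/bin/python
    export SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/anaconda/bin/python"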


On Fri, Sep 5, 2014 at 1:21 AM, Andrew Or <andrew@databricks.com> wrote:

> You may also need to
>
> export SPARK_YARN_USER_ENV=PYSPARK_PYTHON=/anaconda/bin/pyspark
>
> to make sure that the slaves also get the same PYSPARK_PYTHON.
>
>
> 2014-09-04 9:52 GMT-07:00 Davies Liu <davies@databricks.com>:
>
>> You can use PYSPARK_PYTHON to choose which version of python will be
>> used in pyspark, such as:
>>
>> PYSPARK_PYTHON=/anaconda/bin/python  bin/pyspark
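
Combined with Andrew's SPARK_YARN_USER_ENV suggestion above, a YARN submission would then look roughly like this (a sketch; the paths and arguments are the ones used elsewhere in this thread):

    PYSPARK_PYTHON=/anaconda/bin/python \
    SPARK_YARN_USER_ENV="PYSPARK_PYTHON=/anaconda/bin/python" \
    ./bin/spark-submit --master yarn --num-executors 3 \
        --driver-memory 4g --executor-memory 2g --executor-cores 1 \
        examples/src/main/python/pi.py 1000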
>>
>> On Thu, Sep 4, 2014 at 1:30 AM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>> > Hi,
>> >
>> >     I found out what the reason for the problem is.
>> > HDP Hortonworks uses python 2.6.6 for Ambari installations and the rest
>> > of the stack.
>> > I can run PySpark and it works fine, but I need to use the Anaconda
>> > distribution (for Spark). When I installed Anaconda (python 2.7.7), I got
>> > the problem.
>> >
>> > Question: how can this be resolved? Is there a way to have 2 python
>> > versions installed on one machine?
>> >
>> >
>> > Thanks
>> > Oleg.
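
The two installs can coexist. A minimal sketch of the usual pattern, assuming Anaconda lives under /anaconda as above: leave the system python 2.6.6 in place for Ambari and the HDP tooling, and select the Anaconda interpreter only for Spark:

    /usr/bin/python --version           # system interpreter, stays 2.6.6
    /anaconda/bin/python --version      # Anaconda, 2.7.7
    PYSPARK_PYTHON=/anaconda/bin/python bin/pyspark

The catch, as the rest of this thread shows, is that every node has to agree on which interpreter Spark uses.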
>> >
>> >
>> > On Thu, Sep 4, 2014 at 1:15 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:
>> >>
>> >> Hi Andrew.
>> >>
>> >> The problem still occurs:
>> >>
>> >> All machines are using python 2.7:
>> >>
>> >> [root@HDOP-N2 conf]# python --version
>> >> Python 2.7.7 :: Anaconda 2.0.1 (64-bit)
>> >>
>> >> Executing the command from bin/pyspark:
>> >> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# bin/pyspark --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000
>> >>
>> >>
>> >> Python 2.7.7 |Anaconda 2.0.1 (64-bit)| (default, Jun  2 2014, 12:34:02)
>> >> [GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
>> >> Type "help", "copyright", "credits" or "license" for more information.
>> >> Anaconda is brought to you by Continuum Analytics.
>> >> Please check out: http://continuum.io/thanks and https://binstar.org
>> >> Traceback (most recent call last):
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/shell.py", line 43, in <module>
>> >>     sc = SparkContext(appName="PySparkShell", pyFiles=add_files)
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 94, in __init__
>> >>     SparkContext._ensure_initialized(self, gateway=gateway)
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py", line 190, in _ensure_initialized
>> >>     SparkContext._gateway = gateway or launch_gateway()
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/java_gateway.py", line 51, in launch_gateway
>> >>     gateway_port = int(proc.stdout.readline())
>> >> ValueError: invalid literal for int() with base 10: '/usr/jdk64/jdk1.7.0_45/bin/java\n'
>> >> >>>
>> >>
>> >>
>> >>
>> >> This log is from the Spark on YARN execution:
>> >>
>> >>
>> >> SLF4J: Class path contains multiple SLF4J bindings.
>> >> SLF4J: Found binding in [jar:file:/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> >> SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> >> 14/09/04 12:53:19 INFO SecurityManager: Changing view acls to: yarn,root
>> >> 14/09/04 12:53:19 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, root)
>> >> 14/09/04 12:53:20 INFO Slf4jLogger: Slf4jLogger started
>> >> 14/09/04 12:53:20 INFO Remoting: Starting remoting
>> >> 14/09/04 12:53:20 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>> >> 14/09/04 12:53:20 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619]
>> >> 14/09/04 12:53:20 INFO RMProxy: Connecting to ResourceManager at HDOP-N1.AGT/10.193.1.72:8030
>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: ApplicationAttemptId: appattempt_1409805761292_0005_000001
>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Registering the ApplicationMaster
>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Waiting for Spark driver to be reachable.
>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Driver now available: HDOP-B.AGT:45747
>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Listen to driver: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler
>> >> 14/09/04 12:53:21 INFO ExecutorLauncher: Allocating 3 executors.
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Will Allocate 3 executor containers, each with 2432 memory
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host: Any, priority: 1, capability: <memory:2432, vCores:1>
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host: Any, priority: 1, capability: <memory:2432, vCores:1>
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Container request (host: Any, priority: 1, capability: <memory:2432, vCores:1>
>> >> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for : HDOP-M.AGT:45454
>> >> 14/09/04 12:53:21 INFO AMRMClientImpl: Received new token for : HDOP-N1.AGT:45454
>> >> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-N1.AGT to /default-rack
>> >> 14/09/04 12:53:21 INFO RackResolver: Resolved HDOP-M.AGT to /default-rack
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container container_1409805761292_0005_01_000003 for on host HDOP-N1.AGT
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching ExecutorRunnable. driverUrl: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, executorHostname: HDOP-N1.AGT
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching container container_1409805761292_0005_01_000002 for on host HDOP-M.AGT
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>> >> 14/09/04 12:53:21 INFO YarnAllocationHandler: Launching ExecutorRunnable. driverUrl: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, executorHostname: HDOP-M.AGT
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Starting Executor Container
>> >> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
>> >> 14/09/04 12:53:21 INFO ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up ContainerLaunchContext
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Setting up ContainerLaunchContext
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Preparing Local resources
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file: "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" } size: 1317 timestamp: 1409806397200 type: FILE visibility: PRIVATE, __spark__.jar -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file: "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar" } size: 121759562 timestamp: 1409806397057 type: FILE visibility: PRIVATE)
>> >> 14/09/04 12:53:21 INFO ExecutorRunnable: Prepared Local resources Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file: "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" } size: 1317 timestamp: 1409806397200 type: FILE visibility: PRIVATE, __spark__.jar -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file: "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar" } size: 121759562 timestamp: 1409806397057 type: FILE visibility: PRIVATE)
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 1, HDOP-N1.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 2, HDOP-M.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening proxy : HDOP-N1.AGT:45454
>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening proxy : HDOP-M.AGT:45454
>> >> 14/09/04 12:53:22 INFO AMRMClientImpl: Received new token for : HDOP-N4.AGT:45454
>> >> 14/09/04 12:53:22 INFO RackResolver: Resolved HDOP-N4.AGT to /default-rack
>> >> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching container container_1409805761292_0005_01_000004 for on host HDOP-N4.AGT
>> >> 14/09/04 12:53:22 INFO YarnAllocationHandler: Launching ExecutorRunnable. driverUrl: akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, executorHostname: HDOP-N4.AGT
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Starting Executor Container
>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up ContainerLaunchContext
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Preparing Local resources
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Prepared Local resources Map(pi.py -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file: "/user/root/.sparkStaging/application_1409805761292_0005/pi.py" } size: 1317 timestamp: 1409806397200 type: FILE visibility: PRIVATE, __spark__.jar -> resource { scheme: "hdfs" host: "HDOP-B.AGT" port: 8020 file: "/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar" } size: 121759562 timestamp: 1409806397057 type: FILE visibility: PRIVATE)
>> >> 14/09/04 12:53:22 INFO ExecutorRunnable: Setting up executor with commands: List($JAVA_HOME/bin/java, -server, -XX:OnOutOfMemoryError='kill %p', -Xms2048m -Xmx2048m , -Djava.io.tmpdir=$PWD/tmp, -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.executor.CoarseGrainedExecutorBackend, akka.tcp://spark@HDOP-B.AGT:45747/user/CoarseGrainedScheduler, 3, HDOP-N4.AGT, 1, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> >> 14/09/04 12:53:22 INFO ContainerManagementProtocolProxy: Opening proxy : HDOP-N4.AGT:45454
>> >> 14/09/04 12:53:22 INFO ExecutorLauncher: All executors have launched.
>> >> 14/09/04 12:53:22 INFO ExecutorLauncher: Started progress reporter thread - sleep time : 5000
>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or disconnected! Shutting down. Disassociated [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] -> [akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or disconnected! Shutting down. Disassociated [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] -> [akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or disconnected! Shutting down. Disassociated [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] -> [akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or disconnected! Shutting down. Disassociated [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] -> [akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:53:57 INFO ExecutorLauncher: Driver terminated or disconnected! Shutting down. Disassociated [akka.tcp://sparkYarnAM@HDOP-N2.AGT:46619] -> [akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:54:02 INFO ExecutorLauncher: finish ApplicationMaster with SUCCEEDED
>> >> 14/09/04 12:54:02 INFO AMRMClientImpl: Waiting for application to be successfully unregistered.
>> >> 14/09/04 12:54:02 INFO ExecutorLauncher: Exited
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> The exception still occurs:
>> >>
>> >>
>> >> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]# ./bin/spark-submit --master yarn --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 examples/src/main/python/pi.py 1000
>> >> /usr/jdk64/jdk1.7.0_45/bin/java ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>> >> 14/09/04 12:53:11 INFO spark.SecurityManager: Changing view acls to: root
>> >> 14/09/04 12:53:11 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>> >> 14/09/04 12:53:12 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> >> 14/09/04 12:53:12 INFO Remoting: Starting remoting
>> >> 14/09/04 12:53:12 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:53:12 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@HDOP-B.AGT:45747]
>> >> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering MapOutputTracker
>> >> 14/09/04 12:53:12 INFO spark.SparkEnv: Registering BlockManagerMaster
>> >> 14/09/04 12:53:12 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140904125312-c7ea
>> >> 14/09/04 12:53:12 INFO storage.MemoryStore: MemoryStore started with capacity 2.3 GB.
>> >> 14/09/04 12:53:12 INFO network.ConnectionManager: Bound socket to port 37363 with id = ConnectionManagerId(HDOP-B.AGT,37363)
>> >> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Trying to register BlockManager
>> >> 14/09/04 12:53:12 INFO storage.BlockManagerInfo: Registering block manager HDOP-B.AGT:37363 with 2.3 GB RAM
>> >> 14/09/04 12:53:12 INFO storage.BlockManagerMaster: Registered BlockManager
>> >> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>> >> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >> 14/09/04 12:53:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:33547
>> >> 14/09/04 12:53:12 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.193.1.76:33547
>> >> 14/09/04 12:53:12 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-054f4eda-b93b-47d3-87d5-c40e81fc1fe8
>> >> 14/09/04 12:53:12 INFO spark.HttpServer: Starting HTTP Server
>> >> 14/09/04 12:53:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >> 14/09/04 12:53:12 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:54594
>> >> 14/09/04 12:53:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >> 14/09/04 12:53:13 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
>> >> 14/09/04 12:53:13 INFO ui.SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
>> >> 14/09/04 12:53:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >> --args is deprecated. Use --arg instead.
>> >> 14/09/04 12:53:14 INFO client.RMProxy: Connecting to ResourceManager at HDOP-N1.AGT/10.193.1.72:8050
>> >> 14/09/04 12:53:14 INFO yarn.Client: Got Cluster metric info from ApplicationsManager (ASM), number of NodeManagers: 6
>> >> 14/09/04 12:53:14 INFO yarn.Client: Queue info ... queueName: default, queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0, queueApplicationCount = 0, queueChildQueueCount = 0
>> >> 14/09/04 12:53:14 INFO yarn.Client: Max mem capabililty of a single resource in this cluster 13824
>> >> 14/09/04 12:53:14 INFO yarn.Client: Preparing Local resources
>> >> 14/09/04 12:53:15 INFO yarn.Client: Uploading file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> >> 14/09/04 12:53:17 INFO yarn.Client: Uploading file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py to hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409805761292_0005/pi.py
>> >> 14/09/04 12:53:17 INFO yarn.Client: Setting up the launch environment
>> >> 14/09/04 12:53:17 INFO yarn.Client: Setting up container launch context
>> >> 14/09/04 12:53:17 INFO yarn.Client: Command for starting the Spark ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m, -Djava.io.tmpdir=$PWD/tmp, -Dspark.tachyonStore.folderName=\"spark-2b59c845-3de2-4c3d-a352-1379ecade281\", -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\", -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\", -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\", -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\", -Dspark.fileserver.uri=\"http://10.193.1.76:54594\", -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"45747\", -Dspark.executor.cores=\"1\", -Dspark.httpBroadcast.uri=\"http://10.193.1.76:33547\", -Dlog4j.configuration=log4j-spark-container.properties, org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused, --jar , null,  --args  'HDOP-B.AGT:45747' , --executor-memory, 2048, --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>, <LOG_DIR>/stderr)
>> >> 14/09/04 12:53:17 INFO yarn.Client: Submitting application to ASM
>> >> 14/09/04 12:53:17 INFO impl.YarnClientImpl: Submitted application application_1409805761292_0005
>> >> 14/09/04 12:53:17 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>> >> appMasterRpcPort: -1
>> >> appStartTime: 1409806397305
>> >> yarnAppState: ACCEPTED
>> >>
>> >> 14/09/04 12:53:18 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>> >> appMasterRpcPort: -1
>> >> appStartTime: 1409806397305
>> >> yarnAppState: ACCEPTED
>> >>
>> >> 14/09/04 12:53:19 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>> >> appMasterRpcPort: -1
>> >> appStartTime: 1409806397305
>> >> yarnAppState: ACCEPTED
>> >>
>> >> 14/09/04 12:53:20 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>> >> appMasterRpcPort: -1
>> >> appStartTime: 1409806397305
>> >> yarnAppState: ACCEPTED
>> >>
>> >> 14/09/04 12:53:21 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
>> >> appMasterRpcPort: 0
>> >> appStartTime: 1409806397305
>> >> yarnAppState: RUNNING
>> >>
>> >> 14/09/04 12:53:23 INFO cluster.YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
>> >> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT:40024/user/Executor#2065794895] with ID 1
>> >> 14/09/04 12:53:26 INFO storage.BlockManagerInfo: Registering block manager HDOP-N1.AGT:34857 with 1178.1 MB RAM
>> >> 14/09/04 12:53:26 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT:49234/user/Executor#820272849] with ID 3
>> >> 14/09/04 12:53:27 INFO cluster.YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@HDOP-M.AGT:38124/user/Executor#715249825] with ID 2
>> >> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager HDOP-N4.AGT:43365 with 1178.1 MB RAM
>> >> 14/09/04 12:53:27 INFO storage.BlockManagerInfo: Registering block manager HDOP-M.AGT:45711 with 1178.1 MB RAM
>> >> 14/09/04 12:53:55 INFO spark.SparkContext: Starting job: reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Got job 0 (reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38) with 1000 output partitions (allowLocal=false)
>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Final stage: Stage 0 (reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Parents of final stage: List()
>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Missing parents: List()
>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing parents
>> >> 14/09/04 12:53:55 INFO scheduler.DAGScheduler: Submitting 1000 missing tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>> >> 14/09/04 12:53:55 INFO cluster.YarnClientClusterScheduler: Adding task set 0.0 with 1000 tasks
>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 5 ms
>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 2 ms
>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 2 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:55 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 501135 bytes in 2 ms
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 3 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 5 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 1 (task 0.0:1)
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
>> >> org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>
>> >> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>> >> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>> >> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>> >> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> >> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> at java.lang.Thread.run(Thread.java:744)
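
"SystemError: unknown opcode" generally means one python version is executing byte-code produced by a different one, which fits the system 2.6.6 vs Anaconda 2.7.7 mix described earlier in this thread. A quick check that every worker node resolves python the same way (a sketch; the host names are taken from this log, the loop itself is illustrative):

    for h in HDOP-M.AGT HDOP-N1.AGT HDOP-N4.AGT; do
        ssh $h 'which python; python --version'
    done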
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 4 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 5 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 2 (task 0.0:2)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 1]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 5 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 501135 bytes in 5 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 3 (task 0.0:3)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 2]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 6 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 5 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 3]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 7 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 4 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 5 (task 0.0:2)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 4]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 8 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 501135 bytes in 3 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 4 (task 0.0:1)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 5]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 9 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 4 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 6 (task 0.0:3)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 6]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 10 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 3 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 7 (task 0.0:0)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 7]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 11 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 3 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 8 (task 0.0:2)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 8]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:2 as TID 12 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:2 as 501135 bytes in 4 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 10 (task 0.0:3)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 9]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:3 as TID 13 on executor 3: HDOP-N4.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:3 as 506275 bytes in 3 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 9 (task 0.0:1)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 10]
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 14 on executor 1: HDOP-N1.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 506275 bytes in 4 ms
>> >> 14/09/04 12:53:56 WARN scheduler.TaskSetManager: Lost TID 11 (task 0.0:0)
>> >> 14/09/04 12:53:56 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 11]
>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 15 on executor 2: HDOP-M.AGT (PROCESS_LOCAL)
>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 369810 bytes in 4 ms
>> >> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Lost TID 12 (task 0.0:2)
>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 12]
>> >> 14/09/04 12:53:57 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4 times; aborting job
>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 13]
>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Cancelling stage 0
>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Stage 0 was cancelled
>> >> 14/09/04 12:53:57 INFO scheduler.DAGScheduler: Failed to run reduce at /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> >> Traceback (most recent call last):
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 38, in <module>
>> >>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 619, in reduce
>> >>     vals = self.mapPartitions(func).collect()
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 583, in collect
>> >>     bytesInJava = self._jrdd.collect().iterator()
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py", line 537, in __call__
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py", line 300, in get_return_value
>> >> py4j.protocol.Py4JJavaError
>> >> 14/09/04 12:53:57 INFO scheduler.TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>  [duplicate 14]
>> >> 14/09/04 12:53:57 WARN scheduler.TaskSetManager: Loss was due to org.apache.spark.TaskKilledException
>> >> org.apache.spark.TaskKilledException
>> >> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >> at java.lang.Thread.run(Thread.java:744)
>> >> : An error occurred while calling o24.collect.
>> >> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:2 failed 4 times, most recent failure: Exception failure in TID 12 on host HDOP-M.AGT: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py", line 77, in main
>> >>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 191, in dump_stream
>> >>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 123, in dump_stream
>> >>     for obj in iterator:
>> >>   File "/tmp/hadoop/yarn/local/usercache/root/filecache/11/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py", line 180, in _batched
>> >>     for item in iterator:
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py", line 612, in func
>> >>   File "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py", line 36, in f
>> >> SystemError: unknown opcode
>> >>
>> >>         org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>> >>         org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>> >>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>> >>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> >>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>> >>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> >>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>         java.lang.Thread.run(Thread.java:744)
>> >> Driver stacktrace:
>> >> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>> >> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>> >> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>> >> at scala.Option.foreach(Option.scala:236)
>> >> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>> >> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>> >> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> >> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> >> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> >> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> >> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> >> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> >> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> >> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> >> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> >>
>> >> 14/09/04 12:53:57 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
>> >>
>> >>
>> >>
>> >>
>> >> What else can be done to fix this problem?
>> >>
>> >>
>> >> Thanks
>> >>
>> >> Oleg.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Sep 4, 2014 at 5:36 AM, Andrew Or <andrew@databricks.com> wrote:
>> >>>
>> >>> Hi Oleg,
>> >>>
>> >>> Your configuration looks alright to me. I haven't seen a
>> >>> "SystemError: unknown opcode" in PySpark before. This usually means you
>> >>> have corrupted .pyc files lying around (ones that belonged to an old
>> >>> python version, perhaps). What python version are you using? Are all
>> >>> your nodes running the same version of python? What happens if you just
>> >>> run bin/pyspark with the same command line arguments and then do
>> >>> "sc.parallelize(range(10)).count()" -- does it still fail?
>> >>>
>> >>> Andrew
>> >>>
>> >>>
>> >>> 2014-09-02 23:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>> >>>>
>> >>>> Hi, I changed the master to yarn, but execution failed with an
>> >>>> exception again. I am using PySpark.
>> >>>>
>> >>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>> >>>> ./bin/spark-submit --master yarn  --num-executors 3  --driver-memory
>> 4g
>> >>>> --executor-memory 2g --executor-cores 1
>>  examples/src/main/python/pi.py
>> >>>> 1000
>> >>>> /usr/jdk64/jdk1.7.0_45/bin/java
>> >>>>
>> >>>>
>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>> >>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>> >>>> 14/09/03 14:35:11 INFO spark.SecurityManager: Changing view acls to:
>> >>>> root
>> >>>> 14/09/03 14:35:11 INFO spark.SecurityManager: SecurityManager:
>> >>>> authentication disabled; ui acls disabled; users with view
>> permissions:
>> >>>> Set(root)
>> >>>> 14/09/03 14:35:11 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> >>>> 14/09/03 14:35:11 INFO Remoting: Starting remoting
>> >>>> 14/09/03 14:35:12 INFO Remoting: Remoting started; listening on
>> >>>> addresses :[akka.tcp://spark@HDOP-B.AGT:51707]
>> >>>> 14/09/03 14:35:12 INFO Remoting: Remoting now listens on addresses:
>> >>>> [akka.tcp://spark@HDOP-B.AGT:51707]
>> >>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering MapOutputTracker
>> >>>> 14/09/03 14:35:12 INFO spark.SparkEnv: Registering BlockManagerMaster
>> >>>> 14/09/03 14:35:12 INFO storage.DiskBlockManager: Created local
>> directory
>> >>>> at /tmp/spark-local-20140903143512-5aab
>> >>>> 14/09/03 14:35:12 INFO storage.MemoryStore: MemoryStore started with
>> >>>> capacity 2.3 GB.
>> >>>> 14/09/03 14:35:12 INFO network.ConnectionManager: Bound socket to
>> port
>> >>>> 53216 with id = ConnectionManagerId(HDOP-B.AGT,53216)
>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Trying to register
>> >>>> BlockManager
>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerInfo: Registering block
>> >>>> manager HDOP-B.AGT:53216 with 2.3 GB RAM
>> >>>> 14/09/03 14:35:12 INFO storage.BlockManagerMaster: Registered
>> >>>> BlockManager
>> >>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>> >>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>> >>>> SocketConnector@0.0.0.0:50624
>> >>>> 14/09/03 14:35:12 INFO broadcast.HttpBroadcast: Broadcast server
>> started
>> >>>> at http://10.193.1.76:50624
>> >>>> 14/09/03 14:35:12 INFO spark.HttpFileServer: HTTP File server
>> directory
>> >>>> is /tmp/spark-fd7fdcb2-f45d-430f-95fa-afbc4f329b43
>> >>>> 14/09/03 14:35:12 INFO spark.HttpServer: Starting HTTP Server
>> >>>> 14/09/03 14:35:12 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >>>> 14/09/03 14:35:12 INFO server.AbstractConnector: Started
>> >>>> SocketConnector@0.0.0.0:41773
>> >>>> 14/09/03 14:35:13 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >>>> 14/09/03 14:35:13 INFO server.AbstractConnector: Started
>> >>>> SelectChannelConnector@0.0.0.0:4040
>> >>>> 14/09/03 14:35:13 INFO ui.SparkUI: Started SparkUI at
>> >>>> http://HDOP-B.AGT:4040
>> >>>> 14/09/03 14:35:13 WARN util.NativeCodeLoader: Unable to load
>> >>>> native-hadoop library for your platform... using builtin-java
>> classes where
>> >>>> applicable
>> >>>> --args is deprecated. Use --arg instead.
>> >>>> 14/09/03 14:35:14 INFO client.RMProxy: Connecting to ResourceManager
>> at
>> >>>> HDOP-N1.AGT/10.193.1.72:8050
>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Got Cluster metric info from
>> >>>> ApplicationsManager (ASM), number of NodeManagers: 6
>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Queue info ... queueName:
>> default,
>> >>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>> >>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Max mem capabililty of a single
>> >>>> resource in this cluster 13824
>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Preparing Local resources
>> >>>> 14/09/03 14:35:14 INFO yarn.Client: Uploading
>> >>>>
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> >>>> to
>> >>>>
>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Uploading
>> >>>>
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>> >>>> to
>> >>>>
>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0036/pi.py
>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up the launch environment
>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Setting up container launch
>> context
>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Command for starting the Spark
>> >>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>> >>>> -Djava.io.tmpdir=$PWD/tmp,
>> >>>>
>> -Dspark.tachyonStore.folderName=\"spark-98b7d323-2faf-419a-a88d-1a0c549dc5d4\",
>> >>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>> >>>>
>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>> >>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>> >>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>> >>>> -Dspark.fileserver.uri=\"http://10.193.1.76:41773\",
>> >>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"51707\",
>> >>>> -Dspark.executor.cores=\"1\",
>> >>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:50624\",
>> >>>> -Dlog4j.configuration=log4j-spark-container.properties,
>> >>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>> --jar ,
>> >>>> null,  --args  'HDOP-B.AGT:51707' , --executor-memory, 2048,
>> >>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>> >>>> <LOG_DIR>/stderr)
>> >>>> 14/09/03 14:35:16 INFO yarn.Client: Submitting application to ASM
>> >>>> 14/09/03 14:35:16 INFO impl.YarnClientImpl: Submitted application
>> >>>> application_1409559972905_0036
>> >>>> 14/09/03 14:35:16 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: -1
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: ACCEPTED
>> >>>>
>> >>>> 14/09/03 14:35:17 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: -1
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: ACCEPTED
>> >>>>
>> >>>> 14/09/03 14:35:18 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: -1
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: ACCEPTED
>> >>>>
>> >>>> 14/09/03 14:35:19 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: -1
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: ACCEPTED
>> >>>>
>> >>>> 14/09/03 14:35:20 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: -1
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: ACCEPTED
>> >>>>
>> >>>> 14/09/03 14:35:21 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: -1
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: ACCEPTED
>> >>>>
>> >>>> 14/09/03 14:35:22 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>> report from ASM:
>> >>>> appMasterRpcPort: 0
>> >>>> appStartTime: 1409726116517
>> >>>> yarnAppState: RUNNING
>> >>>>
>> >>>> 14/09/03 14:35:24 INFO cluster.YarnClientClusterScheduler:
>> >>>> YarnClientClusterScheduler.postStartHook done
>> >>>> 14/09/03 14:35:25 INFO cluster.YarnClientSchedulerBackend: Registered
>> >>>> executor:
>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT
>> :58976/user/Executor#-1831707618]
>> >>>> with ID 1
>> >>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>> >>>> manager HDOP-B.AGT:44142 with 1178.1 MB RAM
>> >>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend: Registered
>> >>>> executor:
>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-N1.AGT
>> :45140/user/Executor#875812337]
>> >>>> with ID 2
>> >>>> 14/09/03 14:35:26 INFO storage.BlockManagerInfo: Registering block
>> >>>> manager HDOP-N1.AGT:48513 with 1178.1 MB RAM
>> >>>> 14/09/03 14:35:26 INFO cluster.YarnClientSchedulerBackend: Registered
>> >>>> executor:
>> >>>> Actor[akka.tcp://sparkExecutor@HDOP-N3.AGT
>> :45380/user/Executor#1559437246]
>> >>>> with ID 3
>> >>>> 14/09/03 14:35:27 INFO storage.BlockManagerInfo: Registering block
>> >>>> manager HDOP-N3.AGT:46616 with 1178.1 MB RAM
>> >>>> 14/09/03 14:35:56 INFO spark.SparkContext: Starting job: reduce at
>> >>>>
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>> >>>>
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>> >>>> with 1000 output partitions (allowLocal=false)
>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Final stage: Stage
>> >>>> 0(reduce at
>> >>>>
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38)
>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Parents of final
>> stage:
>> >>>> List()
>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Missing parents:
>> List()
>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting Stage 0
>> >>>> (PythonRDD[1] at RDD at PythonRDD.scala:37), which has no missing
>> parents
>> >>>> 14/09/03 14:35:56 INFO scheduler.DAGScheduler: Submitting 1000
>> missing
>> >>>> tasks from Stage 0 (PythonRDD[1] at RDD at PythonRDD.scala:37)
>> >>>> 14/09/03 14:35:56 INFO cluster.YarnClientClusterScheduler: Adding
>> task
>> >>>> set 0.0 with 1000 tasks
>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:0
>> as
>> >>>> TID 0 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:0
>> >>>> as 369811 bytes in 9 ms
>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:1
>> as
>> >>>> TID 1 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:1
>> >>>> as 506276 bytes in 5 ms
>> >>>> 14/09/03 14:35:56 INFO scheduler.TaskSetManager: Starting task 0.0:2
>> as
>> >>>> TID 2 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:2
>> >>>> as 501136 bytes in 5 ms
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3
>> as
>> >>>> TID 3 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:3
>> >>>> as 506276 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 2 (task
>> 0.0:2)
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>
>> >>>> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>> >>>> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>> >>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>> >>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> >>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>> >>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>>> at java.lang.Thread.run(Thread.java:744)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2
>> as
>> >>>> TID 4 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:2
>> >>>> as 501136 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 0 (task
>> 0.0:0)
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>
>> >>>> at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>> >>>> at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>> >>>> at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>> >>>> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>>> at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> >>>> at org.apache.spark.scheduler.Task.run(Task.scala:51)
>> >>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>>> at java.lang.Thread.run(Thread.java:744)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0
>> as
>> >>>> TID 5 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:0
>> >>>> as 369811 bytes in 3 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 3 (task
>> 0.0:3)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 1]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3
>> as
>> >>>> TID 6 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:3
>> >>>> as 506276 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 4 (task
>> 0.0:2)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 1]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2
>> as
>> >>>> TID 7 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:2
>> >>>> as 501136 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 1 (task
>> 0.0:1)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 2]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:1
>> as
>> >>>> TID 8 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:1
>> >>>> as 506276 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 5 (task
>> 0.0:0)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 3]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0
>> as
>> >>>> TID 9 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:0
>> >>>> as 369811 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 6 (task
>> 0.0:3)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 2]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:3
>> as
>> >>>> TID 10 on executor 3: HDOP-N3.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:3
>> >>>> as 506276 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 7 (task
>> 0.0:2)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 4]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:2
>> as
>> >>>> TID 11 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:2
>> >>>> as 501136 bytes in 3 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 9 (task
>> 0.0:0)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 3]
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Starting task 0.0:0
>> as
>> >>>> TID 12 on executor 1: HDOP-B.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:0
>> >>>> as 369811 bytes in 4 ms
>> >>>> 14/09/03 14:35:57 WARN scheduler.TaskSetManager: Lost TID 8 (task
>> 0.0:1)
>> >>>> 14/09/03 14:35:57 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 5]
>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Starting task 0.0:1
>> as
>> >>>> TID 13 on executor 2: HDOP-N1.AGT (PROCESS_LOCAL)
>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Serialized task
>> 0.0:1
>> >>>> as 506276 bytes in 3 ms
>> >>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Lost TID 11 (task
>> >>>> 0.0:2)
>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 4]
>> >>>> 14/09/03 14:35:58 ERROR scheduler.TaskSetManager: Task 0.0:2 failed 4
>> >>>> times; aborting job
>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Cancelling
>> >>>> stage 0
>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Stage 0
>> was
>> >>>> cancelled
>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 6]
>> >>>> 14/09/03 14:35:58 INFO scheduler.DAGScheduler: Failed to run reduce
>> at
>> >>>>
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> >>>> Traceback (most recent call last):
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 38, in <module>
>> >>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 619, in reduce
>> >>>>     vals = self.mapPartitions(func).collect()
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 583, in collect
>> >>>>     bytesInJava = self._jrdd.collect().iterator()
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>> >>>> line 537, in __call__
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>> >>>> line 300, in get_return_value
>> >>>> py4j.protocol.Py4JJavaError
>> >>>> 14/09/03 14:35:58 INFO scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.api.python.PythonException: Traceback (most recent
>> call
>> >>>> last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/19/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>  [duplicate 7]
>> >>>> : An error occurred while calling o24.collect.
>> >>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>> >>>> Task 0.0:2 failed 4 times, most recent failure: Exception failure in
>> TID 11
>> >>>> on host HDOP-N1.AGT: org.apache.spark.api.python.PythonException:
>> Traceback
>> >>>> (most recent call last):
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/worker.py",
>> >>>> line 77, in main
>> >>>>     serializer.dump_stream(func(split_index, iterator), outfile)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 191, in dump_stream
>> >>>>     self.serializer.dump_stream(self._batched(iterator), stream)
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 123, in dump_stream
>> >>>>     for obj in iterator:
>> >>>>   File
>> >>>>
>> "/tmp/hadoop/yarn/local/usercache/root/filecache/25/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar/pyspark/serializers.py",
>> >>>> line 180, in _batched
>> >>>>     for item in iterator:
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/rdd.py",
>> >>>> line 612, in func
>> >>>>   File
>> >>>>
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> >>>> line 36, in f
>> >>>> SystemError: unknown opcode
>> >>>>
>> >>>>         org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
>> >>>>         org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
>> >>>>         org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
>> >>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>> >>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>> >>>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:111)
>> >>>>         org.apache.spark.scheduler.Task.run(Task.scala:51)
>> >>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>> >>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>>>         java.lang.Thread.run(Thread.java:744)
>> >>>> Driver stacktrace:
>> >>>> at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>> >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>> >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>> >>>> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >>>> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>> >>>> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>> >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>> >>>> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>> >>>> at scala.Option.foreach(Option.scala:236)
>> >>>> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>> >>>> at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>> >>>> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>> >>>> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>> >>>> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>> >>>> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>> >>>> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>> >>>> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>> >>>> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>> >>>> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>> >>>> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>> >>>>
>> >>>> 14/09/03 14:35:58 WARN scheduler.TaskSetManager: Loss was due to
>> >>>> org.apache.spark.TaskKilledException
>> >>>> org.apache.spark.TaskKilledException
>> >>>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:174)
>> >>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> >>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> >>>> at java.lang.Thread.run(Thread.java:744)
>> >>>> 14/09/03 14:35:58 INFO cluster.YarnClientClusterScheduler: Removed
>> >>>> TaskSet 0.0, whose tasks have all completed, from pool
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Wed, Sep 3, 2014 at 1:53 PM, Oleg Ruchovets <oruchovets@gmail.com
>> >
>> >>>> wrote:
>> >>>>>
>> >>>>> Hello Sandy, I changed to using the yarn master but still got the
>> >>>>> exceptions.
>> >>>>>
>> >>>>> What is the procedure for executing pyspark on yarn? Is it enough to
>> >>>>> submit the command, or is it also required to start the Spark
>> >>>>> processes?
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> [root@HDOP-B spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>> >>>>> ./bin/spark-submit --master yarn://HDOP-N1.AGT:8032 --num-executors
>> 3
>> >>>>> --driver-memory 4g --executor-memory 2g --executor-cores 1
>> >>>>> examples/src/main/python/pi.py   1000
>> >>>>> /usr/jdk64/jdk1.7.0_45/bin/java
>> >>>>>
>> >>>>>
>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar:/etc/hadoop/conf
>> >>>>> -XX:MaxPermSize=128m -Djava.library.path= -Xms4g -Xmx4g
>> >>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: Changing view acls to:
>> >>>>> root
>> >>>>> 14/09/03 13:48:48 INFO spark.SecurityManager: SecurityManager:
>> >>>>> authentication disabled; ui acls disabled; users with view
>> permissions:
>> >>>>> Set(root)
>> >>>>> 14/09/03 13:48:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
>> >>>>> 14/09/03 13:48:49 INFO Remoting: Starting remoting
>> >>>>> 14/09/03 13:48:49 INFO Remoting: Remoting started; listening on
>> >>>>> addresses :[akka.tcp://spark@HDOP-B.AGT:34424]
>> >>>>> 14/09/03 13:48:49 INFO Remoting: Remoting now listens on addresses:
>> >>>>> [akka.tcp://spark@HDOP-B.AGT:34424]
>> >>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering MapOutputTracker
>> >>>>> 14/09/03 13:48:49 INFO spark.SparkEnv: Registering
>> BlockManagerMaster
>> >>>>> 14/09/03 13:48:49 INFO storage.DiskBlockManager: Created local
>> >>>>> directory at /tmp/spark-local-20140903134849-231c
>> >>>>> 14/09/03 13:48:49 INFO storage.MemoryStore: MemoryStore started with
>> >>>>> capacity 2.3 GB.
>> >>>>> 14/09/03 13:48:49 INFO network.ConnectionManager: Bound socket to
>> port
>> >>>>> 60647 with id = ConnectionManagerId(HDOP-B.AGT,60647)
>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Trying to
>> register
>> >>>>> BlockManager
>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerInfo: Registering block
>> >>>>> manager HDOP-B.AGT:60647 with 2.3 GB RAM
>> >>>>> 14/09/03 13:48:49 INFO storage.BlockManagerMaster: Registered
>> >>>>> BlockManager
>> >>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>> >>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>> >>>>> SocketConnector@0.0.0.0:56549
>> >>>>> 14/09/03 13:48:49 INFO broadcast.HttpBroadcast: Broadcast server
>> >>>>> started at http://10.193.1.76:56549
>> >>>>> 14/09/03 13:48:49 INFO spark.HttpFileServer: HTTP File server
>> directory
>> >>>>> is /tmp/spark-90af1222-9ea8-4dd8-887a-343d09d44333
>> >>>>> 14/09/03 13:48:49 INFO spark.HttpServer: Starting HTTP Server
>> >>>>> 14/09/03 13:48:49 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >>>>> 14/09/03 13:48:49 INFO server.AbstractConnector: Started
>> >>>>> SocketConnector@0.0.0.0:36512
>> >>>>> 14/09/03 13:48:50 INFO server.Server: jetty-8.y.z-SNAPSHOT
>> >>>>> 14/09/03 13:48:50 INFO server.AbstractConnector: Started
>> >>>>> SelectChannelConnector@0.0.0.0:4040
>> >>>>> 14/09/03 13:48:50 INFO ui.SparkUI: Started SparkUI at
>> >>>>> http://HDOP-B.AGT:4040
>> >>>>> 14/09/03 13:48:50 WARN util.NativeCodeLoader: Unable to load
>> >>>>> native-hadoop library for your platform... using builtin-java
>> classes where
>> >>>>> applicable
>> >>>>> --args is deprecated. Use --arg instead.
>> >>>>> 14/09/03 13:48:51 INFO client.RMProxy: Connecting to
>> ResourceManager at
>> >>>>> HDOP-N1.AGT/10.193.1.72:8050
>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Got Cluster metric info from
>> >>>>> ApplicationsManager (ASM), number of NodeManagers: 6
>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Queue info ... queueName:
>> default,
>> >>>>> queueCurrentCapacity: 0.0, queueMaxCapacity: 1.0,
>> >>>>>       queueApplicationCount = 0, queueChildQueueCount = 0
>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Max mem capabililty of a single
>> >>>>> resource in this cluster 13824
>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Preparing Local resources
>> >>>>> 14/09/03 13:48:51 INFO yarn.Client: Uploading
>> >>>>>
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> >>>>> to
>> >>>>>
>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Uploading
>> >>>>>
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>> >>>>> to
>> >>>>>
>> hdfs://HDOP-B.AGT:8020/user/root/.sparkStaging/application_1409559972905_0033/pi.py
>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up the launch
>> environment
>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Setting up container launch
>> context
>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Command for starting the Spark
>> >>>>> ApplicationMaster: List($JAVA_HOME/bin/java, -server, -Xmx4096m,
>> >>>>> -Djava.io.tmpdir=$PWD/tmp,
>> >>>>>
>> -Dspark.tachyonStore.folderName=\"spark-bdabb882-a2e0-46b6-8e87-90cc6e359d84\",
>> >>>>> -Dspark.executor.memory=\"2g\", -Dspark.executor.instances=\"3\",
>> >>>>>
>> -Dspark.yarn.dist.files=\"file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py\",
>> >>>>> -Dspark.yarn.secondary.jars=\"\", -Dspark.submit.pyFiles=\"\",
>> >>>>> -Dspark.driver.host=\"HDOP-B.AGT\", -Dspark.app.name=\"PythonPi\",
>> >>>>> -Dspark.fileserver.uri=\"http://10.193.1.76:36512\",
>> >>>>> -Dspark.master=\"yarn-client\", -Dspark.driver.port=\"34424\",
>> >>>>> -Dspark.executor.cores=\"1\",
>> >>>>> -Dspark.httpBroadcast.uri=\"http://10.193.1.76:56549\",
>> >>>>> -Dlog4j.configuration=log4j-spark-container.properties,
>> >>>>> org.apache.spark.deploy.yarn.ExecutorLauncher, --class, notused,
>> --jar ,
>> >>>>> null,  --args  'HDOP-B.AGT:34424' , --executor-memory, 2048,
>> >>>>> --executor-cores, 1, --num-executors , 3, 1>, <LOG_DIR>/stdout, 2>,
>> >>>>> <LOG_DIR>/stderr)
>> >>>>> 14/09/03 13:48:53 INFO yarn.Client: Submitting application to ASM
>> >>>>> 14/09/03 13:48:53 INFO impl.YarnClientImpl: Submitted application
>> >>>>> application_1409559972905_0033
>> >>>>> 14/09/03 13:48:53 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>>> report from ASM:
>> >>>>> appMasterRpcPort: -1
>> >>>>> appStartTime: 1409723333584
>> >>>>> yarnAppState: ACCEPTED
>> >>>>>
>> >>>>> 14/09/03 13:48:54 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>>> report from ASM:
>> >>>>> appMasterRpcPort: -1
>> >>>>> appStartTime: 1409723333584
>> >>>>> yarnAppState: ACCEPTED
>> >>>>>
>> >>>>> 14/09/03 13:48:55 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>>> report from ASM:
>> >>>>> appMasterRpcPort: -1
>> >>>>> appStartTime: 1409723333584
>> >>>>> yarnAppState: ACCEPTED
>> >>>>>
>> >>>>> 14/09/03 13:48:56 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>>> report from ASM:
>> >>>>> appMasterRpcPort: -1
>> >>>>> appStartTime: 1409723333584
>> >>>>> yarnAppState: ACCEPTED
>> >>>>>
>> >>>>> 14/09/03 13:48:57 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>>> report from ASM:
>> >>>>> appMasterRpcPort: -1
>> >>>>> appStartTime: 1409723333584
>> >>>>> yarnAppState: ACCEPTED
>> >>>>>
>> >>>>> 14/09/03 13:48:58 INFO cluster.YarnClientSchedulerBackend:
>> Application
>> >>>>> report from ASM:
>> >>>>> appMasterRpcPort: 0
>> >>>>> appStartTime: 1409723333584
>> >>>>> yarnAppState: RUNNING
>> >>>>>
>> >>>>> 14/09/03 13:49:00 INFO cluster.YarnClientClusterScheduler:
>> >>>>> YarnClientClusterScheduler.postStartHook done
>> >>>>> 14/09/03 13:49:01 INFO cluster.YarnClientSchedulerBackend:
>> Registered
>> >>>>> executor:
>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT
>> :57078/user/Executor#1595833626]
>> >>>>> with ID 1
>> >>>>> 14/09/03 13:49:02 INFO storage.BlockManagerInfo: Registering block
>> >>>>> manager HDOP-B.AGT:54579 with 1178.1 MB RAM
>> >>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend:
>> Registered
>> >>>>> executor:
>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-N4.AGT
>> :43121/user/Executor#-1266627304]
>> >>>>> with ID 2
>> >>>>> 14/09/03 13:49:03 INFO cluster.YarnClientSchedulerBackend:
>> Registered
>> >>>>> executor:
>> >>>>> Actor[akka.tcp://sparkExecutor@HDOP-N2.AGT
>> :36952/user/Executor#1003961369]
>> >>>>> with ID 3
>> >>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>> >>>>> manager HDOP-N4.AGT:56891 with 1178.1 MB RAM
>> >>>>> 14/09/03 13:49:04 INFO storage.BlockManagerInfo: Registering block
>> >>>>> manager HDOP-N2.AGT:42381 with 1178.1 MB RAM
>> >>>>> 14/09/03 13:49:33 INFO spark.SparkContext: Starting job: reduce at
>> >>>>>
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py:38
>> >>>>> 14/09/03 13:49:33 INFO scheduler.DAGScheduler: Got job 0 (reduce at
>> >>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2....
>>
>> [Message truncated]
>
>
>
