spark-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: pyspark yarn got exception
Date Wed, 03 Sep 2014 05:40:05 GMT
Hi Oleg. To run on YARN, simply set the master to "yarn".  The YARN
configuration, located in yarn-site.xml, determines where Spark looks for the
YARN ResourceManager.
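
For example, something like this (just a sketch: the resource sizes are
carried over from your spark-submit command below, and /etc/hadoop/conf is
only an assumed location for the directory that holds yarn-site.xml):

    export HADOOP_CONF_DIR=/etc/hadoop/conf   # directory containing yarn-site.xml
    ./bin/spark-submit --master yarn \
      --num-executors 3 --driver-memory 4g --executor-memory 2g --executor-cores 1 \
      examples/src/main/python/pi.py 1000

With the master set to yarn there is no standalone master to start, so
start-all.sh is not needed; the application is submitted to whatever
ResourceManager yarn-site.xml points at.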

PROCESS_LOCAL is orthogonal to the choice of cluster resource manager. A
task is considered PROCESS_LOCAL when the executor it's running in happens
to have the data it's processing cached.
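
The locality levels you'll see in the task table, from best to worst, are
PROCESS_LOCAL, NODE_LOCAL, RACK_LOCAL, and ANY. How long the scheduler waits
for a slot at a better locality level before falling back is configurable; a
minimal sketch for conf/spark-defaults.conf (3000 ms is the documented
default, not a recommendation for your job):

    # conf/spark-defaults.conf
    # milliseconds to wait for a task slot at a better locality level
    spark.locality.wait    3000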

If you're looking to get familiar with the (admittedly confusing) web of
terminology around Spark resource management, this blog post might be helpful:
http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/

-Sandy


On Tue, Sep 2, 2014 at 9:51 PM, Oleg Ruchovets <oruchovets@gmail.com> wrote:

> Hi,
>   I changed my command to:
>   ./bin/spark-submit --master spark://HDOP-B.AGT:7077 --num-executors 3
>  --driver-memory 4g --executor-memory 2g --executor-cores 1
> examples/src/main/python/pi.py   1000
> and it fixed the problem.
>
> I still have a couple of questions:
>    PROCESS_LOCAL is not YARN execution, right? How should I configure
> running on YARN? Should I execute the start-all script on all machines or
> only one?  Where are the UI / logs of the Spark execution?
>
>
>
>
>
>  (task table pasted from the Spark UI; columns reconstructed for
>  readability, trailing millisecond values kept as pasted)
>
>  Index  ID   Status   Locality       Executor    Launch Time          Duration  GC / other
>  152    152  SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:14  0.2 s
>  0      0    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
>  2      2    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms
>  3      3    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s     39 ms, 1 ms
>  4      4    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms, 2 ms
>  5      5    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     39 ms, 1 ms
>  6      6    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.8 s     1 ms
>  7      7    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:09  0.9 s
>  8      8    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
>  9      9    SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.4 s
>  10     10   SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s     1 ms
>  11     11   SUCCESS  PROCESS_LOCAL  HDOP-B.AGT  2014/09/03 12:35:10  0.3 s
>
>
> On Wed, Sep 3, 2014 at 12:19 PM, Oleg Ruchovets <oruchovets@gmail.com>
> wrote:
>
>> Hi Andrew.
>>    What should I do to set the master to yarn? Can you please point me to
>> the command or documentation for how to do it?
>>
>>
>> I am doing the following:
>>    I executed start-all.sh:
>>    [root@HDOP-B sbin]# ./start-all.sh
>> starting org.apache.spark.deploy.master.Master, logging to
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-HDOP-B.AGT.out
>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of
>> known hosts.
>> localhost: starting org.apache.spark.deploy.worker.Worker, logging to
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-HDOP-B.AGT.out
>>
>>
>> After that I executed the command:
>>     ./bin/spark-submit --master spark://HDOP-B.AGT:7077
>> examples/src/main/python/pi.py 1000
>>
>>
>> the result is the following:
>>
>>    /usr/jdk64/jdk1.7.0_45/bin/java
>>
>> ::/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/conf:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/lib/spark-assembly-1.0.1.2.1.3.0-563-hadoop2.4.0.2.1.3.0-563.jar
>> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>> 14/09/03 12:10:06 INFO SecurityManager: Using Spark's default log4j
>> profile: org/apache/spark/log4j-defaults.properties
>> 14/09/03 12:10:06 INFO SecurityManager: Changing view acls to: root
>> 14/09/03 12:10:06 INFO SecurityManager: SecurityManager: authentication
>> disabled; ui acls disabled; users with view permissions: Set(root)
>> 14/09/03 12:10:07 INFO Slf4jLogger: Slf4jLogger started
>> 14/09/03 12:10:07 INFO Remoting: Starting remoting
>> 14/09/03 12:10:07 INFO Remoting: Remoting started; listening on addresses
>> :[akka.tcp://spark@HDOP-B.AGT:38944]
>> 14/09/03 12:10:07 INFO Remoting: Remoting now listens on addresses:
>> [akka.tcp://spark@HDOP-B.AGT:38944]
>> 14/09/03 12:10:07 INFO SparkEnv: Registering MapOutputTracker
>> 14/09/03 12:10:07 INFO SparkEnv: Registering BlockManagerMaster
>> 14/09/03 12:10:08 INFO DiskBlockManager: Created local directory at
>> /tmp/spark-local-20140903121008-cf09
>> 14/09/03 12:10:08 INFO MemoryStore: MemoryStore started with capacity
>> 294.9 MB.
>> 14/09/03 12:10:08 INFO ConnectionManager: Bound socket to port 45041 with
>> id = ConnectionManagerId(HDOP-B.AGT,45041)
>> 14/09/03 12:10:08 INFO BlockManagerMaster: Trying to register BlockManager
>> 14/09/03 12:10:08 INFO BlockManagerInfo: Registering block manager
>> HDOP-B.AGT:45041 with 294.9 MB RAM
>> 14/09/03 12:10:08 INFO BlockManagerMaster: Registered BlockManager
>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>> 14/09/03 12:10:08 INFO HttpBroadcast: Broadcast server started at
>> http://10.193.1.76:59336
>> 14/09/03 12:10:08 INFO HttpFileServer: HTTP File server directory is
>> /tmp/spark-7bf5c3c3-1c02-41e8-9fb0-983e175dd45c
>> 14/09/03 12:10:08 INFO HttpServer: Starting HTTP Server
>> 14/09/03 12:10:08 INFO SparkUI: Started SparkUI at http://HDOP-B.AGT:4040
>> 14/09/03 12:10:09 WARN NativeCodeLoader: Unable to load native-hadoop
>> library for your platform... using builtin-java classes where applicable
>> 14/09/03 12:10:09 INFO Utils: Copying
>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>> to /tmp/spark-4e252376-70cb-4171-bf2c-d804524e816c/pi.py
>> 14/09/03 12:10:09 INFO SparkContext: Added file
>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>> at http://10.193.1.76:45893/files/pi.py with timestamp 1409717409277
>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Connecting to master
>> spark://HDOP-B.AGT:7077...
>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Connected to Spark
>> cluster with app ID app-20140903121009-0000
>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor added:
>> app-20140903121009-0000/0 on worker-20140903120712-HDOP-B.AGT-51161
>> (HDOP-B.AGT:51161) with 8 cores
>> 14/09/03 12:10:09 INFO SparkDeploySchedulerBackend: Granted executor ID
>> app-20140903121009-0000/0 on hostPort HDOP-B.AGT:51161 with 8 cores, 512.0
>> MB RAM
>> 14/09/03 12:10:09 INFO AppClient$ClientActor: Executor updated:
>> app-20140903121009-0000/0 is now RUNNING
>> 14/09/03 12:10:12 INFO SparkDeploySchedulerBackend: Registered executor:
>> Actor[akka.tcp://sparkExecutor@HDOP-B.AGT:38143/user/Executor#1295757828]
>> with ID 0
>> 14/09/03 12:10:12 INFO BlockManagerInfo: Registering block manager
>> HDOP-B.AGT:38670 with 294.9 MB RAM
>> Traceback (most recent call last):
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>> line 38, in <module>
>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>> line 271, in parallelize
>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>> line 537, in __call__
>>   File
>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>> line 300, in get_return_value
>> py4j.protocol.Py4JJavaError: An error occurred while calling
>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>> : java.lang.OutOfMemoryError: Java heap space
>> at
>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>  at
>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>  at java.lang.reflect.Method.invoke(Method.java:606)
>> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>  at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>> at py4j.Gateway.invoke(Gateway.java:259)
>>  at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>> at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>  at py4j.GatewayConnection.run(GatewayConnection.java:207)
>> at java.lang.Thread.run(Thread.java:744)
>>
>>
>>
>> What should I do to fix the issue?
>>
>> Thanks
>> Oleg.
>>
>>
>> On Tue, Sep 2, 2014 at 10:32 PM, Andrew Or <andrew@databricks.com> wrote:
>>
>>> Hi Oleg,
>>>
>>> If you are running Spark on a YARN cluster, you should set --master to
>>> yarn. By default this runs in client mode, which redirects all output of
>>> your application to your console. This is failing because it is trying to
>>> connect to a standalone master that you probably did not start. I am
>>> somewhat puzzled as to how you ran into an OOM from this configuration,
>>> however. Does this problem still occur if you set the correct master?
>>>
>>> -Andrew
>>>
>>>
>>> 2014-09-02 2:42 GMT-07:00 Oleg Ruchovets <oruchovets@gmail.com>:
>>>
>>>> Hi,
>>>>    I've installed PySpark on an HDP (Hortonworks) cluster.
>>>>   Executing the pi example:
>>>>
>>>> command:
>>>>        spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]#
>>>> ./bin/spark-submit --master spark://10.193.1.71:7077
>>>> examples/src/main/python/pi.py   1000
>>>>
>>>> exception:
>>>>
>>>>     14/09/02 17:34:02 INFO SecurityManager: Using Spark's default log4j
>>>> profile: org/apache/spark/log4j-defaults.properties
>>>> 14/09/02 17:34:02 INFO SecurityManager: Changing view acls to: root
>>>> 14/09/02 17:34:02 INFO SecurityManager: SecurityManager: authentication
>>>> disabled; ui acls disabled; users with view permissions: Set(root)
>>>> 14/09/02 17:34:02 INFO Slf4jLogger: Slf4jLogger started
>>>> 14/09/02 17:34:02 INFO Remoting: Starting remoting
>>>> 14/09/02 17:34:03 INFO Remoting: Remoting started; listening on
>>>> addresses :[akka.tcp://spark@HDOP-M.AGT:41059]
>>>> 14/09/02 17:34:03 INFO Remoting: Remoting now listens on addresses:
>>>> [akka.tcp://spark@HDOP-M.AGT:41059]
>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering MapOutputTracker
>>>> 14/09/02 17:34:03 INFO SparkEnv: Registering BlockManagerMaster
>>>> 14/09/02 17:34:03 INFO DiskBlockManager: Created local directory at
>>>> /tmp/spark-local-20140902173403-cda8
>>>> 14/09/02 17:34:03 INFO MemoryStore: MemoryStore started with capacity
>>>> 294.9 MB.
>>>> 14/09/02 17:34:03 INFO ConnectionManager: Bound socket to port 34931
>>>> with id = ConnectionManagerId(HDOP-M.AGT,34931)
>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Trying to register
>>>> BlockManager
>>>> 14/09/02 17:34:03 INFO BlockManagerInfo: Registering block manager
>>>> HDOP-M.AGT:34931 with 294.9 MB RAM
>>>> 14/09/02 17:34:03 INFO BlockManagerMaster: Registered BlockManager
>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>> 14/09/02 17:34:03 INFO HttpBroadcast: Broadcast server started at
>>>> http://10.193.1.71:54341
>>>> 14/09/02 17:34:03 INFO HttpFileServer: HTTP File server directory is
>>>> /tmp/spark-77c7a7dc-181e-4069-a014-8103a6a6330a
>>>> 14/09/02 17:34:03 INFO HttpServer: Starting HTTP Server
>>>> 14/09/02 17:34:04 INFO SparkUI: Started SparkUI at
>>>> http://HDOP-M.AGT:4040
>>>> 14/09/02 17:34:04 WARN NativeCodeLoader: Unable to load native-hadoop
>>>> library for your platform... using builtin-java classes where applicable
>>>> 14/09/02 17:34:04 INFO Utils: Copying
>>>> /root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>> to /tmp/spark-f2e0cc0f-59cb-4f6c-9d48-f16205a40c7e/pi.py
>>>> 14/09/02 17:34:04 INFO SparkContext: Added file
>>>> file:/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py
>>>> at http://10.193.1.71:52938/files/pi.py with timestamp 1409650444941
>>>> 14/09/02 17:34:05 INFO AppClient$ClientActor: Connecting to master
>>>> spark://10.193.1.71:7077...
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:05 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 INFO AppClient$ClientActor: Connecting to master
>>>> spark://10.193.1.71:7077...
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> 14/09/02 17:34:25 WARN AppClient$ClientActor: Could not connect to
>>>> akka.tcp://sparkMaster@10.193.1.71:7077:
>>>> akka.remote.EndpointAssociationException: Association failed with
>>>> [akka.tcp://sparkMaster@10.193.1.71:7077]
>>>> Traceback (most recent call last):
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/examples/src/main/python/pi.py",
>>>> line 38, in <module>
>>>>     count = sc.parallelize(xrange(1, n+1), slices).map(f).reduce(add)
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/pyspark/context.py",
>>>> line 271, in parallelize
>>>>     jrdd = readRDDFromFile(self._jsc, tempFile.name, numSlices)
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/java_gateway.py",
>>>> line 537, in __call__
>>>>   File
>>>> "/root/spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563/python/lib/py4j-0.8.1-src.zip/py4j/protocol.py",
>>>> line 300, in get_return_value
>>>> py4j.protocol.Py4JJavaError: An error occurred while calling
>>>> z:org.apache.spark.api.python.PythonRDD.readRDDFromFile.
>>>> : java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>> at
>>>> org.apache.spark.api.python.PythonRDD$.readRDDFromFile(PythonRDD.scala:279)
>>>> at
>>>> org.apache.spark.api.python.PythonRDD.readRDDFromFile(PythonRDD.scala)
>>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>  at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>>  at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>>>> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>>>>  at py4j.Gateway.invoke(Gateway.java:259)
>>>> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>  at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>> at py4j.GatewayConnection.run(GatewayConnection.java:207)
>>>>  at java.lang.Thread.run(Thread.java:744)
>>>>
>>>>
>>>>
>>>> Question:
>>>>     How can I find the Spark master host and port? Where is it defined?
>>>>
>>>> Thanks
>>>> Oleg.
>>>>
>>>
>>>
>>
>
