spark-user mailing list archives

From Anders Arpteg <arp...@spotify.com>
Subject Re: No executors allocated on yarn with latest master branch
Date Thu, 12 Feb 2015 21:51:32 GMT
The NM logs only seem to contain entries similar to the following. Nothing
else in the same time range. Any help?

2015-02-12 20:47:31,245 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: KILL_CONTAINER sent to absent container
container_1422406067005_0053_01_000002
2015-02-12 20:47:31,246 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: KILL_CONTAINER sent to absent container
container_1422406067005_0053_01_000012
2015-02-12 20:47:31,246 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: KILL_CONTAINER sent to absent container
container_1422406067005_0053_01_000022
2015-02-12 20:47:31,246 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: KILL_CONTAINER sent to absent container
container_1422406067005_0053_01_000032
2015-02-12 20:47:31,246 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: KILL_CONTAINER sent to absent container
container_1422406067005_0053_01_000042
2015-02-12 21:24:30,515 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: FINISH_APPLICATION sent to absent application
application_1422406067005_0053
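
For anyone triaging similar NodeManager output, here is a minimal sketch (not part of the original message) that groups "sent to absent ..." warnings by event type. The function name `summarize_absent_events` and the regular expression are illustrative assumptions, based only on the line shape quoted above; real NM logs may differ.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt above once mail line wraps are collapsed:
# <timestamp> WARN <logger class>: Event EventType: <EVENT> sent to absent <kind> <id>
ABSENT_EVENT = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) WARN \S+ "
    r"Event EventType: (\w+) sent to absent (container|application) (\S+)"
)


def summarize_absent_events(log_text):
    """Group 'sent to absent ...' warnings by event type (hypothetical helper)."""
    # Collapse all whitespace so entries wrapped across several lines match.
    flat = " ".join(log_text.split())
    summary = defaultdict(list)
    for timestamp, event, kind, target in ABSENT_EVENT.findall(flat):
        summary[event].append((timestamp, kind, target))
    return dict(summary)


# Two entries copied from the excerpt above, with the same line wrapping.
sample = """
2015-02-12 20:47:31,245 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: KILL_CONTAINER sent to absent container
container_1422406067005_0053_01_000002
2015-02-12 21:24:30,515 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Event EventType: FINISH_APPLICATION sent to absent application
application_1422406067005_0053
"""

summary = summarize_absent_events(sample)
for event, hits in summary.items():
    print(event, len(hits), [target for _, _, target in hits])
```

Grouping by event type makes it easier to see that every kill request here targets a container the NodeManager has no record of, which fits the observation that no executor was ever launched on this node.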

On Thu, Feb 12, 2015 at 10:38 PM, Sandy Ryza <sandy.ryza@cloudera.com>
wrote:

> It seems unlikely to me that it would be a 2.2 issue, though not entirely
> impossible.  Are you able to find any of the container logs?  Is the
> NodeManager launching containers and reporting some exit code?
>
> -Sandy
>
> On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg <arpteg@spotify.com> wrote:
>
>> No, not submitting from windows, from a debian distribution. Had a quick
>> look at the rm logs, and it seems some containers are allocated but then
>> released again for some reason. Not easy to make sense of the logs, but
>> here is a snippet from the logs (from a test in our small test cluster) if
>> you'd like to have a closer look: http://pastebin.com/8WU9ivqC
>>
>> Sandy, sounds like it could possibly be a 2.2 issue then, or what do you
>> think?
>>
>> Thanks,
>> Anders
>>
>> On Thu, Feb 12, 2015 at 3:11 PM, Aniket Bhatnagar <
>> aniket.bhatnagar@gmail.com> wrote:
>>
>>> This is tricky to debug. Check the YARN NodeManager and ResourceManager
>>> logs to see if you can trace the error. In the past I have had to look
>>> closely at the arguments passed to the YARN container (they are logged
>>> before the container launch is attempted). If that still gave no clue, I
>>> had to check the script YARN generates to execute the container, and even
>>> run it manually to find the line at which the error occurred.
>>>
>>> BTW are you submitting the job from windows?
>>>
>>> On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg <arpteg@spotify.com> wrote:
>>>
>>>> Interesting to hear that it works for you. Are you using Yarn 2.2 as
>>>> well? No strange log messages during startup, and I can't see any other
>>>> log messages since no executor gets launched. It does not seem to work in
>>>> yarn-client mode either, failing with the exception below.
>>>>
>>>> Exception in thread "main" org.apache.spark.SparkException: Yarn
>>>> application has already ended! It might have been killed or unable to
>>>> launch application master.
>>>>         at
>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:119)
>>>>         at
>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
>>>>         at
>>>> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:370)
>>>>         at
>>>> com.spotify.analytics.AnalyticsSparkContext.<init>(AnalyticsSparkContext.scala:8)
>>>>         at com.spotify.analytics.DataSampler$.main(DataSampler.scala:42)
>>>>         at com.spotify.analytics.DataSampler.main(DataSampler.scala)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at
>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:551)
>>>>         at
>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:155)
>>>>         at
>>>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
>>>>         at
>>>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:99)
>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>
>>>> /Anders
>>>>
>>>>
>>>> On Thu, Feb 12, 2015 at 1:33 AM, Sandy Ryza <sandy.ryza@cloudera.com>
>>>> wrote:
>>>>
>>>>> Hi Anders,
>>>>>
>>>>> I just tried this out and was able to successfully acquire executors.
>>>>> Any strange log messages or additional color you can provide on your
>>>>> setup?  Does yarn-client mode work?
>>>>>
>>>>> -Sandy
>>>>>
>>>>> On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg <arpteg@spotify.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Compiled the latest master of Spark yesterday (2015-02-10) for Hadoop
>>>>>> 2.2 and failed executing jobs in yarn-cluster mode for that build. Works
>>>>>> successfully with spark 1.2 (and also master from 2015-01-16), so something
>>>>>> has changed since then that prevents the job from receiving any executors
>>>>>> on the cluster.
>>>>>>
>>>>>> Basic symptoms are that the job fires up the AM, but after examining
>>>>>> the "executors" page in the web ui, only the driver is listed, no
>>>>>> executors are ever received, and the driver keeps waiting forever. Has
>>>>>> anyone seen similar problems?
>>>>>>
>>>>>> Thanks for any insights,
>>>>>> Anders
>>>>>>
>>>>>
>>>>>
>>>>
>>
>
