spark-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: No executors allocated on yarn with latest master branch
Date Fri, 20 Feb 2015 23:05:45 GMT
Are you by any chance using the CapacityScheduler, or the FIFO scheduler
without multi-resource scheduling?
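
For reference, these are the settings that control this (the property names
below are the standard YARN ones; the values and file locations are just
examples and may differ per distribution):

```xml
<!-- yarn-site.xml: which scheduler the ResourceManager uses -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

<!-- capacity-scheduler.xml: DominantResourceCalculator schedules on CPU and
     memory (DRF); the default calculator considers memory only -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```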

On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg <arpteg@spotify.com> wrote:

> The NM logs only seem to contain entries similar to the following, and
> nothing else in the same time range. Any help?
>
> 2015-02-12 20:47:31,245 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000002
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000012
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000022
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000032
> 2015-02-12 20:47:31,246 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: KILL_CONTAINER sent to absent container
> container_1422406067005_0053_01_000042
> 2015-02-12 21:24:30,515 WARN
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Event EventType: FINISH_APPLICATION sent to absent application
> application_1422406067005_0053
>
> On Thu, Feb 12, 2015 at 10:38 PM, Sandy Ryza <sandy.ryza@cloudera.com>
> wrote:
>
>> It seems unlikely to me that it would be a 2.2 issue, though not entirely
>> impossible.  Are you able to find any of the container logs?  Is the
>> NodeManager launching containers and reporting some exit code?
>>
>> -Sandy
>>
>> On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg <arpteg@spotify.com>
>> wrote:
>>
>>> No, not submitting from Windows, from a Debian distribution. Had a quick
>>> look at the RM logs, and it seems some containers are allocated but then
>>> released again for some reason. Not easy to make sense of the logs, but
>>> here is a snippet (from a test in our small test cluster) if you'd like
>>> to have a closer look: http://pastebin.com/8WU9ivqC
>>>
>>> Sandy, sounds like it could possibly be a 2.2 issue then, or what do you
>>> think?
>>>
>>> Thanks,
>>> Anders
>>>
>>> On Thu, Feb 12, 2015 at 3:11 PM, Aniket Bhatnagar <
>>> aniket.bhatnagar@gmail.com> wrote:
>>>
>>>> This is tricky to debug. Check the logs of the YARN NodeManager and
>>>> ResourceManager to see if you can trace the error. In the past, I have
>>>> had to look closely at the arguments passed to the YARN container (they
>>>> get logged before containers are launched). If that still didn't give a
>>>> clue, I had to check the script generated by YARN to execute the
>>>> container, and even run it manually to trace the line where the error
>>>> occurred.
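>>>> (Concretely, assuming default NodeManager directories, something like
>>>> the following; the local-dir path depends on the
>>>> yarn.nodemanager.local-dirs setting and may differ per cluster:)

```
# Aggregated logs for a finished application:
yarn logs -applicationId application_1422406067005_0053

# The launch script YARN generates for each container, kept on the
# NodeManager host while the application is running:
find /tmp/hadoop-yarn/nm-local-dir/usercache -name launch_container.sh
```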
>>>>
>>>> BTW, are you submitting the job from Windows?
>>>>
>>>> On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg <arpteg@spotify.com> wrote:
>>>>
>>>>> Interesting to hear that it works for you. Are you using Yarn 2.2 as
>>>>> well? No strange log messages during startup, and I can't see any other
>>>>> log messages since no executor gets launched. It does not seem to work
>>>>> in yarn-client mode either, failing with the exception below.
>>>>>
>>>>> Exception in thread "main" org.apache.spark.SparkException: Yarn
>>>>> application has already ended! It might have been killed or unable to
>>>>> launch application master.
>>>>>         at
>>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.waitForApplication(YarnClientSchedulerBackend.scala:119)
>>>>>         at
>>>>> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:59)
>>>>>         at
>>>>> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:141)
>>>>>         at org.apache.spark.SparkContext.<init>(SparkContext.scala:370)
>>>>>         at
>>>>> com.spotify.analytics.AnalyticsSparkContext.<init>(AnalyticsSparkContext.scala:8)
>>>>>         at
>>>>> com.spotify.analytics.DataSampler$.main(DataSampler.scala:42)
>>>>>         at com.spotify.analytics.DataSampler.main(DataSampler.scala)
>>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>         at
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>         at
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:551)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:155)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:178)
>>>>>         at
>>>>> org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:99)
>>>>>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>>>
>>>>> /Anders
>>>>>
>>>>>
>>>>> On Thu, Feb 12, 2015 at 1:33 AM, Sandy Ryza <sandy.ryza@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Anders,
>>>>>>
>>>>>> I just tried this out and was able to successfully acquire
>>>>>> executors.  Any strange log messages or additional color you can
>>>>>> provide on your setup?  Does yarn-client mode work?
>>>>>>
>>>>>> -Sandy
>>>>>>
>>>>>> On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg <arpteg@spotify.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Compiled the latest master of Spark yesterday (2015-02-10) for
>>>>>>> Hadoop 2.2, and failed to execute jobs in yarn-cluster mode with that
>>>>>>> build. It works successfully with Spark 1.2 (and also with master from
>>>>>>> 2015-01-16), so something has changed since then that prevents the job
>>>>>>> from receiving any executors on the cluster.
>>>>>>>
>>>>>>> The basic symptoms are that the job fires up the AM, but after
>>>>>>> examining the "executors" page in the web UI, only the driver is
>>>>>>> listed; no executors are ever launched, and the driver keeps waiting
>>>>>>> forever. Has anyone seen similar problems?
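>>>>>>> (A quick way to confirm this from the driver, assuming a spark-shell
>>>>>>> on the cluster: getExecutorMemoryStatus includes the driver itself,
>>>>>>> so a size of 1 means no executors have registered.)

```scala
// In spark-shell (Spark 1.x); the returned map includes the driver,
// so a size of 1 means no executors have ever registered.
sc.getExecutorMemoryStatus.size
```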
>>>>>>>
>>>>>>> Thanks for any insights,
>>>>>>> Anders
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>
