spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: [DISCUSS] Change default executor log URLs for YARN
Date Fri, 08 Feb 2019 23:40:39 GMT
I suggest keeping the current behavior as the default and adding a flag to
implement the behavior you're suggesting: to link to the logs path in YARN
instead of directly to stderr and stdout.
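
A minimal sketch of what such a flag could look like, assuming the URL layout quoted later in this thread. The function name, the `link_to_log_dir` parameter, and the `logs` key are hypothetical and only illustrate the two behaviors; they are not existing Spark settings or APIs.

```python
def executor_log_urls(nm_host, nm_port, container_id, user, link_to_log_dir=False):
    """Build the log links for one executor container.

    link_to_log_dir is a hypothetical flag used for illustration only.
    """
    base = f"http://{nm_host}:{nm_port}/node/containerlogs/{container_id}/{user}"
    if link_to_log_dir:
        # Proposed opt-in behavior: a single link to the page listing all log files.
        return {"logs": base}
    # Current default: direct links to the last 4096 bytes of stdout and stderr.
    return {name: f"{base}/{name}?start=-4096" for name in ("stdout", "stderr")}
```

With the flag left at its default, callers get the two direct links they have today; setting it to true trades one extra click for access to every file in the container's log directory.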

On Fri, Feb 8, 2019 at 3:33 PM Jungtaek Lim <kabhwan@gmail.com> wrote:

> Ryan,
>
> Actually, I'm not clear about your suggestion. I see three possible
> options here:
>
> 1. If we want to let users completely rewrite log URLs, that's
> SPARK-26792 <https://issues.apache.org/jira/browse/SPARK-26792>. For SHS
> we already addressed it.
> 2. We could let users toggle a flag to get either a single URL or the
> default two stdout/stderr URLs.
> 3. We could let users enumerate the file names they want to link, and create
> log links for each file.
>
> Which one do you suggest?
>
> On Sat, Feb 9, 2019 at 8:24 AM Ryan Blue <rblue@netflix.com> wrote:
>
>> Jungtaek,
>>
>> Thanks for the extra context. Those quotes are the confirmation I was
>> looking for: they justify exposing the link you suggest instead of linking
>> directly to stderr and stdout.
>>
>> What do you think about my suggestion to change this with a config
>> option? I would prefer that since we use the supported pattern. But I would
>> support moving forward on this either way.
>>
>> rb
>>
>> On Fri, Feb 8, 2019 at 3:03 PM Sean Owen <srowen@gmail.com> wrote:
>>
>>> I think that's a reasonable argument, that it provides links to
>>> potentially several logs of interest. It reduces the UI clutter a
>>> little at the cost of one more hop to get to logs.
>>> I don't feel strongly about it but think that's a reasonable thing to do.
>>>
>>> On Fri, Feb 8, 2019 at 4:57 PM Jungtaek Lim <kabhwan@gmail.com> wrote:
>>> >
>>> > Let me quote some voices here, since they don't seem to be participating
>>> > in this thread. This still doesn't show that the majority use this pattern,
>>> > so I'm also OK with making it optional (I might just work on SPARK-26792 to
>>> > address it) and leaving the default as it is if others aren't interested in this.
>>> >
>>> > https://github.com/apache/spark/pull/23260#issuecomment-456827963
>>> >
>>> > Sorry I haven't had time to look through all the code so this might be
>>> > a separate JIRA, but one thing I thought of here is it would be really nice
>>> > not to have specifically stderr/stdout. Users can specify any
>>> > log4j.properties, and some tools like Oozie by default end up using the Hadoop
>>> > log4j rather than the Spark log4j, so the files aren't necessarily the same. Also,
>>> > users can put in other log files, so it would be nice to have links to
>>> > those from the UI. It seems simpler if we just had a link to the directory
>>> > and it read the files within there. Other things in Hadoop do it this way,
>>> > but I'm not sure if that works well for other resource managers. Any
>>> > thoughts on that? As long as this doesn't prevent the above, I can file a
>>> > separate JIRA for it.
>>> >
>>> > https://github.com/apache/spark/pull/23260#issuecomment-456904716
>>> >
>>> > Hi Tom, +1: singling out stdout and stderr is definitely an annoyance. We
>>> > typically configure Spark jobs to write the GC log and dump heap on OOM
>>> > using <LOG_DIR>, and/or we use the rolling file appender to deal with
>>> > large logs during debugging. So linking the YARN container log overview
>>> > page would make much more sense for us. We work around it with a custom
>>> > submit process that logs all important URLs to the submit-side log.
>>> >
>>> >
>>> >
>>> > On Sat, Feb 9, 2019 at 5:42 AM Ryan Blue <rblue@netflix.com> wrote:
>>> >>
>>> >> Here's what I see from a running job on our cluster. Both of these
>>> are links that go to the stderr and stdout links that Spark produces today.
>>> >>
>>> >> stderr : Total file length is 18557 bytes.
>>> >> stdout : Total file length is 0 bytes.
>>> >>
>>> >> While it is nice to see that stderr or stdout has content, I don't
>>> think that this is worth the extra click or changes to Spark.
>>> >>
>>> >> However, we have configured our logs to go to stderr and stdout so
>>> these links work for us. I think some YARN applications send logs to a
>>> separate log endpoint, which would be useful when listed here. Does anyone
>>> have logs going to locations other than stderr and stdout?
>>> >>
>>> >> If there are logs going to other files, then I think making this an
>>> option is reasonable. Otherwise, I think we should leave links as they are.
>>> >>
>>> >> rb
>>> >>
>>> >> On Thu, Feb 7, 2019 at 12:31 PM Jungtaek Lim <kabhwan@gmail.com> wrote:
>>> >>>
>>> >>> The new URL shows all of the local logs, including stdout and stderr, as
>>> >>> a list.
>>> >>>
>>> >>> The change would help when end users modify their log4j
>>> >>> configuration to write additional log files, as well as GC logs. Currently
>>> >>> Spark only shows two static files (stdout, stderr) as individual links, so
>>> >>> it is easier to see the content (one click), but users have to remove the
>>> >>> file part manually from the URL to access the list page. Instead, we may be
>>> >>> able to change the default URL to show all of the local logs and let users
>>> >>> choose which file to read (though it would take two clicks to reach the actual file).
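
The manual step described above, removing the file part from a log URL to reach the list page, can be sketched as follows. This is only an illustration: it assumes the file name is the last path segment and that the `?start=` query can be dropped, and the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def log_directory_url(file_url):
    """Turn a direct stdout/stderr link into the container's log list URL."""
    parts = urlsplit(file_url)
    # Drop the last path segment (the file name); the query string is discarded too.
    directory = parts.path.rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))
```

For a link ending in `/<USER>/stderr?start=-4096`, this returns the same URL truncated at `/<USER>`.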
>>> >>>
>>> >>> -Jungtaek Lim (HeartSaVioR)
>>> >>>
>>> >>> On Fri, Feb 8, 2019 at 1:33 AM Ryan Blue <rblue@netflix.com> wrote:
>>> >>>>
>>> >>>> Jungtaek,
>>> >>>>
>>> >>>> What is shown at the new URL and how would this improve usability?
>>> >>>>
>>> >>>> On Thu, Feb 7, 2019 at 12:45 AM Jungtaek Lim <kabhwan@gmail.com> wrote:
>>> >>>>>
>>> >>>>> Hi devs,
>>> >>>>>
>>> >>>>> Based on the suggestion Tom Graves gave me in SPARK-26792, I'd
>>> >>>>> like to hear opinions on changing the default executor log URLs for YARN,
>>> >>>>> specifically removing "stdout" and "stderr" and providing a single link that
>>> >>>>> shows all the log files. For example, instead of referring to the two links below:
>>> >>>>>
>>> >>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>/<stdout|stderr>?start=-4096
>>> >>>>>
>>> >>>>> we would refer to only the one link below:
>>> >>>>>
>>> >>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>
>>> >>>>>
>>> >>>>> I've checked that the new URL works with redirection on the NM to jobhistory,
>>> >>>>> so it won't break what we currently support. Getting to the actual log
>>> >>>>> file would require two clicks instead of one, though.
>>> >>>>>
>>> >>>>> Given that it changes the UX, I'd like to hear opinions on
>>> >>>>> this before submitting a patch. If we'd rather keep this as it is, I would
>>> >>>>> just open up the option to apply a custom log URL for the Spark UI as well.
>>> >>>>>
>>> >>>>> Thanks in advance!
>>> >>>>>
>>> >>>>> FYI, below is the rationale for this discussion:
>>> >>>>>
>>> >>>>> While working on SPARK-23155, I got some input about
>>> >>>>> linking the "log directory" instead of log URLs for each of "stdout" and "stderr",
>>> >>>>> because in real cases end users write more files than just stdout and
>>> >>>>> stderr (like GC logs).
>>> >>>>>
>>> >>>>> SPARK-23155 provides a way to modify the log URL, but it's only
>>> >>>>> applied to the SHS; the Spark UI for running apps still only shows
>>> >>>>> "stdout" and "stderr". SPARK-26792 is for applying this to the Spark UI as
>>> >>>>> well, but I got a suggestion to just change the default log URL.
>>> >>>>>
>>> >>>>> Thanks again,
>>> >>>>> Jungtaek Lim (HeartSaVioR)
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> --
>>> >>>> Ryan Blue
>>> >>>> Software Engineer
>>> >>>> Netflix
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Ryan Blue
>>> >> Software Engineer
>>> >> Netflix
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
