spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: [DISCUSS] Change default executor log URLs for YARN
Date Fri, 08 Feb 2019 23:24:08 GMT
Jungtaek,

Thanks for the extra context. Those quotes are the confirmation that I was
looking for to expose the link you suggest instead of going directly to
stderr and stdout.

What do you think about my suggestion to change this with a config option?
I would prefer that since we use the supported pattern. But I would support
moving forward on this either way.

rb

On Fri, Feb 8, 2019 at 3:03 PM Sean Owen <srowen@gmail.com> wrote:

> I think that's a reasonable argument, that it provides links to
> potentially several logs of interest. It reduces the UI clutter a
> little at the cost of one more hop to get to logs.
> I don't feel strongly about it but think that's a reasonable thing to do.
>
> On Fri, Feb 8, 2019 at 4:57 PM Jungtaek Lim <kabhwan@gmail.com> wrote:
> >
> > Let me quote some voices here, since it seems they aren't participating in this
> thread. This still doesn't show that the majority are using this pattern,
> so I'm also OK with making it optional (I might just work on SPARK-26792 to
> address that) and leaving the default as it is if others aren't interested in this.
> >
> > https://github.com/apache/spark/pull/23260#issuecomment-456827963
> >
> > Sorry I haven't had time to look through all the code so this might be a
> separate jira, but one thing I thought of here is it would be really nice
> not to have specifically stderr/stdout. users can specify any
> log4j.properties and some tools like oozie by default end up using hadoop
> log4j rather than spark log4j, so files aren't necessarily the same. Also
> users can put in other log files so it would be nice to have links to
> those from the UI. It seems simpler if we just had a link to the directory
> and it read the files within there. Other things in Hadoop do it this way,
> but I'm not sure if that works well for other resource managers, any
> thoughts on that? As long as this doesn't prevent the above I can file a
> separate jira for it.
> >
> > https://github.com/apache/spark/pull/23260#issuecomment-456904716
> >
> > Hi Tom, +1: singling out stdout and stderr is definitely an annoyance. We
> > typically configure Spark jobs to write the GC log and dump heap on OOM
> > using <LOG_DIR>, and/or we use the rolling file appender to deal with
> > large logs during debugging. So linking the YARN container log overview
> > page would make much more sense for us. We work around it with a custom
> > submit process that logs all important URLs in the submit-side log.
> >
> >
> >
> > On Sat, Feb 9, 2019 at 5:42 AM, Ryan Blue <rblue@netflix.com> wrote:
> >>
> >> Here's what I see from a running job on our cluster. Both of these are
> links that go to the stderr and stdout links that Spark produces today.
> >>
> >> stderr : Total file length is 18557 bytes.
> >> stdout : Total file length is 0 bytes.
> >>
> >> While it is nice to see that stderr or stdout has content, I don't
> think that this is worth the extra click or changes to Spark.
> >>
> >> However, we have configured our logs to go to stderr and stdout so
> these links work for us. I think some YARN applications send logs to a
> separate log endpoint, which would be useful when listed here. Does anyone
> have logs going to locations other than stderr and stdout?
> >>
> >> If there are logs going to other files, then I think making this an
> option is reasonable. Otherwise, I think we should leave links as they are.
> >>
> >> rb
> >>
> >> On Thu, Feb 7, 2019 at 12:31 PM Jungtaek Lim <kabhwan@gmail.com> wrote:
> >>>
> >>> The new URL shows all of the local logs, including stdout and stderr, as a
> list.
> >>>
> >>> The change would help when end users modify their log4j configuration
> to write additional log files, as well as GC logs. Currently Spark only shows
> two static files (stdout, stderr) as individual links, so it is easier to see the
> content (one click), but users have to remove the file part manually from the URL to
> access the list page. Instead, we may be able to change the default URL to
> show all of the local logs and let users choose which file to read (though it
> would take two clicks to reach the actual file).
> >>>
> >>> -Jungtaek Lim (HeartSaVioR)
> >>>
> >>> On Fri, Feb 8, 2019 at 1:33 AM, Ryan Blue <rblue@netflix.com> wrote:
> >>>>
> >>>> Jungtaek,
> >>>>
> >>>> What is shown at the new URL and how would this improve usability?
> >>>>
> >>>> On Thu, Feb 7, 2019 at 12:45 AM Jungtaek Lim <kabhwan@gmail.com>
> wrote:
> >>>>>
> >>>>> Hi devs,
> >>>>>
> >>>>> Based on the suggestion Tom Graves gave me in SPARK-26792, I'd like
> to hear opinions on changing the default executor log URLs for YARN, specifically
> removing the "stdout" and "stderr" links and providing a single link that lists all log files.
> For example, instead of referring to the two links below:
> >>>>>
> >>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>/<stdout|stderr>?start=-4096
> >>>>>
> >>>>> we would refer to only the one link below:
> >>>>>
> >>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>
> >>>>>
> >>>>> I've checked that the new URL works with the NM's redirection to jobhistory, so
> it won't break what we currently support. Getting to an actual log
> file would require two clicks instead of one, though.
> >>>>>
> >>>>> Given that it changes the UX, I'd like to hear opinions on this
> before submitting a patch. If we'd rather keep this as it is, I would just
> make it possible to apply a custom log URL to the Spark UI as well.
> >>>>>
> >>>>> Thanks in advance!
> >>>>>
> >>>>> FYI, below is the rationale for this discussion:
> >>>>>
> >>>>> While I was working on SPARK-23155, I got some input about
> linking to the "log directory" instead of separate log URLs for "stdout" and "stderr",
> because in real-world cases end users write more files than just stdout and
> stderr (like GC logs).
> >>>>>
> >>>>> SPARK-23155 provides a way to modify the log URL, but it only applies
> to the SHS; the Spark UI for running apps still only shows "stdout" and
> "stderr". SPARK-26792 is for applying this to the Spark UI as well, but I
> got a suggestion to just change the default log URL.
> >>>>>
> >>>>> Thanks again,
> >>>>> Jungtaek Lim (HeartSaVioR)
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Ryan Blue
> >>>> Software Engineer
> >>>> Netflix
> >>
> >>
> >>
> >> --
> >> Ryan Blue
> >> Software Engineer
> >> Netflix
>


-- 
Ryan Blue
Software Engineer
Netflix
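
For readers following the thread, the two NodeManager URL shapes discussed
above can be sketched in Python. This is an illustration only, not Spark's
implementation: the function names and the host, port, container ID, and user
values are hypothetical, and only the URL layout is taken from the messages
above.

```python
from urllib.parse import urlsplit, urlunsplit


def container_log_dir_url(nm_host, nm_port, container_id, user):
    """Proposed form: one link to the container's log listing page."""
    return f"http://{nm_host}:{nm_port}/node/containerlogs/{container_id}/{user}"


def container_log_file_url(nm_host, nm_port, container_id, user, file_name, start=-4096):
    """Current form: a direct link to a single file such as stdout or stderr."""
    base = container_log_dir_url(nm_host, nm_port, container_id, user)
    return f"{base}/{file_name}?start={start}"


def strip_file_part(file_url):
    """Do what users currently do by hand: drop the file name and query
    string from a per-file URL to reach the listing page."""
    parts = urlsplit(file_url)
    dir_path = parts.path.rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, dir_path, "", ""))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    stderr_url = container_log_file_url(
        "nm1.example.com", 8042, "container_1_0001_01_000002", "alice", "stderr"
    )
    print(stderr_url)
    print(strip_file_part(stderr_url))
```

Stripping the file part of either per-file link yields the same listing URL,
which is why a single directory link can replace both at the cost of one more
click.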
