spark-dev mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: [DISCUSS] Change default executor log URLs for YARN
Date Fri, 08 Feb 2019 22:57:12 GMT
Let me quote some voices here, since it seems they aren't participating in
this thread. This still doesn't show that the majority are using this
pattern, so I'm also OK with making it optional (I might just work on
SPARK-26792 <https://issues.apache.org/jira/browse/SPARK-26792> to address
that) and leaving the default as it is if others aren't interested in this.

https://github.com/apache/spark/pull/23260#issuecomment-456827963

Sorry I haven't had time to look through all the code so this might be a
separate JIRA, but one thing I thought of here is it would be really nice
not to have specifically stderr/stdout. Users can specify any
log4j.properties, and some tools like Oozie by default end up using Hadoop
log4j rather than Spark log4j, so the files aren't necessarily the same. Also,
users can put in other log files, so it would be nice to have links to
those from the UI. It seems simpler if we just had a link to the directory
and it read the files within there. Other things in Hadoop do it this way,
but I'm not sure if that works well for other resource managers. Any
thoughts on that? As long as this doesn't prevent the above I can file a
separate JIRA for it.

https://github.com/apache/spark/pull/23260#issuecomment-456904716

Hi Tom, +1: singling out stdout and stderr is definitely an annoyance. We
typically configure Spark jobs to write the GC log and dump heap on OOM
using <LOG_DIR>, and/or we use the rolling file appender to deal with
large logs during debugging. So linking to the YARN container log overview
page would make much more sense for us. We work around it with a custom
submit process that logs all important URLs in the submit-side log.



On Sat, Feb 9, 2019 at 5:42 AM Ryan Blue <rblue@netflix.com> wrote:

> Here's what I see from a running job on our cluster. Both of these are
> links that go to the same stderr and stdout pages that Spark links to today.
>
> stderr : Total file length is 18557 bytes.
> stdout : Total file length is 0 bytes.
>
> While it is nice to see that stderr or stdout has content, I don't think
> that this is worth the extra click or changes to Spark.
>
> However, we have configured our logs to go to stderr and stdout so these
> links work for us. I think some YARN applications send logs to a separate
> log endpoint, which would be useful when listed here. Does anyone have logs
> going to locations other than stderr and stdout?
>
> If there are logs going to other files, then I think making this an option
> is reasonable. Otherwise, I think we should leave links as they are.
>
> rb
>
> On Thu, Feb 7, 2019 at 12:31 PM Jungtaek Lim <kabhwan@gmail.com> wrote:
>
>> The new URL shows all of the local logs, including stdout and stderr, as a
>> list.
>>
>> The change would help when end users modify their log4j configuration to
>> add other log files, as well as GC logs. Currently Spark only shows two
>> static files (stdout, stderr) as individual links, so it is easier to see
>> the content (one click), but users have to manually remove the file part
>> from the URL to access the list page. Instead, we may be able to change the
>> default URL to show all of the local logs and let users choose which file
>> to read (though it would take two clicks to reach the actual file).
>>
>> -Jungtaek Lim (HeartSaVioR)
>>
>> On Fri, Feb 8, 2019 at 1:33 AM Ryan Blue <rblue@netflix.com> wrote:
>>
>>> Jungtaek,
>>>
>>> What is shown at the new URL and how would this improve usability?
>>>
>>> On Thu, Feb 7, 2019 at 12:45 AM Jungtaek Lim <kabhwan@gmail.com> wrote:
>>>
>>>> Hi devs,
>>>>
>>>> Based on the suggestion Tom Graves gave me in SPARK-26792
>>>> <https://issues.apache.org/jira/browse/SPARK-26792>, I'd like to hear
>>>> opinions on changing the default executor log URLs for YARN, specifically
>>>> removing "stdout" and "stderr" and providing a single link that shows all
>>>> of the log files. For example, instead of referring to the two links below:
>>>>
>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>/<stdout|stderr>?start=-4096
>>>>
>>>> we would refer to only the one link below:
>>>>
>>>> http://<NM_HOST>:<NM_PORT>/node/containerlogs/<CONTAINER_ID>/<USER>
>>>>
>>>> I've checked that the new URL works with the redirection on the NM to
>>>> jobhistory, so it won't break what we currently support. Getting to the
>>>> actual log file would require two clicks instead of one, though.
>>>>
>>>> Given that this changes the UX, I'd like to hear opinions on this
>>>> before submitting a patch. If we'd rather keep this as it is, I would just
>>>> make it possible to apply a custom log URL to the Spark UI as well.
>>>>
>>>> Thanks in advance!
>>>>
>>>> FYI, below is the rationale for this discussion:
>>>>
>>>> While working on SPARK-23155
>>>> <https://issues.apache.org/jira/browse/SPARK-23155>, I received some
>>>> input about linking to the "log directory" instead of separate log URLs
>>>> for "stdout" and "stderr", because in real use cases end users write more
>>>> files than just stdout and stderr (such as GC logs).
>>>>
>>>> SPARK-23155 provides a way to modify the log URL, but it only applies to
>>>> the SHS; for running apps, the Spark UI still shows only "stdout" and
>>>> "stderr". SPARK-26792 is for applying this to the Spark UI as well, but I
>>>> got a suggestion to just change the default log URL.
>>>>
>>>> Thanks again,
>>>> Jungtaek Lim (HeartSaVioR)
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
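
For readers following the thread, the two URL shapes under discussion can be sketched roughly as below. This is only an illustration of the patterns quoted above, not Spark's actual implementation: the function names, the host, port, container ID and user values are all hypothetical, and only the path layout and the start=-4096 parameter come from the messages in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def per_file_log_url(nm_host, nm_port, container_id, user, file_name, start=-4096):
    """Current style quoted in the thread: one link per file (stdout or stderr)."""
    return (
        f"http://{nm_host}:{nm_port}/node/containerlogs/"
        f"{container_id}/{user}/{file_name}?start={start}"
    )


def log_dir_url(nm_host, nm_port, container_id, user):
    """Proposed style quoted in the thread: one link to the container log list."""
    return f"http://{nm_host}:{nm_port}/node/containerlogs/{container_id}/{user}"


def strip_file_part(url):
    """Mimic the manual step users take today: drop the file name and query."""
    parts = urlsplit(url)
    dir_path = parts.path.rsplit("/", 1)[0]
    return urlunsplit((parts.scheme, parts.netloc, dir_path, "", ""))


# Hypothetical values, for illustration only.
stderr_url = per_file_log_url("nm1.example.com", 8042, "container_0001", "alice", "stderr")
print(stderr_url)
print(strip_file_part(stderr_url))
```

With these made-up values, stripping the file part of the per-file link yields the same string that log_dir_url builds, which is the one-click versus two-click trade-off being debated.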
