spark-dev mailing list archives

From Gerard Maas <gerard.m...@gmail.com>
Subject Re: [Streaming] Configure executor logging on Mesos
Date Tue, 02 Jun 2015 00:28:15 GMT
Hi Tim,
(added dev, removed user)

I've created https://issues.apache.org/jira/browse/SPARK-8009 to track this.

-kr, Gerard.

On Sat, May 30, 2015 at 7:10 PM, Tim Chen <tim@mesosphere.io> wrote:

> So it sounds like generic support for downloadable URIs could solve this
> problem: Mesos automatically places the files in your sandbox and you can
> refer to them.
>
> If so, please file a JIRA; this is a pretty simple fix on the Spark side.
>
> Tim
>
> On Sat, May 30, 2015 at 7:34 AM, andy petrella <andy.petrella@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm currently exploring DCOS for the spark notebook, and while looking at
>> the spark configuration I found something interesting which is actually
>> converging to what we've discovered:
>>
>> https://github.com/mesosphere/universe/blob/master/repo/packages/S/spark/0/marathon.json
>>
>> So the logging is working fine here because the spark package is using
>> the spark-class which is able to configure the log4j file. But the
>> interesting part comes with the fact that the `uris` parameter is filled in
>> with a downloadable path to the log4j file!
>>
>> However, that's not possible when creating the Spark context ourselves and
>> relying on the Mesos scheduler backend only, unless spark.executor.uri
>> (or another setting) can take more than one downloadable path.
>>
>> my 2¢
>>
>> andy
>>
>>
>> On Fri, May 29, 2015 at 5:09 PM Gerard Maas <gerard.maas@gmail.com>
>> wrote:
>>
>>> Hi Tim,
>>>
>>> Thanks for the info.   We (Andy Petrella and myself) have been diving a
>>> bit deeper into this log config:
>>>
>>> The log line I was referring to is this one (sorry, I provided the
>>> others just for context)
>>>
>>> *Using Spark's default log4j profile:
>>> org/apache/spark/log4j-defaults.properties*
>>>
>>> That line comes from Logging.scala [1], where a default config is loaded
>>> if none is found in the classpath upon the startup of the Spark Mesos
>>> executor in the Mesos sandbox. At that point in time, none of the
>>> application-specific resources have been shipped yet as the executor JVM is
>>> just starting up.   To load a custom configuration file we should have it
>>> already on the sandbox before the executor JVM starts and add it to the
>>> classpath on the startup command. Is that correct?
>>>
>>> For the classpath customization, it looks like it should be possible to
>>> pass a -Dlog4j.configuration property by using the
>>> 'spark.executor.extraClassPath' that will be picked up at [2] and that
>>> should be added to the command that starts the executor JVM, but the
>>> resource must be already on the host before we can do that. Therefore we
>>> also need some means of 'shipping' the log4j.configuration file to the
>>> allocated executor.
>>>
>>> This all boils down to your statement on the need of shipping extra
>>> files to the sandbox. Bottom line: It's currently not possible to specify a
>>> log config file for your Mesos executor (our log grows by several GB/day).
>>>
>>> The only workaround I found so far is to open up the Spark assembly,
>>> replace log4j-defaults.properties and pack it up again.  That would
>>> work, although it is rather crude as we use the same assembly for many
>>> jobs.  Accessing the log4j API programmatically should probably also work
>>> (I haven't tried that yet).
>>>
>>> Should we open a JIRA for this functionality?
>>>
>>> -kr, Gerard.
>>>
>>>
>>>
>>>
>>> [1]
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/Logging.scala#L128
>>> [2]
>>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala#L77
>>>
>>> On Thu, May 28, 2015 at 7:50 PM, Tim Chen <tim@mesosphere.io> wrote:
>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: Tim Chen <tim@mesosphere.io>
>>>> Date: Thu, May 28, 2015 at 10:49 AM
>>>> Subject: Re: [Streaming] Configure executor logging on Mesos
>>>> To: Gerard Maas <gerard.maas@gmail.com>
>>>>
>>>>
>>>> Hi Gerard,
>>>>
>>>> The log line you referred to is not Spark logging but Mesos' own
>>>> logging, which uses glog.
>>>>
>>>> Our own executor logs should only contain very few lines though.
>>>>
>>>> Most of the log lines you'll see are from Spark, and they can be controlled
>>>> by specifying a log4j.properties to be downloaded with your Mesos task.
>>>> Alternatively, if you are downloading the Spark executor via spark.executor.uri,
>>>> you can include log4j.properties in that tarball.
>>>>
>>>> I think we probably need some more configurations for Spark scheduler
>>>> to pick up extra files to be downloaded into the sandbox.
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, May 28, 2015 at 6:46 AM, Gerard Maas <gerard.maas@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm trying to control the verbosity of the logs on the Mesos executors
>>>>> with no luck so far. The default behaviour is INFO on stderr dump with an
>>>>> unbounded growth that gets too big at some point.
>>>>>
>>>>> I noticed that when the executor is instantiated, it locates a default
>>>>> log configuration in the spark assembly:
>>>>>
>>>>> I0528 13:36:22.958067 26890 exec.cpp:206] Executor registered on slave
>>>>> 20150528-063307-780930314-5050-8152-S5
>>>>> Spark assembly has been built with Hive, including Datanucleus jars on
>>>>> classpath
>>>>> Using Spark's default log4j profile:
>>>>> org/apache/spark/log4j-defaults.properties
>>>>>
>>>>> So nothing I provide in my job jar files (I also tried
>>>>> spark.executor.extraClassPath=log4j.properties) takes effect in the
>>>>> executor's configuration.
>>>>>
>>>>> How should I configure the log on the executors?
>>>>>
>>>>> thanks, Gerard.
>>>>>
>>>>
>>>>
>>>>
>>>
>
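
For reference, the kind of log4j.properties file discussed in this thread (one that lowers executor verbosity from the default INFO) might look like the sketch below. This is only an illustration in log4j 1.x properties syntax: the WARN level and the pattern layout are assumptions chosen for the example, not values taken from the thread.

```properties
# Hypothetical log4j.properties for a Spark executor (log4j 1.x syntax).
# Assumption: WARN is quiet enough; pick whatever level suits the job.
log4j.rootCategory=WARN, console

# Keep writing to stderr, which is where the executor output already goes.
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

If such a file were already present in the executor's sandbox, it could in principle be selected with a -Dlog4j.configuration option passed through spark.executor.extraJavaOptions. Getting the file into the sandbox before the executor JVM starts is the open problem tracked in SPARK-8009.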
