spark-user mailing list archives

From Yiannis Gkoufas <johngou...@gmail.com>
Subject Re: Spark Metrics Framework?
Date Fri, 01 Apr 2016 19:13:51 GMT
Hi Mike,

I am forwarding you a mail I sent a while ago regarding some related work I
did, hope you find it useful

Hi all,

I recently posted to the dev mailing list about this contribution, but I
thought it might be useful to post it here, since I have seen a lot of
people asking about OS-level metrics of Spark. This is the result of the
work we have been doing recently in IBM Research around Spark.

Essentially, we have extended the Spark metrics system to use the Hyperic
Sigar library to capture OS-level metrics, and modified the Web UI to
visualize those metrics per application.

These features can be configured in the metrics.properties and
spark-defaults.conf files.

We have recorded a small demo that shows those capabilities, which you can
find here: https://ibm.app.box.com/s/vyaedlyb444a4zna1215c7puhxliqxdg

There is a blog post which gives more details on the functionality here:
http://www.spark.tc/sparkoscope-enabling-spark-optimization-through-cross-stack-monitoring-and-visualization-2/

and also there is a public repo where anyone can try it:
https://github.com/ibm-research-ireland/sparkoscope

Hope someone finds it useful!

Thanks a lot!

Yiannis
On 1 Apr 2016 19:10, "Mike Sukmanowsky" <mike.sukmanowsky@gmail.com> wrote:

> Thanks Silvio, JIRA submitted
> https://issues.apache.org/jira/browse/SPARK-14332.
>
> On Fri, 25 Mar 2016 at 12:46 Silvio Fiorito <silvio.fiorito@granturing.com>
> wrote:
>
>> Hi Mike,
>>
>> Sorry got swamped with work and didn’t get a chance to reply.
>>
>> I misunderstood what you were trying to do. I thought you were just
>> looking to create custom metrics vs looking for the existing Hadoop Output
>> Format counters.
>>
>> I’m not familiar enough with the Hadoop APIs but I think it would require
>> a change to the SparkHadoopWriter
>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala>
>> class since it generates the JobContext which is required to read the
>> counters. Then it could publish the counters to the Spark metrics system.
>>
>> I would suggest going ahead and submitting a JIRA request if there isn’t
>> one already.
>>
>> Thanks,
>> Silvio
>>
>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>> Date: Friday, March 25, 2016 at 10:48 AM
>>
>> To: Silvio Fiorito <silvio.fiorito@granturing.com>, "
>> user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: Spark Metrics Framework?
>>
>> Pinging again - any thoughts?
>>
>> On Wed, 23 Mar 2016 at 09:17 Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>> wrote:
>>
>>> Thanks Ted and Silvio. I think I'll need a bit more hand holding here,
>>> sorry. The way we use ES Hadoop is in pyspark via
>>> org.elasticsearch.hadoop.mr.EsOutputFormat in a saveAsNewAPIHadoopFile
>>> call. Given the Hadoop interop, I wouldn't assume that the EsOutputFormat
>>> class
>>> <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java>
>>> could be modified to define a new Source and register it via
>>> MetricsSystem.createMetricsSystem. This feels like a good feature request
>>> for Spark actually: "Support Hadoop Counters in Input/OutputFormats as
>>> Spark metrics" but I wanted some feedback first to see if that makes sense.
>>>
>>> That said, some of the custom RDD classes
>>> <https://github.com/elastic/elasticsearch-hadoop/tree/master/spark/core/main/scala/org/elasticsearch/spark/rdd>
>>> could probably be modified to register a new Source when they perform
>>> reading/writing from/to Elasticsearch.
>>>
>>> On Tue, 22 Mar 2016 at 15:17 Silvio Fiorito <
>>> silvio.fiorito@granturing.com> wrote:
>>>
>>>> Hi Mike,
>>>>
>>>> It’s been a while since I worked on a custom Source but I think all you
>>>> need to do is make your Source in the org.apache.spark package.
>>>>
>>>> Thanks,
>>>> Silvio
>>>>
>>>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>>>> Date: Tuesday, March 22, 2016 at 3:13 PM
>>>> To: Silvio Fiorito <silvio.fiorito@granturing.com>, "
>>>> user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Re: Spark Metrics Framework?
>>>>
>>>> The Source class is private
>>>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/source/Source.scala#L22-L25>
>>>> to the spark package and any new Sources added to the metrics registry must
>>>> be of type Source
>>>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L144-L152>.
>>>> So unless I'm mistaken, we can't define a custom source. I linked to 1.4.1
>>>> code, but the same is true in 1.6.1.
>>>>
>>>> On Mon, 21 Mar 2016 at 12:05 Silvio Fiorito <
>>>> silvio.fiorito@granturing.com> wrote:
>>>>
>>>>> You could use the metric sources and sinks described here:
>>>>> http://spark.apache.org/docs/latest/monitoring.html#metrics
>>>>>
>>>>> If you want to push the metrics to another system you can define a
>>>>> custom sink. You can also extend the metrics by defining a custom source.
>>>>>
>>>>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>>>>> Date: Monday, March 21, 2016 at 11:54 AM
>>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>>> Subject: Spark Metrics Framework?
>>>>>
>>>>> We make extensive use of the elasticsearch-hadoop library for
>>>>> Hadoop/Spark. In trying to troubleshoot our Spark applications, it'd be
>>>>> very handy to have access to some of the many metrics
>>>>> <https://www.elastic.co/guide/en/elasticsearch/hadoop/current/metrics.html>
>>>>> that the library makes available when running in map reduce mode. The
>>>>> library's author noted
>>>>> <https://discuss.elastic.co/t/access-es-hadoop-stats-from-spark/44913>
>>>>> that Spark doesn't offer any kind of a similar metrics API whereby these
>>>>> metrics could be reported or aggregated on.
>>>>>
>>>>> Are there any plans to bring a metrics framework similar to Hadoop's
>>>>> Counter system to Spark or is there an alternative means for us to grab
>>>>> metrics exposed when using Hadoop APIs to load/save RDDs?
>>>>>
>>>>> Thanks,
>>>>> Mike
>>>>>
>>>>
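
For readers following the original question at the bottom of this thread, here is a rough sketch of the counter semantics being asked for: each task increments named counters locally, and the totals are summed at the end of the job. It is plain Python with no Spark or Hadoop involved, so it only illustrates the idea; the counter names (docs_sent, bytes_sent) and the helper functions run_task and merge_counters are made up for this example and are not part of elasticsearch-hadoop or Spark. In PySpark the closest built-in mechanism is probably an accumulator, which follows the same "add locally, read the sum on the driver" pattern.

```python
from collections import Counter
from typing import Iterable, List


def run_task(docs: Iterable[dict]) -> Counter:
    """Simulate one write task: keep task-local counters while 'sending' docs."""
    counters = Counter()
    for doc in docs:
        counters["docs_sent"] += 1
        # Approximate payload size; a real writer would count serialized bytes.
        counters["bytes_sent"] += len(repr(doc).encode("utf-8"))
    return counters


def merge_counters(per_task: List[Counter]) -> Counter:
    """Sum the per-task counters into job-level totals, as a driver would."""
    totals = Counter()
    for task_counters in per_task:
        totals.update(task_counters)  # Counter.update adds counts together
    return totals


# Two hypothetical partitions of documents, one task per partition.
partitions = [
    [{"id": 1}, {"id": 2}],
    [{"id": 3}],
]
job_totals = merge_counters([run_task(p) for p in partitions])
print(job_totals["docs_sent"])
```

The point of the sketch is only that counters are cheap to keep per task and trivially mergeable; how they would actually be surfaced through the Spark metrics system is what the JIRA mentioned above is meant to settle.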
