spark-user mailing list archives

From Mike Sukmanowsky <mike.sukmanow...@gmail.com>
Subject Re: Spark Metrics Framework?
Date Fri, 25 Mar 2016 14:48:05 GMT
Pinging again - any thoughts?

On Wed, 23 Mar 2016 at 09:17 Mike Sukmanowsky <mike.sukmanowsky@gmail.com> wrote:

> Thanks Ted and Silvio. I think I'll need a bit more hand-holding here,
> sorry. The way we use ES-Hadoop is in PySpark, via
> org.elasticsearch.hadoop.mr.EsOutputFormat in a saveAsNewAPIHadoopFile
> call. Given that this goes through the Hadoop interop layer, I wouldn't
> expect the EsOutputFormat class
> <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java>
> to be modified to define a new Source and register it via
> MetricsSystem.createMetricsSystem. This actually feels like a good feature
> request for Spark: "Support Hadoop Counters in Input/OutputFormats as
> Spark metrics", but I wanted some feedback first to see whether that makes sense.
>
> That said, some of the custom RDD classes
> <https://github.com/elastic/elasticsearch-hadoop/tree/master/spark/core/main/scala/org/elasticsearch/spark/rdd>
> could probably be modified to register a new Source when they read from
> or write to Elasticsearch.
>
> On Tue, 22 Mar 2016 at 15:17 Silvio Fiorito <silvio.fiorito@granturing.com> wrote:
>
>> Hi Mike,
>>
>> It's been a while since I worked on a custom Source, but I think all you
>> need to do is declare your Source in a package under org.apache.spark.
>>
>> Thanks,
>> Silvio
>>
>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>> Date: Tuesday, March 22, 2016 at 3:13 PM
>> To: Silvio Fiorito <silvio.fiorito@granturing.com>, "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: Spark Metrics Framework?
>>
>> The Source trait is private
>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/source/Source.scala#L22-L25>
>> to the spark package, and any new source added to the metrics registry must
>> be of type Source
>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L144-L152>.
>> So unless I'm mistaken, we can't define a custom source. I linked to the
>> 1.4.1 code, but the same is true in 1.6.1.
>>
>> On Mon, 21 Mar 2016 at 12:05 Silvio Fiorito <silvio.fiorito@granturing.com> wrote:
>>
>>> You could use the metric sources and sinks described here:
>>> http://spark.apache.org/docs/latest/monitoring.html#metrics
>>>
>>> If you want to push the metrics to another system you can define a
>>> custom sink. You can also extend the metrics by defining a custom source.
>>>
>>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>>> Date: Monday, March 21, 2016 at 11:54 AM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Spark Metrics Framework?
>>>
>>> We make extensive use of the elasticsearch-hadoop library for
>>> Hadoop/Spark. In trying to troubleshoot our Spark applications, it'd be
>>> very handy to have access to some of the many metrics
>>> <https://www.elastic.co/guide/en/elasticsearch/hadoop/current/metrics.html>
>>> that the library makes available when running in MapReduce mode. The library's
>>> author noted
>>> <https://discuss.elastic.co/t/access-es-hadoop-stats-from-spark/44913>
>>> that Spark doesn't offer a comparable metrics API through which these
>>> metrics could be reported or aggregated.
>>>
>>> Are there any plans to bring a metrics framework similar to Hadoop's
>>> Counter system to Spark or is there an alternative means for us to grab
>>> metrics exposed when using Hadoop APIs to load/save RDDs?
>>>
>>> Thanks,
>>> Mike
>>>
>>
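Silvio's suggestion (declaring the class in a package under org.apache.spark so that it can implement the private[spark] Source trait) can be sketched as below. This is a minimal sketch assuming Spark 1.4.x/1.6.x, where Source declares sourceName and metricRegistry and MetricsSystem exposes registerSource; both are internal APIs, so this may break between Spark versions. The class name EsHadoopSource and its counter names are hypothetical and are not part of ES-Hadoop or Spark. It only helps JVM-side code (such as the custom RDD classes); a PySpark job writing through EsOutputFormat cannot increment these counters directly.

```scala
// Sketch only: the package must live under org.apache.spark because the
// Source trait is private[spark] in Spark 1.4.x/1.6.x.
package org.apache.spark.metrics.source

import com.codahale.metrics.{Counter, MetricRegistry}
import org.apache.spark.SparkEnv

// Hypothetical source exposing counters loosely modelled on ES-Hadoop's
// MapReduce stats; the names are illustrative, not taken from the library.
class EsHadoopSource extends Source {
  override val sourceName: String = "esHadoop"
  override val metricRegistry: MetricRegistry = new MetricRegistry()

  val docsSent: Counter = metricRegistry.counter("docsSent")
  val bulkRetries: Counter = metricRegistry.counter("bulkRetries")
}

object EsHadoopSource {
  // Created and registered lazily, at most once per JVM (driver or executor),
  // so the same source name is not registered repeatedly from every task.
  lazy val instance: EsHadoopSource = {
    val source = new EsHadoopSource
    SparkEnv.get.metricsSystem.registerSource(source)
    source
  }
}
```

JVM-side code running inside a task, for example a custom RDD's compute method, could then call EsHadoopSource.instance.docsSent.inc(n), and the counters would be reported through whichever sinks are configured in metrics.properties.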
