spark-user mailing list archives

From Mike Sukmanowsky <mike.sukmanow...@gmail.com>
Subject Re: Spark Metrics Framework?
Date Fri, 01 Apr 2016 18:09:58 GMT
Thanks Silvio, JIRA submitted
https://issues.apache.org/jira/browse/SPARK-14332.

On Fri, 25 Mar 2016 at 12:46 Silvio Fiorito <silvio.fiorito@granturing.com>
wrote:

> Hi Mike,
>
> Sorry got swamped with work and didn’t get a chance to reply.
>
> I misunderstood what you were trying to do. I thought you were just
> looking to create custom metrics vs looking for the existing Hadoop Output
> Format counters.
>
> I’m not familiar enough with the Hadoop APIs but I think it would require
> a change to the SparkHadoopWriter
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala>
> class since it generates the JobContext which is required to read the
> counters. Then it could publish the counters to the Spark metrics system.
>
> I would suggest going ahead and submitting a JIRA request if there isn’t
> one already.
>
> Thanks,
> Silvio
>
> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
> Date: Friday, March 25, 2016 at 10:48 AM
>
> To: Silvio Fiorito <silvio.fiorito@granturing.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Spark Metrics Framework?
>
> Pinging again - any thoughts?
>
> On Wed, 23 Mar 2016 at 09:17 Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
> wrote:
>
>> Thanks Ted and Silvio. I think I'll need a bit more hand holding here,
>> sorry. The way we use ES Hadoop is in pyspark via
>> org.elasticsearch.hadoop.mr.EsOutputFormat in a saveAsNewAPIHadoopFile
>> call. Given the Hadoop interop, I wouldn't assume that the EsOutputFormat
>> class
>> <https://github.com/elastic/elasticsearch-hadoop/blob/master/mr/src/main/java/org/elasticsearch/hadoop/mr/EsOutputFormat.java>
>> could be modified to define a new Source and register it via
>> MetricsSystem.createMetricsSystem. This feels like a good feature request
>> for Spark actually: "Support Hadoop Counters in Input/OutputFormats as
>> Spark metrics" but I wanted some feedback first to see if that makes sense.
>>
>> That said, some of the custom RDD classes
>> <https://github.com/elastic/elasticsearch-hadoop/tree/master/spark/core/main/scala/org/elasticsearch/spark/rdd>
>> could
>> probably be modified to register a new Source when they perform
>> reading/writing from/to Elasticsearch.
>>
>> On Tue, 22 Mar 2016 at 15:17 Silvio Fiorito <
>> silvio.fiorito@granturing.com> wrote:
>>
>>> Hi Mike,
>>>
>>> It’s been a while since I worked on a custom Source but I think all you
>>> need to do is make your Source in the org.apache.spark package.
>>>
>>> Thanks,
>>> Silvio
>>>
>>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>>> Date: Tuesday, March 22, 2016 at 3:13 PM
>>> To: Silvio Fiorito <silvio.fiorito@granturing.com>, "
>>> user@spark.apache.org" <user@spark.apache.org>
>>> Subject: Re: Spark Metrics Framework?
>>>
>>> The Source class is private
>>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/source/Source.scala#L22-L25>
>>> to the spark package and any new Sources added to the metrics registry must
>>> be of type Source
>>> <https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L144-L152>.
>>> So unless I'm mistaken, we can't define a custom source. I linked to 1.4.1
>>> code, but the same is true in 1.6.1.
>>>
>>> On Mon, 21 Mar 2016 at 12:05 Silvio Fiorito <
>>> silvio.fiorito@granturing.com> wrote:
>>>
>>>> You could use the metric sources and sinks described here:
>>>> http://spark.apache.org/docs/latest/monitoring.html#metrics
>>>>
>>>> If you want to push the metrics to another system you can define a
>>>> custom sink. You can also extend the metrics by defining a custom source.
>>>>
>>>> From: Mike Sukmanowsky <mike.sukmanowsky@gmail.com>
>>>> Date: Monday, March 21, 2016 at 11:54 AM
>>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>>> Subject: Spark Metrics Framework?
>>>>
>>>> We make extensive use of the elasticsearch-hadoop library for
>>>> Hadoop/Spark. In trying to troubleshoot our Spark applications, it'd be
>>>> very handy to have access to some of the many metrics
>>>> <https://www.elastic.co/guide/en/elasticsearch/hadoop/current/metrics.html>
>>>> that the library makes available when running in MapReduce mode. The library's
>>>> author noted
>>>> <https://discuss.elastic.co/t/access-es-hadoop-stats-from-spark/44913>
>>>> that Spark doesn't offer any kind of similar metrics API whereby these
>>>> metrics could be reported or aggregated.
>>>>
>>>> Are there any plans to bring a metrics framework similar to Hadoop's
>>>> Counter system to Spark or is there an alternative means for us to grab
>>>> metrics exposed when using Hadoop APIs to load/save RDDs?
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>
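For readers following along, the write path Mike describes (pyspark calling saveAsNewAPIHadoopFile with org.elasticsearch.hadoop.mr.EsOutputFormat) might look roughly like the sketch below. The helper name save_to_es, its parameters, and the default node address are assumptions made for illustration and do not come from the thread. The key/value class names and the es.* settings follow the ES-Hadoop documentation as best I recall, so check them against the version in use. Nothing in this call hands the Hadoop counters back to the caller, which is the gap the thread is about.

```python
# Sketch only: `rdd` is assumed to be a PySpark RDD of (key, dict) pairs.
ES_OUTPUT_FORMAT = "org.elasticsearch.hadoop.mr.EsOutputFormat"


def save_to_es(rdd, resource, nodes="localhost:9200"):
    """Write an RDD of (key, document) pairs to Elasticsearch via ES-Hadoop.

    `save_to_es`, `resource`, and `nodes` are hypothetical names for this
    sketch; they are not part of Spark or ES-Hadoop.
    """
    conf = {
        "es.resource": resource,  # target index/type, e.g. "logs/event"
        "es.nodes": nodes,        # assumed cluster address
    }
    rdd.saveAsNewAPIHadoopFile(
        path="-",  # not used by EsOutputFormat, but the argument is required
        outputFormatClass=ES_OUTPUT_FORMAT,
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )
    return conf
```

The function returns the configuration it used so a caller can log it. The call itself returns nothing, so any counters EsOutputFormat maintains stay inside the Hadoop JobContext that Spark creates internally.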

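As a companion to Silvio's first reply about sources and sinks, here is a minimal sketch of generating a metrics.properties file that turns on Spark's built-in console sink and JVM source. The helper render_metrics_properties and the CONSOLE_SINK dict are hypothetical names for this example. The property keys and class names are the ones I believe appear in Spark's conf/metrics.properties.template. This only surfaces the built-in sources; it does not expose the ES-Hadoop counters discussed in the thread.

```python
def render_metrics_properties(settings):
    """Render a dict of settings as Java-properties lines, sorted by key."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


# "*" applies a setting to every instance (driver, executor, ...);
# the last entry enables the JVM source on the driver only.
CONSOLE_SINK = {
    "*.sink.console.class": "org.apache.spark.metrics.sink.ConsoleSink",
    "*.sink.console.period": "10",
    "*.sink.console.unit": "seconds",
    "driver.source.jvm.class": "org.apache.spark.metrics.source.JvmSource",
}
```

The rendered text can be written to a file and handed to Spark through the spark.metrics.conf setting, or saved as conf/metrics.properties under the Spark installation.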