crunch-dev mailing list archives

From "Surbhi Mungre (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-558) Add name to Spark Accumulators
Date Fri, 04 Sep 2015 18:53:45 GMT


Surbhi Mungre commented on CRUNCH-558:

You should consider adding a configuration option to opt in to or out of displaying counters in the Spark
UI. I might be wrong, but going by [1] it seems that displaying accumulators can affect the performance
of an application. I am not sure what happens when the Map that stores the counters becomes
too large, and I don't know whether we would see any noticeable difference.
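
As a rough illustration of the opt-in idea, here is a minimal sketch in plain Java. The property name and the helper class are hypothetical, not existing Crunch or Spark settings, and the sketch only decides whether a name should be supplied; it does not call Spark itself.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

final class AccumulatorNaming {
    // Hypothetical property name, used only for this sketch.
    static final String SHOW_IN_UI_KEY = "crunch.spark.accumulators.show.in.ui";

    /**
     * Returns the name to register the accumulator under, or empty when the
     * accumulator should stay unnamed. Defaults to unnamed (opt-in).
     */
    static Optional<String> displayName(Map<String, String> conf, String baseName) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(SHOW_IN_UI_KEY, "false"));
        return enabled ? Optional.of(baseName) : Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> optedIn = Collections.singletonMap(SHOW_IN_UI_KEY, "true");
        Map<String, String> defaults = Collections.emptyMap();
        System.out.println(displayName(optedIn, "crunch-counters").isPresent());
        System.out.println(displayName(defaults, "crunch-counters").isPresent());
    }
}
```

Defaulting to unnamed would keep current behavior for anyone who does not set the property, so users who want the counters in the UI would have to ask for them.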

In addition, just adding a name to your accumulator also has the side-effect that the UI will
call .toString on the accumulator update from each task. So if you did use an accumulator
on a more complex type with an expensive .toString, just giving the accumulator a name could
destroy performance. We’re left with the strange advice to users: if you are just using
a counter, make sure you add a name to your counter; but if it’s something more complicated
than a counter, be sure you do not add a name.
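
To make the .toString concern concrete, here is a toy simulation in plain Java. It is not Spark code: reportTaskUpdate is a hypothetical stand-in for whatever the UI does with each task's update, and it assumes only the behavior described in the passage above (a named accumulator has its update rendered once per task, an unnamed one is never rendered).

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class ToStringCostDemo {
    /** A value that counts toString() calls, standing in for an expensive rendering. */
    static final class CountingValue {
        final AtomicInteger toStringCalls = new AtomicInteger();

        @Override
        public String toString() {
            toStringCalls.incrementAndGet();
            return "CountingValue";
        }
    }

    /** Hypothetical stand-in for UI reporting: renders the update only if a name is present. */
    static Optional<String> reportTaskUpdate(Optional<String> name, Object update) {
        return name.map(n -> n + "=" + update.toString());
    }

    /** Simulates `tasks` task updates and returns how many times toString() ran. */
    static int toStringCallsFor(Optional<String> name, int tasks) {
        CountingValue value = new CountingValue();
        for (int i = 0; i < tasks; i++) {
            reportTaskUpdate(name, value);
        }
        return value.toStringCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("named:   " + toStringCallsFor(Optional.of("counters"), 1000));
        System.out.println("unnamed: " + toStringCallsFor(Optional.empty(), 1000));
    }
}
```

In this model the rendering cost grows with the number of tasks, so a large Map with a costly string form would pay that cost on every task update once it is named.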


> Add name to Spark Accumulators
> ------------------------------
>                 Key: CRUNCH-558
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Micah Whitacre
>            Assignee: Micah Whitacre
>             Fix For: 0.14.0
>         Attachments: CRUNCH-558.patch
> It was brought up on the mailing list that our Crunch counters are not showing up on
the Spark webui possibly because they are not named.
> {quote}
> We are currently testing a few capabilities using Spark and one thing we noticed in Spark
is they don't list any user defined accumulators on web UI. 
> On MapReduce I would imagine counters being displayed on the job page, however on a SparkPipeline
I was only able to pull counter information from PipelineResult#getStageResult(). 
> I think the reason these accumulators are not visible on web UI is because crunch does
not name these accumulators. Spark expects an accumulator to have a name to be visible on
the UI.
(accumulator API with Name)
> I would like to know if it's possible in crunch to name these accumulators so they are
available in web UI. This will give us an experience where users can monitor/watch accumulators
from web UI to obtain key information about their jobs. 
> {quote}

This message was sent by Atlassian JIRA