spark-dev mailing list archives

From: Mark Hamstra
Subject: Re: Stage vs. StageInfo
Date: Tue, 23 Jul 2013 23:47:47 GMT
Ah, got it.  So Stage and TaskInfo are opaque outside Spark, while
TaskMetrics are visible.

On Tue, Jul 23, 2013 at 4:41 PM, Matei Zaharia wrote:

> Hey Mark,
> The motivation was to separate internal DAGScheduler data structures, such
> as Stage, from the interface we'll present to SparkListener, which will be
> a semi-public API. (Semi-public in that it might still change if we make
> drastic changes to the scheduler, but we want people to be able to use it
> for monitoring with as little pain as possible). We aren't following this
> consistently in all the SparkListener events yet but the goal is to do so.
> Matei
> On Jul 23, 2013, at 4:22 PM, Mark Hamstra wrote:
> > So I'm currently working in Spark's DAGScheduler and related UI code, and
> > I'm finding myself wondering why there are StageInfos distinct from Stages.
> > It seems like we go through some bookkeeping to make sure that we can get
> > from a Stage to a StageInfo, which in turn is just a pairing of the Stage
> > with a collection of (TaskInfo, TaskMetrics) pairs.  Why not avoid the
> > bookkeeping and just put that collection of (TaskInfo, TaskMetrics) pairs
> > right in the Stage itself?  I.e., directly change the Stage class to
> > augment it with the collection instead of indirectly augmenting stages by
> > going through the (potentially error-prone) mechanics of maintaining an
> > association between each Stage and a separate StageInfo.
> >
> > Or am I missing something?
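
For readers following the thread, the separation Matei describes can be sketched roughly as below. This is only an illustration in Python, not Spark's actual Scala code: the field names (stage_id, num_tasks, pending_tasks, host, run_time_ms), the from_stage helper, and the RecordingListener class are all hypothetical. The one thing taken from the thread is the shape of the idea: Stage stays internal to the scheduler, and StageInfo is a separate object, carrying the (TaskInfo, TaskMetrics) pairs, that is handed to listeners.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class TaskInfo:
    # Hypothetical fields, chosen only for illustration.
    task_id: int
    host: str


@dataclass(frozen=True)
class TaskMetrics:
    # Hypothetical field, chosen only for illustration.
    run_time_ms: int


class Stage:
    """Internal scheduler structure; free to change with the scheduler."""

    def __init__(self, stage_id: int, num_tasks: int) -> None:
        self.stage_id = stage_id
        self.num_tasks = num_tasks
        # Internal bookkeeping that listeners should never depend on.
        self.pending_tasks = set(range(num_tasks))


@dataclass(frozen=True)
class StageInfo:
    """Read-only view handed to listeners instead of the Stage itself."""

    stage_id: int
    num_tasks: int
    task_infos: Tuple[Tuple[TaskInfo, TaskMetrics], ...]

    @classmethod
    def from_stage(
        cls, stage: Stage, pairs: Iterable[Tuple[TaskInfo, TaskMetrics]]
    ) -> "StageInfo":
        # Copy only the fields meant to be exposed; pending_tasks is left out.
        return cls(stage.stage_id, stage.num_tasks, tuple(pairs))


class RecordingListener:
    """Stand-in for a listener; it only ever sees StageInfo objects."""

    def __init__(self) -> None:
        self.completed: List[int] = []

    def on_stage_completed(self, info: StageInfo) -> None:
        self.completed.append(info.stage_id)
```

The cost is the bookkeeping Mark points out (something has to build a StageInfo for each Stage), and the benefit is that the scheduler can rework Stage without breaking monitoring code written against StageInfo.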
