spark-dev mailing list archives

From Mark Hamstra
Subject Stage vs. StageInfo
Date Tue, 23 Jul 2013 23:22:50 GMT
So I'm currently working in Spark's DAGScheduler and related UI code, and
I'm finding myself wondering why there are StageInfos distinct from Stages.
 It seems like we go through some bookkeeping to make sure that we can get
from a Stage to a StageInfo, which in turn is just a pairing of the Stage
with a collection of (TaskInfo, TaskMetrics) pairs.  Why not avoid the
bookkeeping and just put that collection of (TaskInfo, TaskMetrics) pairs
right in the Stage itself?  I.e., add the collection directly to the
Stage class instead of augmenting stages indirectly through the
(potentially error-prone) mechanics of maintaining an association
between each Stage and a separate StageInfo.
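
To make the two options concrete, here is a rough sketch. It is written in Python rather than Spark's Scala, and every name in it (SchedulerWithSideTable, stage_to_info, StageWithTasks, the method names) is a hypothetical stand-in, not the actual DAGScheduler code. In particular, the side table is only my assumption about what the bookkeeping amounts to.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Stand-in for a (TaskInfo, TaskMetrics) pair; the real classes carry far more.
TaskPair = Tuple[Any, Any]


# Option A: StageInfo pairs a Stage with its task data, and the scheduler
# keeps a side table so it can get from a Stage to its StageInfo.
@dataclass(eq=False)  # eq=False keeps identity hashing, so a Stage can be a dict key
class Stage:
    id: int


@dataclass
class StageInfo:
    stage: Stage
    task_infos: List[TaskPair] = field(default_factory=list)


class SchedulerWithSideTable:
    def __init__(self) -> None:
        # Assumed bookkeeping: one entry per live stage.
        self.stage_to_info: Dict[Stage, StageInfo] = {}

    def submit(self, stage: Stage) -> None:
        self.stage_to_info[stage] = StageInfo(stage)

    def task_ended(self, stage: Stage, info: Any, metrics: Any) -> None:
        self.stage_to_info[stage].task_infos.append((info, metrics))

    def stage_finished(self, stage: Stage) -> StageInfo:
        # Every exit path has to remember this removal, or the table leaks.
        return self.stage_to_info.pop(stage)


# Option B: the collection lives on the stage itself, so there is no table
# to keep in sync.
@dataclass(eq=False)
class StageWithTasks:
    id: int
    task_infos: List[TaskPair] = field(default_factory=list)
```

With option B, recording a finished task is just stage.task_infos.append((info, metrics)), and nothing has to be cleaned up separately when the stage completes.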

Or am I missing something?
