spark-issues mailing list archives

From "Nithin Asokan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10897) Custom job/stage names
Date Thu, 01 Oct 2015 18:08:29 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940157#comment-14940157 ]

Nithin Asokan commented on SPARK-10897:
---------------------------------------

{quote}
For example if groupBy results in 3 stages, which one gets the name? if 3 method calls result
in 1 stage? I don't think it's impossible but not sure about the details of the semantics.
{quote}
This is a good point. I had not considered this scenario.

{quote}
is the motivation really to just display something farther up the call stack?
{quote}
Yes. Crunch has a concept of DoFn, which is similar to Function in Spark. These DoFns can
take names that are usually displayed on the job page in MR. I should not be comparing MR to
Spark, but in my use case we are migrating from MR to Spark, and our engineers are familiar
with how Crunch creates an MR job with a descriptive job name that includes all DoFn names;
this gives a user more context about what the job is processing. For example, in MR Crunch can
create a job name like {{MyPipeline: Text("/input/path")+Filter valid lines+Text("/output/path")}}.
In Spark we are missing that information, I believe partly because the Spark scheduler
handles stage and job creation. A Spark job/stage name may appear as

{code}
sortByKey at PGroupedTableImpl.java:123 (job name)
mapToPair at PGroupedTableImpl.java:108 (stage name)
{code}
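
For reference, here is a rough sketch (my own example, not Crunch code) of what I think is
possible today with {{setJobGroup}}: the description shows up on the Jobs page, but the stage
labels underneath are still the call sites like the ones above.

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Sketch only: a job *group* description labels the job in the Web UI, but
// there is still no way to name the individual stages.
public class JobDescriptionExample {
  public static void main(String[] args) {
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("MyPipeline"));

    // Shows in the Description column on the Jobs page; the stage names below
    // it still read like "mapToPair at PGroupedTableImpl.java:108".
    jsc.setJobGroup("MyPipeline",
        "MyPipeline: Text(\"/input/path\")+Filter valid lines+Text(\"/output/path\")");

    JavaRDD<String> lines = jsc.textFile("/input/path");
    lines.saveAsTextFile("/output/path");   // action; the job picks up the description

    jsc.stop();
  }
}
{code}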

While this gives an idea that it is processing/creating a PGroupedTable, it does not give me
the full context (at least through Crunch) of the DoFns applied. If Spark allowed users to set
stage names, I think we could pass some DoFn information from Crunch. The next thing I would
ask myself is: if Crunch does not know what stages are created, how can it know which DoFn
name to pass to Spark?
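
If I read {{SparkContext.setCallSite}} correctly, one thing Crunch could perhaps do today,
even without knowing exactly which stages the scheduler will create, is override the call-site
label around each step it builds. A hypothetical sketch (not real Crunch code, and based on my
understanding of how call sites propagate):

{code}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

// Hypothetical sketch: set the call-site label for one pipeline step so the
// DoFn names replace the default "filter at SomeFile.java:42" style labels.
public class CallSiteNaming {
  public static void main(String[] args) {
    JavaSparkContext jsc =
        new JavaSparkContext(new SparkConf().setAppName("MyPipeline"));

    jsc.setCallSite("Text(\"/input/path\")+Filter valid lines+Text(\"/output/path\")");
    try {
      JavaRDD<String> valid = jsc.textFile("/input/path")
          .filter(new Function<String, Boolean>() {
            public Boolean call(String line) {
              return !line.trim().isEmpty();
            }
          });
      // The job and the stages created here should pick up the label above
      // (as far as I understand it) instead of the file/line defaults.
      valid.saveAsTextFile("/output/path");
    } finally {
      jsc.clearCallSite();
    }

    jsc.stop();
  }
}
{code}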

I'm not fully sure whether this can be supported, given my limited knowledge of Spark, but if
others feel it's possible, it could be something that would be helpful for Crunch.

> Custom job/stage names
> ----------------------
>
>                 Key: SPARK-10897
>                 URL: https://issues.apache.org/jira/browse/SPARK-10897
>             Project: Spark
>          Issue Type: Wish
>          Components: Web UI
>            Reporter: Nithin Asokan
>            Priority: Minor
>
> Logging this JIRA to get some opinions about a discussion I started on [user-list|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Job-Stage-names-tt24867.html]
> I would like to get some thoughts about having custom stage/job names. Currently I believe
> stage names cannot be controlled by the user, but if this were allowed, libraries like Apache
> [Crunch|https://crunch.apache.org/] could dynamically set stage names based on the type of
> processing (action/transformation) being performed.
> Is it possible for Spark to support custom names? Would it make sense to allow users to set
> stage names?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
