spark-issues mailing list archives

From "Michael Park (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-20589) Allow limiting task concurrency per stage
Date Mon, 09 Oct 2017 17:57:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197402#comment-16197402 ]

Michael Park edited comment on SPARK-20589 at 10/9/17 5:56 PM:
---------------------------------------------------------------

Pardon my ignorance of the inner workings of task scheduling, but is it not possible to provide
a way to mark the max concurrency of a specific RDD? The max concurrency of a stage would
then be the minimum of the max-concurrency values of all RDDs within that stage.

Also, +1 for this not being an obscure use case. We see a need for it any time we include
an external service as part of a generic pipeline. Ideally the bottleneck can be limited
to a single stage, rather than an entire job.
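The "minimum over all RDDs in the stage" rule proposed above can be sketched in a few lines. This is only an illustration of the proposal, not Spark API: `RddInfo`, `maxConcurrency`, and `stageMaxConcurrency` are all hypothetical names.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch: each RDD may carry an optional max-concurrency hint;
// the stage's effective limit is the minimum over its RDDs' hints.
record RddInfo(String name, OptionalInt maxConcurrency) {}

class StageLimits {
    static OptionalInt stageMaxConcurrency(List<RddInfo> rdds) {
        return rdds.stream()
                .filter(r -> r.maxConcurrency().isPresent())
                .mapToInt(r -> r.maxConcurrency().getAsInt())
                .min(); // empty stream -> OptionalInt.empty(): stage is unconstrained
    }

    public static void main(String[] args) {
        // A stage reading an unconstrained source and writing to a
        // rate-limited external service capped at 8 concurrent tasks.
        List<RddInfo> stage = List.of(
                new RddInfo("source", OptionalInt.empty()),
                new RddInfo("serviceSink", OptionalInt.of(8)));
        System.out.println(stageMaxConcurrency(stage)); // OptionalInt[8]
    }
}
```

With this rule, an RDD that sets no limit never constrains the stage; only the tightest declared limit applies.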



> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks per stage.
> This is useful when your Spark job is accessing another service and you don't want to
> DOS that service, for instance Spark writing to HBase or doing HTTP puts on a service.
> Many times you want to do this without limiting the number of partitions.
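Absent a scheduler-level limit, a common stopgap is to throttle the external calls inside each task with a JVM-wide semaphore. This is a workaround sketch, not anything from this ticket, and it only caps in-flight calls per executor, not per stage across the cluster, which is exactly why the requested feature would still be useful.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Workaround sketch: bound concurrent calls to an external service within
// one executor JVM. Cluster-wide concurrency is still (permits * executors).
class ServiceThrottle {
    private static final Semaphore PERMITS = new Semaphore(4); // max 4 in-flight calls per JVM

    static <T> T withPermit(Supplier<T> call) {
        try {
            PERMITS.acquire();
            try {
                return call.get();      // the throttled service call
            } finally {
                PERMITS.release();      // always return the permit
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```

Inside a task (e.g. in `mapPartitions`), each element's service call would be wrapped as `ServiceThrottle.withPermit(() -> httpPut(x))`, where `httpPut` stands in for whatever client call the job makes.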



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

