From "Albert Cheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15429) When `spark.streaming.concurrentJobs > 1`, PIDRateEstimator cannot estimate the receiving rate accurately.
Date Fri, 20 May 2016 07:29:13 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292908#comment-15292908 ]

Albert Cheng commented on SPARK-15429:
--------------------------------------

I have an idea about this issue.

First, add a new parameter `concurrentJobs` to PIDRateEstimator.
Second, change `error = latestRate - processingRate` to `error = latestRate - processingRate * concurrentJobs.toDouble`, and change `historicalError = schedulingDelay.toDouble * processingRate / batchIntervalMillis` to `historicalError = schedulingDelay.toDouble * processingRate * concurrentJobs.toDouble / batchIntervalMillis`.
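
To make this concrete, here is a minimal sketch of what the update step could look like with that change applied. It paraphrases the PID update in Spark 1.6's PIDRateEstimator rather than quoting it, and the class name `ConcurrentAwarePIDRateEstimator` is purely illustrative, not a proposed patch:

{code}
// Sketch only: paraphrases Spark 1.6's PIDRateEstimator.compute with the
// proposed `concurrentJobs` factor applied. Layout is illustrative.
class ConcurrentAwarePIDRateEstimator(
    batchIntervalMillis: Long,
    proportional: Double,
    integral: Double,
    derivative: Double,
    minRate: Double,
    concurrentJobs: Int) { // proposed new parameter

  private var firstRun = true
  private var latestTime = -1L
  private var latestRate = -1.0
  private var latestError = -1.0

  def compute(
      time: Long,            // timestamp (ms) of the batch that just completed
      numElements: Long,     // elements processed in that batch
      processingDelay: Long, // ms the batch took to process
      schedulingDelay: Long  // ms the batch waited before starting
    ): Option[Double] = this.synchronized {
    if (time > latestTime && numElements > 0 && processingDelay > 0) {
      val delaySinceUpdate = (time - latestTime).toDouble / 1000
      // elements per second the last batch actually achieved
      val processingRate = numElements.toDouble / processingDelay * 1000

      // Proposed change: with N jobs running at once the pipeline drains
      // roughly N times the single-job rate, so scale by concurrentJobs.
      val error = latestRate - processingRate * concurrentJobs.toDouble
      val historicalError = schedulingDelay.toDouble * processingRate *
        concurrentJobs.toDouble / batchIntervalMillis
      val dError = (error - latestError) / delaySinceUpdate

      val newRate = (latestRate - proportional * error
        - integral * historicalError
        - derivative * dError).max(minRate)

      latestTime = time
      if (firstRun) {
        // the first sample only seeds the state, as in the existing estimator
        latestRate = processingRate
        latestError = 0.0
        firstRun = false
        None
      } else {
        latestRate = newRate
        latestError = error
        Some(newRate)
      }
    } else {
      None
    }
  }
}
{code}

With `concurrentJobs = 1` both formulas reduce to the current ones, so the existing single-job behaviour would be unchanged.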

Does this approach look right?
I would like to fix this.

> When `spark.streaming.concurrentJobs > 1`, PIDRateEstimator cannot estimate the receiving rate accurately.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-15429
>                 URL: https://issues.apache.org/jira/browse/SPARK-15429
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.6.1
>            Reporter: Albert Cheng
>
> When `spark.streaming.concurrentJobs > 1`, PIDRateEstimator cannot estimate the receiving rate accurately.
> For example, if the batch duration is set to 10 seconds but each RDD in the DStream takes 20 seconds to compute, then setting `spark.streaming.concurrentJobs=2` lets two batches run at once; each RDD still takes 20 seconds to consume its data, which leads to a poor backpressure estimate from PIDRateEstimator, since the estimator sees only the per-job processing rate.
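
For context, a minimal driver setup matching the scenario in that description might look like the following sketch (the app name is made up, and `spark.streaming.backpressure.enabled` is assumed to be on, since PIDRateEstimator only runs when backpressure is enabled):

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed reproduction of the scenario above: 10 s batches, two concurrent
// jobs, and backpressure enabled so that PIDRateEstimator is in play.
val conf = new SparkConf()
  .setAppName("SPARK-15429-repro") // illustrative name
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.concurrentJobs", "2")

val ssc = new StreamingContext(conf, Seconds(10))
// ... attach a receiver-based input stream whose batches take ~20 s each ...
ssc.start()
ssc.awaitTermination()
{code}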



