[ https://issues.apache.org/jira/browse/SPARK-15429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292908#comment-15292908
]
Albert Cheng edited comment on SPARK-15429 at 5/21/16 2:58 AM:
---------------------------------------------------------------
I have an idea about this issue.
First, add a new parameter `concurrentJobs` to PIDRateEstimator.
Second, scale the processing rate by the number of concurrent jobs in both error terms. Change
`error = latestRate - processingRate`
to
`error = latestRate - processingRate * concurrentJobs.toDouble`,
and change
`historicalError = schedulingDelay.toDouble * processingRate / batchIntervalMillis`
to
`historicalError = schedulingDelay.toDouble * processingRate * concurrentJobs.toDouble / batchIntervalMillis`.
Is this right?
I would like to fix this.
[~apachespark]
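
To make the proposal concrete, here is a minimal standalone sketch of the two modified error terms. It is not the Spark source: the object name, the `ErrorTerms` case class and the `errorTerms` signature are made up for illustration, `concurrentJobs` is the proposed (not yet existing) parameter, and `processingRate` is assumed to be the number of elements in the finished batch divided by its processing time, in elements per second.

```scala
// Standalone sketch of the proposed change; names here are hypothetical.
object PidErrorSketch {
  /** The two error terms that the proposal modifies. */
  final case class ErrorTerms(error: Double, historicalError: Double)

  def errorTerms(
      latestRate: Double,        // rate (elements/s) from the previous estimate
      numElements: Long,         // elements in the batch that just finished
      processingDelayMs: Long,   // time spent processing that batch
      schedulingDelayMs: Long,   // time the batch waited before it started
      batchIntervalMillis: Long,
      concurrentJobs: Int): ErrorTerms = {
    require(processingDelayMs > 0 && batchIntervalMillis > 0 && concurrentJobs >= 1)

    // Assumed definition: per-job rate in elements per second.
    val processingRate = numElements.toDouble / processingDelayMs * 1000

    // Proposed change: multiply the per-job rate by the number of concurrent jobs.
    val error = latestRate - processingRate * concurrentJobs.toDouble
    val historicalError =
      schedulingDelayMs.toDouble * processingRate * concurrentJobs.toDouble / batchIntervalMillis

    ErrorTerms(error, historicalError)
  }
}
```

Taking the scenario from the issue description (10 s batches that take 20 s to process) and assuming, purely for illustration, an input rate of 1000 elements/s, `PidErrorSketch.errorTerms(1000.0, 10000L, 20000L, 0L, 10000L, 1)` gives an `error` of 500, while the same call with `concurrentJobs = 2` gives an `error` of 0. Whether this linear scaling matches what the scheduler really achieves with concurrent jobs is exactly the open question above.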
> When `spark.streaming.concurrentJobs > 1`, PIDRateEstimator cannot estimate the receiving rate accurately.
> ----------------------------------------------------------------------------------------------------------
>
> Key: SPARK-15429
> URL: https://issues.apache.org/jira/browse/SPARK-15429
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.6.1
> Reporter: Albert Cheng
>
> When `spark.streaming.concurrentJobs > 1`, PIDRateEstimator cannot estimate the receiving
> rate accurately.
> For example, suppose the batch duration is 10 seconds and each RDD in the DStream takes
> 20 seconds to compute. With `spark.streaming.concurrentJobs=2`, each RDD still takes
> 20 seconds to process even though two batches can run concurrently, so PIDRateEstimator
> produces a poor backpressure estimate.