spark-dev mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Back-pressure for Spark Streaming
Date Fri, 08 May 2015 17:35:53 GMT
We had a similar issue while working on one of our use cases, where we were
processing at a moderate throughput (around 500 MB/s). When the processing
time exceeded the batch duration, it started throwing BlockNotFound
exceptions. I made a workaround for that issue, which is explained here:
http://apache-spark-developers-list.1001551.n3.nabble.com/SparkStreaming-Workaround-for-BlockNotFound-Exceptions-td12096.html

Basically, instead of generating blocks blindly, I made the receiver sleep
whenever the scheduling delay increases (specifically, when it exceeds
3 times the batch duration). This prototype is working nicely and the speed
is encouraging: it's processing at 500 MB/s without any failures so far.
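To illustrate the idea, here is a minimal sketch of that throttling approach
(not the actual code from the linked thread). The names ThrottledReceiver,
DelayTracker and fetchRecord() are made up for the example, and it assumes the
scheduling delay observed on the driver can somehow be made visible to the
receiver (trivial in local mode; in a real cluster the signal would have to be
propagated to the executor running the receiver):

  import java.util.concurrent.atomic.AtomicLong

  import org.apache.spark.storage.StorageLevel
  import org.apache.spark.streaming.receiver.Receiver
  import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

  // Driver-side listener that records the latest scheduling delay after each batch.
  class DelayTracker(lastDelayMs: AtomicLong) extends StreamingListener {
    override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
      batch.batchInfo.schedulingDelay.foreach(lastDelayMs.set)
    }
  }

  // Receiver that backs off (sleeps) instead of storing more blocks once the
  // scheduling delay grows beyond 3x the batch duration.
  class ThrottledReceiver(batchDurationMs: Long, lastDelayMs: AtomicLong)
      extends Receiver[String](StorageLevel.MEMORY_AND_DISK_SER) {

    override def onStart(): Unit = {
      new Thread("throttled-receiver") {
        override def run(): Unit = receive()
      }.start()
    }

    override def onStop(): Unit = { /* nothing to clean up in this sketch */ }

    private def receive(): Unit = {
      while (!isStopped()) {
        if (lastDelayMs.get() > 3 * batchDurationMs) {
          // Scheduling delay is building up: stop generating blocks for a while.
          Thread.sleep(batchDurationMs)
        } else {
          store(fetchRecord()) // fetchRecord() stands in for the actual source read
        }
      }
    }

    private def fetchRecord(): String = "record" // placeholder for a real source
  }

You would wire it up with something like
ssc.addStreamingListener(new DelayTracker(delay)) on the driver and
ssc.receiverStream(new ThrottledReceiver(batchMs, delay)) for the input stream.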


Thanks
Best Regards

On Fri, May 8, 2015 at 8:11 PM, François Garillot <
francois.garillot@typesafe.com> wrote:

> Hi guys,
>
> We[1] are doing a bit of work on Spark Streaming, to help it handle
> situations where the throughput of data on an InputStream may (momentarily)
> overwhelm the Receivers' memory.
>
> The JIRA & design doc is here:
> https://issues.apache.org/jira/browse/SPARK-7398
>
> We'd sure appreciate your comments!
>
> --
> François Garillot
> [1]: Typesafe & some helpful collaborators on benchmarking 'at scale'
>
