spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Spark Streaming and reducing latency
Date Sun, 17 May 2015 15:04:07 GMT
With receiver-based streaming, you can actually tune
spark.streaming.blockInterval, which is the interval at which the data
received by the receiver is chunked into blocks before being stored in
Spark. The default is 200ms, so with a batch duration of 1 second each
batch is made up of 5 blocks. And yes, with Spark Streaming, once your
processing time exceeds your batch duration while data keeps arriving at a
higher rate, you will overwhelm the receiver's memory and eventually get
"block not found" exceptions.
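The batch/block arithmetic above can be sketched as a quick check (the
property names mentioned are real Spark settings; the helper function
itself is purely illustrative, not part of any Spark API):

```python
# Sketch of the relationship described above: each batch is split into
# batchDuration / blockInterval blocks, and each block becomes one task
# on the executors when the batch is processed.

def blocks_per_batch(batch_duration_ms: int, block_interval_ms: int) -> int:
    """Number of blocks a receiver produces per batch interval."""
    return batch_duration_ms // block_interval_ms

# Defaults from the mail above: 1 s batches, 200 ms blockInterval.
print(blocks_per_batch(1000, 200))  # -> 5
```

Note that a smaller blockInterval means more blocks, hence more tasks and
more parallelism per batch, but setting it too low adds task-scheduling
overhead.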

Thanks
Best Regards

On Sun, May 17, 2015 at 7:21 PM, dgoldenberg <dgoldenberg123@gmail.com>
wrote:

> I keep hearing the argument that the way Discretized Streams work with
> Spark
> Streaming is a lot more of a batch processing algorithm than true
> streaming.
> For streaming, one would expect a new item, e.g. in a Kafka topic, to be
> available to the streaming consumer immediately.
>
> With discretized streams, streaming is done in batch intervals, i.e. the
> consumer has to wait out the interval before it can get at the new items.
> If one wants to reduce latency, it seems the only way is to shrink the
> batch interval. However, that may lead to a great deal of churn, with
> many fetch requests going from the consumers to Kafka, potentially
> returning no results at all because there is nothing new in the topic at
> the moment.
>
> Is there a counter-argument to this reasoning? What are some of the general
> approaches to reducing latency that folks might recommend? Or, perhaps there are
> ways of dealing with this at the streaming API level?
>
> If latency is of great concern, is it better to look into streaming from
> something like Flume where data is pushed to consumers rather than pulled
> by
> them? Are there techniques, in that case, to ensure the consumers don't get
> overwhelmed with new data?
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-reducing-latency-tp22922.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
