spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: What is the minimum value allowed for StreamingContext's Seconds parameter?
Date Mon, 23 May 2016 15:57:36 GMT
It depends on what you are using it for. Three parameters are important:


   1. Batch interval
   2. WindowDuration
   3. SlideDuration

Batch interval is the basic interval at which the system will receive the
data in batches. This is the interval set when creating a StreamingContext.
For example, if you set the batch interval to 2 seconds, then any input
DStream will generate RDDs of received data at 2 second intervals.

A window operator is defined by two parameters:

   1. WindowDuration / WindowLength - the length of the window
   2. SlideDuration / SlidingInterval - the interval at which the window will
      slide or move forward

Generally speaking, the larger the batch window, the better the overall
performance, but the streaming data output will be updated less
frequently. You will likely run into problems if you set your batch window
below 0.5 seconds, and/or when the batch window is shorter than the time it
takes to run the task.
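
To see why a batch window shorter than the processing time causes trouble, here is a minimal pure-Scala sketch (no Spark involved). It assumes a simplified model in which batches are processed strictly one after another and every batch takes the same time; `BacklogSketch` and `schedulingDelays` are hypothetical names used only for illustration.

```scala
// Hypothetical model for illustration; not part of the Spark API.
object BacklogSketch {
  /** Returns, for each batch, how long it waits before processing can start (ms). */
  def schedulingDelays(batchIntervalMs: Long, processingMs: Long, batches: Int): Vector[Long] = {
    var previousFinish = 0L
    (0 until batches).toVector.map { k =>
      val ready = k * batchIntervalMs              // time at which batch k is available
      val start = math.max(ready, previousFinish)  // it must wait for the previous batch
      previousFinish = start + processingMs
      start - ready                                // waiting time of batch k
    }
  }
}
```

In this model, a 2000 ms batch interval with 3000 ms of processing per batch makes each batch wait 1000 ms longer than the one before it, so the backlog keeps growing. With 1500 ms of processing, no batch waits at all.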
Beyond that, the window length and sliding interval need to be multiples of
the batch window, but will depend entirely on your reporting requirements.
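
The multiple-of-batch-window rule can be checked before a job is started. The following is a small sketch in plain Scala; `WindowCheck` and `isValidWindow` are hypothetical names introduced here, and all values are in whole seconds.

```scala
// Hypothetical helper for illustration; not part of the Spark API.
object WindowCheck {
  /** True when window length and sliding interval are positive multiples of the batch interval. */
  def isValidWindow(batchSecs: Int, windowSecs: Int, slideSecs: Int): Boolean = {
    require(batchSecs > 0, "batch interval must be positive")
    windowSecs > 0 && slideSecs > 0 &&
      windowSecs % batchSecs == 0 &&
      slideSecs % batchSecs == 0
  }
}
```

For instance, `WindowCheck.isValidWindow(10, 300, 60)` holds, while `WindowCheck.isValidWindow(10, 305, 60)` does not, because 305 is not a multiple of 10.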

Consider:
batch window = 10 seconds
window length = 300 seconds
sliding interval = 60 seconds

In this scenario, you will be creating an output every 60 seconds,
aggregating data that you were collecting every 10 seconds from the source
over the previous 300 seconds.
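
The arithmetic of this scenario can be worked through in plain Scala, without Spark. This is only a sketch under simplifying assumptions: each element of `batchCounts` stands for the number of events in one 10 second batch, and only full windows are emitted. `WindowArithmetic` and `windowedSums` are hypothetical names.

```scala
// Hypothetical model for illustration; not part of the Spark API.
object WindowArithmetic {
  val batchSecs = 10
  val windowSecs = 300
  val slideSecs = 60

  val windowBatches: Int = windowSecs / batchSecs // 30 batches per window
  val slideBatches: Int = slideSecs / batchSecs   // one output every 6 batches

  /** Sums the most recent `windowBatches` batches each time `slideBatches` new batches have arrived. */
  def windowedSums(batchCounts: Vector[Long]): Vector[Long] =
    (windowBatches to batchCounts.length by slideBatches).toVector.map { end =>
      batchCounts.slice(end - windowBatches, end).sum
    }
}
```

With 42 batches of 2 events each (420 seconds of input), the sketch produces three outputs, at 300, 360 and 420 seconds, each aggregating 30 batches, that is 60 events.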

If you were trying to create continuously streaming output as fast as
possible (for example for complex event processing, see below), then you
would probably (almost always) be setting your sliding interval = batch
window and then making the batch window as short as possible.

Example

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().
             setAppName("CEP_streaming").
             set("spark.driver.allowMultipleContexts", "true").
             set("spark.hadoop.validateOutputSpecs", "false")
// batch interval of 2 seconds
val ssc = new StreamingContext(sparkConf, Seconds(2))

// window length - the duration of the window, which must be a multiple of
// the batch interval n in StreamingContext(sparkConf, Seconds(n))
val windowLength = 4
// sliding interval - the interval at which the window operation is
// performed; in other words, data is collected within this "previous interval"
val slidingInterval = 2  // keep this the same as the batch window for
                         // continuous streaming: you are aggregating data
                         // that you are collecting over the batch window
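
The windowLength and slidingInterval values above are defined but not yet applied to a stream. One possible way to use them is sketched below. The socket source, its host and port, and the object name `CepStreamingSketch` are assumptions made for illustration and are not taken from the original job. The sketch needs the Spark Streaming library on the classpath and a master supplied at submit time.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CepStreamingSketch {
  def main(args: Array[String]): Unit = {
    val batchInterval = 2    // seconds
    val windowLength = 4     // seconds, a multiple of batchInterval
    val slidingInterval = 2  // seconds, same as batchInterval for continuous output

    val sparkConf = new SparkConf().setAppName("CEP_streaming")
    val ssc = new StreamingContext(sparkConf, Seconds(batchInterval))

    // Assumption: a plain-text socket source; host and port are placeholders.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Every slidingInterval seconds, look at the last windowLength seconds of data.
    val windowed = lines.window(Seconds(windowLength), Seconds(slidingInterval))

    // Print the number of lines seen in each window.
    windowed.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because the sliding interval equals the batch interval here, a result is printed for every batch, each covering the two most recent batches.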

HTH


Dr Mich Talebzadeh



LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 23 May 2016 at 16:32, nsalian <nsalian@cloudera.com> wrote:

> Thanks for the question.
> What kind of data rate are you expecting to receive?
>
> -----
> Neelesh S. Salian
> Cloudera
