spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: How big the spark stream window could be ?
Date Mon, 09 May 2016 07:26:57 GMT
That is a valid point, Shao. However, it will start using disk space as
memory storage, akin to swap space. I believe it will not crash; it will just
be slow. This assumes that you do not run out of disk space.
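
To put rough numbers on how much data a window has to hold, here is a
minimal sketch in plain Scala (no Spark dependency). The names
WindowFootprint, batchesInWindow and approxWindowBytes, and the average
batch size, are assumptions made up for illustration; the real footprint
depends on serialization, storage level and the operators used.

```scala
object WindowFootprint {
  // Number of batches that a window of `windowSeconds` spans when the
  // batch interval is `batchSeconds`.
  def batchesInWindow(windowSeconds: Long, batchSeconds: Long): Long = {
    require(batchSeconds > 0, "batch interval must be positive")
    require(windowSeconds % batchSeconds == 0,
      "window length must be a multiple of the batch interval")
    windowSeconds / batchSeconds
  }

  // Very rough size of the data kept for one window, given a hypothetical
  // average batch size in bytes (an assumption, not a measured value).
  def approxWindowBytes(windowSeconds: Long,
                        batchSeconds: Long,
                        avgBatchBytes: Long): Long =
    batchesInWindow(windowSeconds, batchSeconds) * avgBatchBytes
}
```

With a 5-minute batch interval, a 1-hour window spans 12 batches and a
24-hour window spans 288. If each batch averaged 50 MiB (a made-up figure),
the 24-hour window would keep roughly 14 GiB around, whether in memory or
spilled to disk.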

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 9 May 2016 at 08:14, Saisai Shao <sai.sai.shao@gmail.com> wrote:

> For window-related operators, Spark Streaming caches the data within the
> window in memory. In your case the window size is up to 24 hours, which
> means data has to stay in the executors' memory for more than 1 day. This
> may cause several problems when there is not enough memory.
>
> On Mon, May 9, 2016 at 3:01 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>
>> OK, some terms for Spark Streaming.
>>
>> "Batch interval" is the basic interval at which the system will receive
>> the data in batches.
>> This is the interval set when creating a StreamingContext. For example,
>> if you set the batch interval to 300 seconds, then any input DStream will
>> generate RDDs of received data at 300-second intervals.
>> A window operator is defined by two parameters:
>> - WindowDuration / WindowLength - the length of the window
>> - SlideDuration / SlidingInterval - the interval at which the window will
>> slide or move forward
>>
>>
>> OK, so your batch interval is 5 minutes. That is how often the messages
>> coming in from the source are grouped into a batch.
>>
>> Then you have these two params
>>
>> // window length - the duration of the window; it must be a multiple of
>> // the batch interval n, as in StreamingContext(sparkConf, Seconds(n))
>> val windowLength = x  // x = m * n
>> // sliding interval - the interval at which the window operation is
>> // performed; in other words, data is collected within this "previous
>> // interval"
>> val slidingInterval = y  // like x, a multiple of n
>>
>> Both the window length and the slidingInterval duration must be multiples
>> of the batch interval, as received data is divided into batches of duration
>> "batch interval".
>>
>> If you want to collect 1 hour of data, then windowLength = 12 * 5 * 60
>> seconds.
>> If you want to collect 24 hours of data, then windowLength = 24 * 12 * 5 * 60
>> seconds.
>>
>> Your sliding interval should be set to the batch interval = 5 * 60 seconds.
>> In other words, that is where the aggregates and summaries for your report
>> come from.
>>
>> What is your data source here?
>>
>> HTH
>>
>>
>> On 9 May 2016 at 04:19, kramer2009@126.com <kramer2009@126.com> wrote:
>>
>>> We have some stream data that needs to be processed, and we are
>>> considering using Spark Streaming to do it.
>>>
>>> We need to generate three kinds of reports. The reports are based on:
>>>
>>> 1. The last 5 minutes of data
>>> 2. The last 1 hour of data
>>> 3. The last 24 hours of data
>>>
>>> The reports are generated every 5 minutes.
>>>
>>> After reading the docs, the most obvious way to solve this seems to be to
>>> set up a Spark stream with a 5-minute interval and two windows of 1 hour
>>> and 1 day.
>>>
>>> But I am worried that windows of one hour and one day are too big. I do
>>> not have much experience with Spark Streaming, so what window lengths do
>>> you use in your environment?
>>>
>>> Are there any official docs that discuss this?
>>
>
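
The window settings discussed in the quoted thread (a 5-minute batch
interval, with reports over the last 5 minutes, 1 hour and 24 hours) can be
sketched in plain Scala as below. This is only an illustration with no Spark
dependency: ReportWindows, WindowSpec and isValid are hypothetical names, and
the check encodes just the rule stated in the thread, that both the window
length and the sliding interval must be multiples of the batch interval.

```scala
object ReportWindows {
  // Batch interval from the thread: 5 minutes, expressed in seconds.
  val batchIntervalSeconds: Long = 5 * 60

  // Hypothetical holder for one report's window parameters.
  final case class WindowSpec(name: String,
                              windowLengthSeconds: Long,
                              slidingIntervalSeconds: Long)

  // Both durations must be positive multiples of the batch interval.
  def isValid(spec: WindowSpec, batchSeconds: Long): Boolean =
    batchSeconds > 0 &&
      spec.windowLengthSeconds > 0 &&
      spec.slidingIntervalSeconds > 0 &&
      spec.windowLengthSeconds % batchSeconds == 0 &&
      spec.slidingIntervalSeconds % batchSeconds == 0

  // The three reports, each sliding once per batch interval.
  val reports: Seq[WindowSpec] = Seq(
    WindowSpec("last 5 minutes", 5 * 60, batchIntervalSeconds),
    WindowSpec("last 1 hour", 12 * 5 * 60, batchIntervalSeconds),
    WindowSpec("last 24 hours", 24 * 12 * 5 * 60, batchIntervalSeconds)
  )
}
```

In a Spark Streaming job these values would typically be wrapped in
Seconds(...) and passed as the window and slide durations of a window
operation on the DStream; that wiring is left out here so the sketch stays
runnable on its own.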
