spark-user mailing list archives

From Balakrishnan Narendran <balu.na...@gmail.com>
Subject Re: spark streaming with checkpoint
Date Thu, 22 Jan 2015 16:19:20 GMT
Thank you Jerry,
       Does the window operation create a new RDD for each slide duration?
I am asking because I see a constant increase in memory even when
no logs are received.
If checkpointing is not the answer, is there an alternative you would suggest?
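
For a rough sense of scale, here is a small plain-Python sketch (not Spark code) of how many batches one sliding window spans. It assumes the batch interval equals the 5-minute slide duration, which the thread does not state, and `batches_in_window` is a hypothetical helper name used only for illustration.

```python
from datetime import timedelta


def batches_in_window(window: timedelta, batch_interval: timedelta) -> int:
    """Number of batch intervals covered by one window (hypothetical helper)."""
    if window % batch_interval != timedelta(0):
        raise ValueError("window must be a multiple of the batch interval")
    return window // batch_interval


window = timedelta(hours=24)
batch_interval = timedelta(minutes=5)  # assumption: batch interval == slide duration

print(batches_in_window(window, batch_interval))  # 288
```

Under that assumption a full window covers 288 batches, so whatever is retained per batch is held 288 times over once the window has filled.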


On Tue, Jan 20, 2015 at 7:08 PM, Shao, Saisai <saisai.shao@intel.com> wrote:

>  Hi,
>
>
>
> It seems you have a very large window (24 hours), so the increase in memory
> is expected, because the window operation caches the RDDs that fall within
> the window in memory. For your requirement, there must be enough memory to
> hold 24 hours of data.
>
>
>
> I don’t think checkpointing in Spark Streaming can alleviate this problem,
> because checkpoints are mainly for fault tolerance.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* balu.naren [mailto:balu.naren@gmail.com]
> *Sent:* Tuesday, January 20, 2015 7:17 PM
> *To:* user@spark.apache.org
> *Subject:* spark streaming with checkpoint
>
>
>
> I am a beginner with Spark Streaming, so I have a basic question about
> checkpoints. My use case is to calculate the number of unique users per day.
> I am using reduce by key and window for this, with a window duration of 24
> hours and a slide duration of 5 minutes. I write the processed record to
> MongoDB, currently replacing the existing record each time. But I see
> memory slowly increasing over time, and the process is killed after about an
> hour and a half (on an AWS small instance). The DB write after the restart
> clears all the old data. So I understand checkpointing is the solution for
> this. But my questions are:
>
>    - What should my checkpoint duration be? The documentation recommends
>    5-10 times the slide duration, but I need the data for the entire day.
>    Is it OK to keep it at 24 hours?
>    - Where should the checkpoint ideally be placed? When I first receive
>    the stream, just before the window operation, or after the data reduction
>    has taken place?
>
>
> Appreciate your help.
> Thank you
>  ------------------------------
>
> View this message in context: spark streaming with checkpoint
> <http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-with-checkpoint-tp21263.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
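
The unique-users-per-window computation described in the original question can also be maintained incrementally: add the counts of the batch entering the window, subtract those of the batch leaving it, and drop users whose count reaches zero. Spark Streaming's `reduceByKeyAndWindow` has a variant that takes an inverse reduce function and an optional filter function for this pattern, and that variant requires checkpointing to be enabled. The block below is only a plain-Python simulation of the idea, not Spark code; `SlidingUniqueCounter` and its methods are hypothetical names. It is a sketch of the bookkeeping, not a memory fix: the per-batch counts still have to be kept until they leave the window.

```python
from collections import Counter, deque


class SlidingUniqueCounter:
    """Tracks distinct users over the last `window_batches` batches (hypothetical)."""

    def __init__(self, window_batches: int):
        self.window_batches = window_batches
        self.batches = deque()   # per-batch counts still inside the window
        self.totals = Counter()  # running per-user counts for the whole window

    def add_batch(self, user_ids) -> int:
        batch = Counter(user_ids)
        self.batches.append(batch)
        self.totals.update(batch)          # "reduce": add the new batch
        if len(self.batches) > self.window_batches:
            old = self.batches.popleft()
            self.totals.subtract(old)      # "inverse reduce": remove the old batch
            for user in old:               # "filter": drop users with a zero count
                if self.totals[user] == 0:
                    del self.totals[user]
        return len(self.totals)            # distinct users currently in the window


counter = SlidingUniqueCounter(window_batches=2)
print(counter.add_batch(["a", "b"]))  # 2
print(counter.add_batch(["b", "c"]))  # 3
print(counter.add_batch(["c"]))       # 2
```

With a two-batch window, user "a" drops out when the third batch arrives, so the distinct count falls from 3 back to 2.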
