spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Breaking lineage and reducing stages in Spark Streaming
Date Thu, 09 Jul 2015 10:49:46 GMT
If you are continuously unioning RDDs, then you are accumulating
ever-increasing data, and you are processing an ever-increasing amount of
data in every batch. Obviously this is not going to last very long: you
fundamentally cannot keep processing an ever-increasing amount of data
with finite resources, can you?
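
If the goal is a running per-key aggregate, the usual Spark Streaming
pattern is to keep incremental state with updateStateByKey instead of
unioning RDDs, so each batch only processes that batch's data and Spark
checkpoints the state for you. A minimal self-contained sketch (the
socket source, app name, batch interval, and checkpoint directory are
illustrative assumptions, not from this thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RunningCounts {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RunningCounts").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/spark-checkpoint") // required by updateStateByKey

    // Hypothetical source: lines of "key value" pairs from a socket.
    val pairs = ssc.socketTextStream("localhost", 9999)
      .map(_.split(" "))
      .map(a => (a(0), a(1).toLong))

    // Fold each batch's values into the running per-key sum. Spark
    // checkpoints this state periodically, so lineage stays bounded.
    val counts = pairs.updateStateByKey[Long] {
      (newValues: Seq[Long], state: Option[Long]) =>
        Some(newValues.sum + state.getOrElse(0L))
    }

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}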

On Thu, Jul 9, 2015 at 3:17 AM, Anand Nalya <anand.nalya@gmail.com> wrote:

> That's from the Streaming tab of the Spark 1.4 web UI.
>
> On 9 July 2015 at 15:35, Michel Hubert <michelh@vsnsystemen.nl> wrote:
>
>>  Hi,
>>
>>
>>
>> I was just wondering how you generated the second image with the charts.
>>
>> What product?
>>
>>
>>
>> *From:* Anand Nalya [mailto:anand.nalya@gmail.com]
>> *Sent:* Thursday, 9 July 2015 11:48
>> *To:* spark users
>> *Subject:* Breaking lineage and reducing stages in Spark Streaming
>>
>>
>>
>> Hi,
>>
>>
>>
>> I have an application in which an RDD is updated with tuples coming
>> from the RDDs in a DStream, using the following pattern:
>>
>>
>>
>> dstream.foreachRDD(rdd => {
>>   myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_ + _)
>> })
>>
>>
>>
>> I'm using cache() and checkpointing to cache results. Over time, the
>> lineage of myRDD keeps growing, and the number of stages in each batch
>> of the DStream keeps increasing, even though all the earlier stages are
>> skipped. Once the number of stages grows large enough, the scheduling
>> delay starts driving up the overall delay. The processing time for each
>> batch remains fixed.
>>
>>
>>
>> Following figures illustrate the problem:
>>
>>
>>
>> Job execution: https://i.imgur.com/GVHeXH3.png?1
>>
>>
>> Delays: https://i.imgur.com/1DZHydw.png?1
>>
>>
>> Is there some pattern that I can use to avoid this?
>>
>>
>>
>> Regards,
>>
>> Anand
>>
>
>
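
For the lineage question in the original post, one pattern that keeps the
DAG bounded is to checkpoint the accumulated RDD every few batches, which
truncates its lineage at the last materialized checkpoint. A rough sketch
along the lines of the snippet above (sc, dstream, and myfilter are taken
from the original code; the (String, Long) element type, the 10-batch
interval, and the checkpoint directory are assumptions):

import org.apache.spark.rdd.RDD

sc.setCheckpointDir("/tmp/rdd-checkpoint") // where checkpointed RDDs are saved

var myRDD: RDD[(String, Long)] = sc.emptyRDD[(String, Long)]
var batches = 0L

dstream.foreachRDD { rdd =>
  myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_ + _).cache()
  batches += 1
  if (batches % 10 == 0) {
    myRDD.checkpoint() // mark for checkpointing before the next action
    myRDD.count()      // force materialization; the lineage is cut here
  }
}

Checkpointing writes the RDD to reliable storage, so a periodic interval
trades a little extra I/O for a bounded DAG and a stable scheduling delay.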
