spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Join DStream With Other Datasets
Date Sat, 17 Jan 2015 18:12:29 GMT
Can't you send a special event through Spark Streaming once the list is
updated? That way you have your normal events plus a special reload event.
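
A minimal sketch of that idea in plain Python, not tied to any Spark API. It assumes each event is an (event_type, ip) tuple; the "reload" and "click" type names, the process_batch function, and the load_spam_ips callback are all hypothetical names chosen for illustration.

```python
RELOAD = "reload"   # hypothetical event type for the control message
CLICK = "click"     # hypothetical event type for normal traffic


def process_batch(events, spam_ips, load_spam_ips):
    """Handle one micro-batch of (event_type, ip) tuples.

    Returns (click IPs that are not in the spam list, spam set now in effect).
    """
    # A reload event anywhere in the batch refreshes the list before
    # filtering, so the new list applies to the whole batch.
    if any(kind == RELOAD for kind, _ in events):
        spam_ips = load_spam_ips()
    kept = [ip for kind, ip in events if kind == CLICK and ip not in spam_ips]
    return kept, spam_ips
```

In a streaming job, the equivalent check would probably run on the driver once per batch (for example inside foreachRDD or transform), and the reloaded set could be shipped to executors as a fresh broadcast variable, since an existing broadcast cannot be modified.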
On 17 Jan 2015 15:06, "Ji ZHANG" <zhangji87@gmail.com> wrote:

> Hi,
>
> I want to join a DStream with some other dataset, e.g. join a click
> stream with a spam IP list. I can think of two possible solutions: one
> is to use a broadcast variable, and the other is to use the transform
> operation, as described in the manual.
>
> But the problem is that the spam IP list will be updated outside of the
> Spark Streaming program, so how can the program be notified to reload
> the list?
>
> Broadcast variables are immutable.
>
> For the transform operation, is it costly to reload the RDD on every
> batch? If it is, and I use RDD.persist(), does that mean I need to
> launch a thread to regularly unpersist it so that it can pick up the
> updates?
>
> Any ideas will be appreciated. Thanks.
>
> --
> Jerry
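
On the transform question quoted above: the function passed to transform is called once per batch interval, so one possible alternative to a background unpersist thread is a time-based refresh checked at the start of each batch. The sketch below is plain Python and only illustrates that refresh logic; RefreshingValue, loader, and ttl_seconds are hypothetical names, not part of Spark.

```python
import time


class RefreshingValue:
    """Cache a loaded value and reload it once it is older than ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # callable that fetches the current list
        self._ttl = ttl_seconds      # maximum age before a reload
        self._clock = clock          # injectable clock, useful for testing
        self._value = None
        self._loaded_at = None       # None means nothing has been loaded yet

    def get(self):
        now = self._clock()
        # Reload on first use, or when the cached copy has expired.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

Calling get() at the top of the per-batch function would return the cached list for most batches and reload it only after the TTL has passed. In a real job the loader might build and persist a new RDD and unpersist the old one at that point; that part is an assumption and is not shown here.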
