spark-user mailing list archives

From Jörn Franke
Subject Re: Join DStream With Other Datasets
Date Sat, 17 Jan 2015 18:12:29 GMT
Can't you send a special event through Spark Streaming once the list is
updated? Then you would have your normal events plus a special reload event.
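
A rough illustration of the reload-event idea, written as a plain-Python
simulation rather than real Spark Streaming code. The names `RELOAD`,
`CLICK`, `process_batches` and the loader callback are hypothetical, and
the sketch assumes each event is a `(kind, ip)` pair and that a reload
event should take effect for the whole batch it arrives in.

```python
# Plain-Python simulation of the idea; no Spark APIs are used here.
RELOAD = "reload"  # hypothetical marker for the control event
CLICK = "click"    # hypothetical marker for a normal click event


def process_batches(batches, load_spam_ips):
    """Filter click events against a spam list that is refreshed
    whenever a reload event shows up in a batch."""
    spam_ips = load_spam_ips()  # initial load before the first batch
    results = []
    for batch in batches:
        # Handle control events first so the whole batch sees the new list.
        if any(kind == RELOAD for kind, _ in batch):
            spam_ips = load_spam_ips()
        clean = [ip for kind, ip in batch
                 if kind == CLICK and ip not in spam_ips]
        results.append(clean)
    return results


# Hypothetical external list that changes between the two loads.
versions = iter([{"1.1.1.1"}, {"1.1.1.1", "2.2.2.2"}])
batches = [
    [(CLICK, "1.1.1.1"), (CLICK, "2.2.2.2")],
    [(RELOAD, None), (CLICK, "2.2.2.2"), (CLICK, "3.3.3.3")],
]
output = process_batches(batches, lambda: next(versions))
print(output)
```

In the first batch only 1.1.1.1 is filtered out; after the reload event
in the second batch, 2.2.2.2 is filtered as well.
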
On 17 Jan 2015 15:06, "Ji ZHANG" wrote:

> Hi,
> I want to join a DStream with another dataset, e.g. join a click
> stream with a spam IP list. I can think of two possible solutions: one
> is to use a broadcast variable, and the other is to use the transform
> operation as described in the manual.
> The problem is that the spam IP list will be updated outside of the
> Spark Streaming program, so how can the program be notified to reload
> the list?
> Broadcast variables are immutable.
> For the transform operation, is it costly to reload the RDD on every
> batch? If it is, and I use RDD.persist(), does that mean I need to
> launch a thread to regularly unpersist it so that it picks up the
> updates?
> Any ideas would be appreciated. Thanks.
> --
> Jerry
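
On the question of reloading on every batch versus unpersisting from a
separate thread, one possible middle ground is to reload only when the
cached copy is older than some interval. The sketch below is plain
Python, not Spark code; `RefreshingList`, `ttl_seconds` and
`load_spam_ips` are hypothetical names. Since the function passed to
`DStream.transform` is invoked on the driver for every batch, a check
like `get()` could in principle be called from there, but that wiring is
an assumption and is not shown.

```python
import time


class RefreshingList:
    """Driver-side holder that reloads a dataset once it is older
    than ttl_seconds (hypothetical helper, not a Spark class)."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        # Reload on first use or when the cached copy has expired.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
load_count = [0]


def load_spam_ips():  # hypothetical loader for the external list
    load_count[0] += 1
    if load_count[0] == 1:
        return {"1.1.1.1"}
    return {"1.1.1.1", "2.2.2.2"}


spam = RefreshingList(load_spam_ips, ttl_seconds=60,
                      clock=lambda: fake_now[0])
first = spam.get()    # first call loads the list
fake_now[0] = 30.0
second = spam.get()   # within the TTL, cached copy is reused
fake_now[0] = 61.0
third = spam.get()    # past the TTL, list is reloaded
```

With this shape no extra thread is needed: the staleness check runs
whenever the list is requested, and the loader is only called when the
interval has elapsed.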