spark-user mailing list archives

From rekt...@voodoowarez.com
Subject Re: Create DStream consisting of HDFS and (then) Kafka data
Date Thu, 08 Jan 2015 07:01:53 GMT
On Thu, Jan 08, 2015 at 02:33:30PM +0900, Tobias Pfeiffer wrote:
> Hi,
> 
> On Thu, Jan 8, 2015 at 2:19 PM, <rektide@voodoowarez.com> wrote:
> 
> > dstream processing bulk HDFS data- is something I don't feel is super
> > well socialized yet, & fingers crossed that base gets built up a little
> > more.
> 
> 
> Just out of interest (and hoping not to hijack my own thread), why are you
> not doing plain RDD processing when you are only processing HDFS data?
> What's the advantage of doing DStream?
> 
> Thanks
> Tobias

Like you, in the old Storm use case we were doing a lot of windowing functions, etc.

We want a consistent discretization process for all our intake data, whether
it's realtime or not, and we want to use the same discretized stream tech,
whether we're discretizing here now or whether it's historical data.
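
A minimal sketch of that idea, in plain Python rather than Spark, so none of it
is the Spark Streaming API: historical and live records go through one
discretization step and one windowing function. The names `discretize` and
`windowed_counts`, the 2-second batch length, and the sample records are all
hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A record is (event time in seconds, payload). This shape is an assumption.
Record = Tuple[float, str]


def discretize(records: Iterable[Record], batch_seconds: float) -> Dict[int, List[str]]:
    """Group records into fixed-length batches keyed by batch index."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in records:
        batches[int(ts // batch_seconds)].append(payload)
    return dict(batches)


def windowed_counts(batches: Dict[int, List[str]], window_batches: int) -> Dict[int, int]:
    """Count records in a sliding window of `window_batches` batches
    ending at each batch index between the first and last batch seen."""
    if not batches:
        return {}
    first, last = min(batches), max(batches)
    counts: Dict[int, int] = {}
    for end in range(first, last + 1):
        start = end - window_batches + 1
        counts[end] = sum(len(batches.get(i, [])) for i in range(start, end + 1))
    return counts


# Historical data (e.g. read from files) and live data (e.g. from a queue)
# are just two sources of the same record type.
historical = [(0.5, "a"), (1.2, "b"), (3.9, "c")]
live = [(4.1, "d"), (5.0, "e")]

# Same discretization and same window logic for both sources.
batches = discretize(historical + live, batch_seconds=2.0)
counts = windowed_counts(batches, window_batches=2)
```

The point is only that the windowing code never needs to know whether a batch
came from history or from the live feed.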

Only then is Lambda-beast anywhere near slain.  To the single-system. o7
-rektide

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

