spark-user mailing list archives

From Sidd S
Subject Combine code for RDD and DStream
Date Mon, 03 Aug 2015 17:42:25 GMT

I am developing a Spark program that uses both batch and streaming
(separately). The two programs are almost identical, except that their
inputs come from different sources. Unfortunately, RDD and DStream each
define all of their transformations in their own files, so I have two
different files with almost exactly the same code. If I change a
transformation in one program, I have to make the same change in the
other. It would be nice to have a third file that holds all of my
transformations. The batch program and the streaming program could then
both reference this third file to know which transformations to perform
on the data.

Does anyone know a good way of doing this? I want to keep exactly the
same syntax (......rdd.filter({i:Int=>i*2}).map(.......).....) in this third
file. That way, any change I make to the transformations would apply to
both the batch AND streaming processes. I tried a couple of ideas, to no
avail.
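
One possible way to get such a third file, offered as a minimal sketch rather than a definitive answer: write each pipeline once as a plain function from RDD to RDD, call it directly in the batch program, and hand it to DStream.transform in the streaming program, which applies an RDD-to-RDD function to every micro-batch. The names SharedTransformations, process, BatchJob and StreamingJob are hypothetical, and the filter and map bodies are placeholders, not the actual transformations from the programs described above. This only covers operations that exist on RDDs; windowed or stateful DStream operations would still need streaming-specific code.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical "third file": every transformation is written once, against RDDs.
object SharedTransformations {
  // Placeholder pipeline (keep even numbers, then double them); illustrative only.
  def process(rdd: RDD[Int]): RDD[Int] =
    rdd.filter(i => i % 2 == 0).map(i => i * 2)
}

// Hypothetical batch entry point: call the shared function directly.
object BatchJob {
  def run(input: RDD[Int]): RDD[Int] =
    SharedTransformations.process(input)
}

// Hypothetical streaming entry point: DStream.transform runs the same
// RDD => RDD function on the RDD behind each micro-batch.
object StreamingJob {
  def run(input: DStream[Int]): DStream[Int] =
    input.transform((rdd: RDD[Int]) => SharedTransformations.process(rdd))
}
```

With this layout, a change made inside process would be picked up by both programs, and the usual rdd.filter(...).map(...) syntax is kept in the shared file.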

Thanks in advance,
