spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bryan Jeffrey <bryan.jeff...@gmail.com>
Subject Getting Started with Spark
Date Tue, 08 Sep 2015 13:46:40 GMT
Hello. We're getting started with Spark Streaming. We're working to build
some unit/acceptance testing around functions that consume DStreams. The
current method for creating DStreams is to populate the data by creating an
InputDStream:

val input = Array(TestDataFactory.CreateEvent(123 notFoundData))
val queue =
scala.collection.mutable.Queue(ssc.sparkContext.parallelize(input))
val events: InputDStream[MyEvent] = ssc.queueStream(queue)

The 'events' InputDStream can then be fed into functions. However, the
stream does not allow checkpointing. This means that we're unable to use
this to feed methods/classes that execute stateful actions like
'updateStateByKey'.

Does anyone have a simple, contained method to create DStreams that allow
for checkpointing? I looked at the Spark unit test framework, but that
seems to require access to a bunch of spark internals (requiring that
you're within the spark package, etc.)

Mime
View raw message