spark-dev mailing list archives

From Debasish Das <>
Subject Streaming data + Blocked Model
Date Thu, 28 May 2015 15:13:25 GMT

We want to create the model and keep it loaded in memory through the Spark
batch context, since blocked matrix operations are required to optimize runtime.

The data is streamed in through Kafka or raw sockets into a Spark
StreamingContext. We want to run prediction operations on the streaming data
using the model loaded in memory through the batch context.
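A minimal sketch of the intended flow, in plain Python rather than Spark APIs: a factorization-style model is built once, its user and item factor vectors are kept in memory split into blocks (id modulo the number of blocks), and every incoming micro-batch of (user, item) pairs is scored against that same in-memory model. The class and function names (BlockedFactorModel, score_batch, score_stream) and the id-modulo blocking scheme are assumptions for illustration only; they are not MLlib's API, and a Python list stands in for the stream.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Vector = List[float]


class BlockedFactorModel:
    """Hypothetical model: user/item factor vectors split into blocks by id % num_blocks."""

    def __init__(self, num_blocks: int,
                 user_factors: Dict[int, Vector],
                 item_factors: Dict[int, Vector]) -> None:
        self.num_blocks = num_blocks
        self.user_blocks = self._to_blocks(user_factors)
        self.item_blocks = self._to_blocks(item_factors)

    def _to_blocks(self, factors: Dict[int, Vector]) -> List[Dict[int, Vector]]:
        # blocks[b] holds every id whose id % num_blocks == b
        blocks: List[Dict[int, Vector]] = [{} for _ in range(self.num_blocks)]
        for key, vec in factors.items():
            blocks[key % self.num_blocks][key] = vec
        return blocks


def score_batch(model: BlockedFactorModel,
                batch: List[Tuple[int, int]]) -> List[float]:
    """Score (user, item) pairs, visiting each (user block, item block) pair once."""
    n = model.num_blocks
    groups: Dict[Tuple[int, int], List[Tuple[int, int, int]]] = defaultdict(list)
    for pos, (user, item) in enumerate(batch):
        groups[(user % n, item % n)].append((pos, user, item))
    scores = [0.0] * len(batch)  # results are written back in input order
    for (ub, ib), requests in groups.items():
        users, items = model.user_blocks[ub], model.item_blocks[ib]
        for pos, user, item in requests:
            # Prediction is the dot product of the user and item factor vectors.
            scores[pos] = sum(a * b for a, b in zip(users[user], items[item]))
    return scores


def score_stream(model: BlockedFactorModel,
                 micro_batches: Iterable[List[Tuple[int, int]]]) -> Iterator[List[float]]:
    """Reuse the same in-memory model for every incoming micro-batch."""
    for batch in micro_batches:
        yield score_batch(model, batch)
```

For example, with two blocks, users {0: [1.0, 2.0], 1: [0.5, 0.0]} and items {10: [3.0, 1.0], 11: [0.0, 4.0]}, scoring the micro-batch [(0, 10), (1, 11)] yields [5.0, 0.0]. Grouping by block pair mirrors how a blocked matrix product touches each pair of blocks together; the open question in this thread is where such a long-lived model should live relative to the streaming job.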

Do I need to open up an API on top of the batch context, or is it possible to
use an RDD created by the batch context from the streaming context?

Most likely not, since the streaming context and the batch context can't both
exist in the same Spark job, but I am curious.

If I have to open up an API, does it make sense to come up with a generic
serving API for MLlib and let all MLlib algorithms expose a serving API?
The API could be spawned using Spark's actor system itself, especially since
spray is merging into akka-http and Akka is already a dependency of Spark.

Maybe it's not a good idea, since it would require maintaining another actor
system for the API.
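To make the "generic serving API" idea concrete, here is a minimal, transport-agnostic sketch in plain Python. It assumes each algorithm would implement one common predict interface and that a registry dispatches requests by model name. ServableModel, LinearServableModel and ModelServer are hypothetical names invented for illustration; this is not an existing MLlib interface, and the HTTP or actor front end (for example one built on akka-http) is deliberately left out.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class ServableModel(ABC):
    """Hypothetical common interface each MLlib algorithm could implement for serving."""

    @abstractmethod
    def predict(self, features: Sequence[float]) -> float:
        """Return a prediction for one feature vector."""


class LinearServableModel(ServableModel):
    """Example implementation: a linear model w . x + b."""

    def __init__(self, weights: Sequence[float], intercept: float = 0.0) -> None:
        self.weights = list(weights)
        self.intercept = intercept

    def predict(self, features: Sequence[float]) -> float:
        if len(features) != len(self.weights):
            raise ValueError("feature length does not match model dimension")
        return sum(w * x for w, x in zip(self.weights, features)) + self.intercept


class ModelServer:
    """Transport-agnostic registry; an HTTP or actor front end would call handle()."""

    def __init__(self) -> None:
        self._models: Dict[str, ServableModel] = {}

    def register(self, name: str, model: ServableModel) -> None:
        self._models[name] = model

    def handle(self, name: str, batch: List[Sequence[float]]) -> List[float]:
        # Look up the named model and score every feature vector in the request.
        model = self._models.get(name)
        if model is None:
            raise KeyError(f"no model registered under {name!r}")
        return [model.predict(x) for x in batch]
```

Keeping the interface separate from the transport means the choice between reusing Spark's actor system and running a dedicated one only affects the thin layer that calls handle(), not the algorithms themselves.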

