spark-user mailing list archives

From Ognen Duzlevski <>
Subject General Spark question (streaming)
Date Thu, 09 Jan 2014 19:07:21 GMT

I am new to Spark and have a few questions that are fairly general in nature.

I am trying to set up a real-time data analysis pipeline in which clients
send events to a (load-balanced) collection point, and from there the
"collectors" forward the data to a Spark cluster via ZeroMQ pub/sub (just an …)

What do people generally do once the data is in Spark to enable real-time
analytics? Do you store it in some persistent storage and analyze it over a
window (say, the last five minutes) once enough has been aggregated, or...?
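To make the "keep recent data and analyze the last five minutes" option concrete, here is a minimal sketch in plain Python, not Spark code. `RecentEvents` is a hypothetical helper: it keeps only events whose timestamp falls inside the window and evicts older ones whenever a snapshot is taken. It assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque
from typing import Any, Deque, List, Tuple


class RecentEvents:
    """Hypothetical helper (not part of Spark): keeps only the events
    whose timestamp lies within the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._events: Deque[Tuple[float, Any]] = deque()

    def add(self, timestamp: float, event: Any) -> None:
        # Assumption: events arrive in non-decreasing timestamp order,
        # so the oldest event is always at the left end of the deque.
        self._events.append((timestamp, event))

    def snapshot(self, now: float) -> List[Any]:
        # Evict events at or before the cutoff, then return the rest.
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            self._events.popleft()
        return [event for _, event in self._events]


buf = RecentEvents(window_seconds=300)  # "the last five minutes"
buf.add(0, "login")
buf.add(120, "click")
buf.add(400, "click")
recent = buf.snapshot(now=400)
print(recent)           # ['click', 'click']
print(Counter(recent))  # any analysis can run on the snapshot
```

Keeping raw events lets you run arbitrary queries over the window, at the cost of holding every event in memory until it expires.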

If I want to count the occurrences of an event within a given time window in
a streaming context, does Spark support this, and how?
General guidelines are fine, and any experience, knowledge, or advice would be
greatly appreciated!
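For reference, Spark Streaming's DStream API includes windowed operations such as `countByValueAndWindow` and `reduceByKeyAndWindow`; the latter has a variant that takes an inverse function, so the window can be updated incrementally (check the Spark Streaming programming guide for your version's exact signatures). The sketch below is plain Python, not Spark, and only illustrates that incremental idea: `WindowedCounter` is a hypothetical helper that treats each call to `push_batch` as one micro-batch, adds the new batch's counts, and subtracts the counts of the batch that just left the window. The slide interval is assumed to be one batch.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable


class WindowedCounter:
    """Hypothetical helper (not Spark's API): counts event types over the
    last `window_batches` micro-batches, sliding by one batch at a time."""

    def __init__(self, window_batches: int) -> None:
        if window_batches < 1:
            raise ValueError("window_batches must be >= 1")
        self.window_batches = window_batches
        self._batches: Deque[Counter] = deque()
        self._totals: Counter = Counter()

    def push_batch(self, events: Iterable[str]) -> Dict[str, int]:
        batch = Counter(events)
        self._batches.append(batch)
        self._totals.update(batch)  # add the newest batch's counts
        if len(self._batches) > self.window_batches:
            expired = self._batches.popleft()
            self._totals.subtract(expired)  # "inverse reduce" the oldest batch
        # Drop keys whose count reached zero so the state stays small.
        for key in [k for k, v in self._totals.items() if v <= 0]:
            del self._totals[key]
        return dict(self._totals)


# E.g. with 1-minute batches, window_batches=5 approximates "the last five minutes".
counter = WindowedCounter(window_batches=3)
counter.push_batch(["click", "click", "login"])
counter.push_batch(["click"])
counter.push_batch([])
window_counts = counter.push_batch(["login"])  # the first batch expires here
print(window_counts)                  # {'click': 1, 'login': 1}
print(window_counts.get("click", 0))  # occurrences of one event type: 1
```

The incremental update costs work proportional to one batch rather than the whole window, which is why an inverse function is useful for long windows.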

