I am new to Spark and have a few fairly general questions:

I am trying to set up a real-time data analysis pipeline in which clients send events to a load-balanced collection point, and the "collectors" then forward the data to a Spark cluster via ZeroMQ pub/sub (just an experiment).

What do people generally do once the data is in Spark to enable real-time analytics? Do you store it in some persistent storage and analyze it within some window (say, the last five minutes) after enough has been aggregated, or something else?
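To make that question concrete, here is a plain-Python sketch (no Spark involved) of the kind of fixed five-minute windowed bucketing I have in mind; the event shape and the window size are just placeholders I made up for illustration:

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # placeholder window size

def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Align a timestamp to the start of its fixed (tumbling) window."""
    epoch = datetime(1970, 1, 1)
    return ts - (ts - epoch) % window

def bucket_events(events):
    """Group (timestamp, payload) events by the window they fall into."""
    buckets = defaultdict(list)
    for ts, payload in events:
        buckets[window_start(ts)].append(payload)
    return dict(buckets)

# Hypothetical events: (timestamp, event name)
events = [
    (datetime(2024, 1, 1, 12, 1), "click"),
    (datetime(2024, 1, 1, 12, 4), "view"),
    (datetime(2024, 1, 1, 12, 6), "click"),
]
buckets = bucket_events(events)
for start, payloads in sorted(buckets.items()):
    print(start, payloads)
```

Is this roughly the model people use, i.e. aggregate per window and then persist/analyze each window's bucket, or is there a better pattern?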

If I want to count the number of occurrences of an event within a given time frame in a streaming context, does Spark support this, and how? General guidelines are fine, and any experience, knowledge, or advice is greatly appreciated!
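For clarity, the semantics I am after are a trailing (sliding) window count, roughly what I understand Spark Streaming's windowed DStream operations (e.g. `countByValueAndWindow`) provide. Here is a stdlib-only Python sketch of that behavior; the class name and window size are my own placeholders, not Spark API:

```python
from collections import deque
from datetime import datetime, timedelta

class SlidingCounter:
    """Count occurrences of an event seen within the trailing window."""

    def __init__(self, window: timedelta):
        self.window = window
        self.timestamps = deque()  # arrival times, oldest first

    def record(self, ts: datetime) -> None:
        self.timestamps.append(ts)

    def count(self, now: datetime) -> int:
        # Evict anything older than the window before counting.
        cutoff = now - self.window
        while self.timestamps and self.timestamps[0] < cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)

counter = SlidingCounter(timedelta(minutes=5))
base = datetime(2024, 1, 1, 12, 0)
for minute in (0, 1, 2, 6, 7):
    counter.record(base + timedelta(minutes=minute))
# At 12:08, only events from 12:03 onward are inside the window,
# so the events at 12:06 and 12:07 are counted.
print(counter.count(base + timedelta(minutes=8)))  # → 2
```

Does Spark maintain this kind of per-window state for me, or do I have to manage it myself with persistent storage?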