spark-user mailing list archives

From Khanderao kand <khanderao.k...@gmail.com>
Subject Re: General Spark question (streaming)
Date Fri, 10 Jan 2014 23:03:00 GMT
1."What do people generally do once they have the data in Spark to enable
real-time analytics. Do you store it in some persistent storage and analyze
it within some window (let's say the last five minutes) after enough has
been aggregated or...?"
>>> It depends on your application. If you have a dashboarding/alerting
application, you would push the aggregated results to a UI or a message
queue. However, if you want these results to be available for later
queries, they need to be persisted in some storage such as HBase.
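The two delivery paths above (live dashboard feed vs. durable store) can be sketched in plain Python. Note `publish` and `persist` are hypothetical callbacks standing in for a message-queue producer and an HBase writer; this only illustrates the fan-out idea, not any Spark API:

```python
def route_aggregate(aggregate, publish, persist, keep_history=True):
    """Fan an aggregated result out to a live dashboard/alerting path
    and, optionally, to durable storage for later queries.

    `publish` and `persist` are placeholder callbacks (e.g., a message
    queue producer and an HBase writer in a real pipeline)."""
    publish(aggregate)       # always feed the UI / alerting path
    if keep_history:
        persist(aggregate)   # also keep it queryable later


# Minimal usage with in-memory sinks standing in for queue and store:
sent, stored = [], []
route_aggregate({"event": "click", "count": 42},
                publish=sent.append, persist=stored.append)
print(sent, stored)
```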

2. "If I want to count the number of occurrences of an event within a given
time frame within a streaming context - does Spark support this and how? "
  >>> Yes. Spark Streaming supports windowed operations as well as
counting, e.g., counting events over a sliding window.
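The windowed-count idea can be shown in plain Python. This is only an illustration of what Spark Streaming's `DStream.countByWindow(windowDuration, slideDuration)` computes at each slide interval; the timestamps and the `count_in_window` helper are made up for the example:

```python
from collections import deque

def count_in_window(events, window_seconds, now):
    """Count events whose timestamp falls inside the trailing window.

    `events` is a hypothetical buffer of (timestamp, payload) tuples;
    in Spark Streaming the same result comes from countByWindow."""
    cutoff = now - window_seconds
    return sum(1 for ts, _ in events if ts > cutoff)


# Keep a bounded buffer of recent events and evaluate a 60 s window.
events = deque()
for ts in [100, 150, 290, 295, 299]:
    events.append((ts, "click"))

print(count_in_window(events, 60, now=300))  # only 290, 295, 299 fall inside -> 3
```

In Spark Streaming itself you would instead call the window operators on the DStream (e.g. `countByWindow` or `reduceByKeyAndWindow`) with the window and slide durations, and Spark maintains the buffer for you.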


On Thu, Jan 9, 2014 at 11:07 AM, Ognen Duzlevski
<ognen@nengoiksvelzud.com>wrote:

> Hello,
>
> I am new to spark and have a few questions that are fairly general in
> nature:
>
> I am trying to set up a real-time data analysis pipeline where I have
> clients sending events to a collection point (load balanced) and onward the
> "collectors" send the data to a Spark cluster via zeromq pub/sub (just an
> experiment).
>
> What do people generally do once they have the data in Spark to enable
> real-time analytics. Do you store it in some persistent storage and analyze
> it within some window (let's say the last five minutes) after enough has
> been aggregated or...?
>
> If I want to count the number of occurrences of an event within a given
> time frame within a streaming context - does Spark support this and how?
> General guidelines are OK and any experiences, knowledge and advice is
> greatly appreciated!
>
> Thanks
> Ognen
>
