spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: RDD boundaries and triggering processing using tags in the data
Date Mon, 01 Jun 2015 07:36:22 GMT
Maybe you can make use of the window operations
<https://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#window-operations>.
Another approach would be to keep your incoming data in a database such as
HBase, Redis, or Cassandra, and then query that database whenever you need
to compute the average.
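
A time-based window alone may still split a group of readings, so a third
option is to carry a small piece of state from one batch to the next. Below
is a rough sketch of that idea in plain Python (not Spark code). The names
process_batch, State and DELIMITER are made up for illustration, and the
sketch assumes the readings of one sensor arrive in order within each batch.

```python
from typing import List, Tuple

# State carried between batches: (running_sum, count, seen_first_delimiter).
State = Tuple[int, int, bool]

DELIMITER = 10  # the reading that closes a group, as described in the question


def process_batch(readings: List[int], state: State) -> Tuple[List[float], State]:
    """Consume one batch; return the averages completed in it and the new state."""
    total, count, started = state
    averages = []
    for value in readings:
        if value == DELIMITER:
            # Emit only if an earlier delimiter opened this group and it is non-empty.
            if started and count > 0:
                averages.append(total / count)
            total, count, started = 0, 0, True
        elif started:
            total += value
            count += 1
        # Readings before the very first delimiter are ignored.
    return averages, (total, count, started)


# The two batches from the question, split at the same RDD boundary.
state = (0, 0, False)
for batch in ([10, 5, 8, 4, 6, 2, 1, 2, 8, 8, 8], [1, 6, 9, 1, 3, 10, 1, 3, 10]):
    completed, state = process_batch(batch, state)
    print(completed)
```

The first batch completes no group and the second one yields 4.8 and 2.0,
because the partial sum and count survive the batch boundary. In Spark
Streaming, per-sensor state of this kind could probably be kept with
updateStateByKey, keyed by a sensor id, though I have not tested that here.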

Thanks
Best Regards

On Thu, May 28, 2015 at 1:22 AM, David Webber <david.webber@gmail.com>
wrote:

> Hi All,
>
> I'm new to Spark and I'd like some help understanding if a particular use
> case would be a good fit for Spark Streaming.
>
> I have an imaginary stream of sensor data consisting of integers 1-10.
> Every time the sensor reads "10" I'd like to average all the numbers that
> were received since the last "10"
>
> example input: 10 5 8 4 6 2 1 2 8 8 8 1 6 9 1 3 10 1 3 10 ...
> desired output: 4.8, 2.0
>
> I'm confused about what happens if sensor readings fall into different
> RDDs.
>
> RDD1:  10 5 8 4 6 2 1 2 8 8 8
> RDD2:  1 6 9 1 3 10 1 3 10
> output: ???, 2.0
>
> My imaginary sensor doesn't read at fixed time intervals, so breaking the
> stream into RDDs by time interval won't ensure the data is packaged
> properly.  Additionally, multiple sensors are writing to the same stream
> (though I think flatMap can parse the origin stream into streams for
> individual sensors, correct?).
>
> My best guess for processing goes like this:
> 1) flatMap() to break out individual sensor streams
> 2) Custom parser to accumulate input data until "10" is found, then create
> a new output RDD for each sensor and data grouping
> 3) average the values from step 2
>
> I would greatly appreciate pointers to some specific documentation or
> examples if you have seen something like this before.
>
> Thanks,
> David
