spark-user mailing list archives

From Asim Jalis <asimja...@gmail.com>
Subject Re: RDD Moving Average
Date Tue, 06 Jan 2015 20:34:39 GMT
Thanks. Another question: I have event data with timestamps, and I want to
create a sliding window based on those timestamps. Some windows will have a
lot of events in them; others won't. Is there a way to get an RDD made up of
this kind of variable-length window?
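
To make the question concrete, here is a rough sketch in plain Python of what
a timestamp-based window could look like. It is not Spark code and does not
produce an RDD; it only shows the grouping logic on a local list. The function
name time_windows, the width and step parameters, and the sample events are
all made up for illustration.

```python
from bisect import bisect_left

def time_windows(events, width, step):
    """Group (timestamp, value) pairs into sliding windows.

    Each window covers [start, start + width) and the start advances by
    `step`. A window holds however many events fall inside it, so the
    windows have variable length and may be empty.
    All names here are hypothetical, not part of any Spark API.
    """
    events = sorted(events)              # order by timestamp first
    if not events:
        return []
    stamps = [ts for ts, _ in events]
    windows = []
    start = stamps[0]
    while start <= stamps[-1]:
        lo = bisect_left(stamps, start)          # first event with ts >= start
        hi = bisect_left(stamps, start + width)  # first event with ts >= start + width
        windows.append((start, [v for _, v in events[lo:hi]]))
        start += step
    return windows

# Made-up sample data: three events close together, then a gap.
events = [(0, "a"), (1, "b"), (2, "c"), (9, "d"), (10, "e")]
windows = time_windows(events, width=5, step=5)
```

With width equal to step the windows do not overlap; a smaller step would
make them overlap, which is closer to a true sliding window.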


On Tue, Jan 6, 2015 at 1:03 PM, Sean Owen <sowen@cloudera.com> wrote:

> First you'd need to sort the RDD to give it a meaningful order, but I
> assume you have some kind of timestamp in your data you can sort on.
>
> I think you might be after the sliding() function, a developer API in
> MLlib:
>
>
> https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala#L43
>
> On Tue, Jan 6, 2015 at 5:25 PM, Asim Jalis <asimjalis@gmail.com> wrote:
>
>> Is there an easy way to do a moving average across a single RDD (in a
>> non-streaming app)? Here is the use case. I have an RDD made up of stock
>> prices. I want to calculate a moving average using a window size of N.
>>
>> Thanks.
>>
>> Asim
>>
>
>
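
The sort-then-slide approach suggested in the quoted reply can be sketched
locally in plain Python. This is not Spark code and does not call MLlib's
sliding(); it only illustrates the logic of sorting by timestamp and then
averaging each run of N consecutive prices. The function name moving_average
and the sample prices are made up for illustration.

```python
from collections import deque

def moving_average(prices, n):
    """Return the average of each run of n consecutive prices.

    `prices` is an iterable of (timestamp, price) pairs. They are sorted
    by timestamp first, since a moving average needs a meaningful order.
    Hypothetical helper, not part of Spark or MLlib.
    """
    ordered = [p for _, p in sorted(prices)]
    window = deque(maxlen=n)   # holds at most the last n prices
    total = 0.0
    out = []
    for p in ordered:
        if len(window) == n:
            total -= window[0]  # drop the price about to leave the window
        window.append(p)
        total += p
        if len(window) == n:
            out.append(total / n)
    return out

# Made-up sample data, deliberately out of order.
prices = [(3, 13.0), (1, 10.0), (2, 11.0), (4, 12.0)]
averages = moving_average(prices, n=2)
```

An input of fewer than n prices yields an empty list, because no full window
exists.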
