spark-user mailing list archives

From Nicholas Pritchard <nicholas.pritch...@falkonry.com>
Subject Creating time-sequential pairs
Date Thu, 08 May 2014 22:04:42 GMT
Hi Spark community,

I have a design/algorithm question that I assume is common enough for
someone else to have tackled before. I have an RDD of time-series data
formatted as time-value tuples, RDD[(Double, Double)], and am trying to
extract threshold crossings. In order to do so, I first want to transform
the RDD into pairs of time-sequential values.

For example:
Input: The time-series data:
(1, 0.05)
(2, 0.10)
(3, 0.15)
Output: Transformed into time-sequential pairs:
((1, 0.05), (2, 0.10))
((2, 0.10), (3, 0.15))
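
To make the target concrete, here is a minimal local sketch in plain Python (no Spark involved), assuming the points are already sorted by time. The helper names `sequential_pairs` and `crossings` and the threshold 0.12 are made up for illustration; treating a "crossing" as a pair whose values fall on opposite sides of the threshold is also an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (time, value)

def sequential_pairs(points: List[Point]) -> List[Tuple[Point, Point]]:
    """Pair each point with its successor; assumes points are sorted by time."""
    return list(zip(points, points[1:]))

def crossings(pairs: List[Tuple[Point, Point]], threshold: float) -> List[Tuple[Point, Point]]:
    """Keep pairs whose two values lie on opposite sides of the threshold."""
    return [(a, b) for a, b in pairs
            if (a[1] < threshold) != (b[1] < threshold)]

series = [(1, 0.05), (2, 0.10), (3, 0.15)]
pairs = sequential_pairs(series)
```

With the pairs in hand, a crossing check only needs to look at one pair at a time, which is why the pairing step is the hard part in a distributed setting.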

My initial thought was to use a custom partitioner. This partitioner
could ensure that sequential data is kept together in the same partition.
Then I could use "mapPartitions" to transform each partition's list of
sequential data into pairs. Finally, I would need some logic for creating
the sequential pairs that span the boundaries between partitions.
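
The boundary logic could be sketched roughly as follows. This is plain Python with a list of lists standing in for partitions, not Spark code, and it assumes the data is range-partitioned and sorted by time within each partition. The function name `pairs_across_partitions` is hypothetical. In Spark, each of the two passes might correspond to a per-partition operation such as "mapPartitionsWithIndex", but that mapping is an assumption and not something I have tested.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (time, value)

def pairs_across_partitions(partitions: List[List[Point]]) -> List[Tuple[Point, Point]]:
    # Pass 1: record the first element of every non-empty partition.
    # This is small (one point per partition), so it could be shared
    # with every worker.
    heads: Dict[int, Point] = {i: part[0] for i, part in enumerate(partitions) if part}

    result: List[Tuple[Point, Point]] = []
    # Pass 2: build pairs inside each partition, then add one boundary pair.
    for i, part in enumerate(partitions):
        if not part:
            continue
        # Pairs that lie entirely within this partition.
        result.extend(zip(part, part[1:]))
        # Boundary pair: the last element here with the head of the
        # next non-empty partition, skipping any empty ones in between.
        later = [j for j in heads if j > i]
        if later:
            result.append((part[-1], heads[min(later)]))
    return result

# The example series split over three partitions, one of them empty.
partitions = [[(1, 0.05), (2, 0.10)], [], [(3, 0.15)]]
result = pairs_across_partitions(partitions)
```

The point of the sketch is that only one element per partition has to cross a partition boundary, so the boundary step stays cheap as long as the partitioning preserves time order.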

However, I was hoping to get some feedback and ideas from the community.
Does anyone have thoughts on a simpler solution?

Thanks,
Nick
