spark-user mailing list archives

From nitinkak001 <>
Subject Window comparison matching using the sliding window functionality: feasibility
Date Mon, 29 Sep 2014 18:24:47 GMT
I need to know whether the task below is feasible. I am thinking of it as a
combined MapReduce/Spark effort.

I need to run a distributed sliding-window comparison for digital data
matching on top of Hadoop. The data (a Hive table) will be partitioned and
distributed across the data nodes. Multiple instances of the window comparison
tool would then run on the individual partitions, locally on the data nodes
that hold them.
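
As a rough illustration of the partition-then-process-locally layout described
above, here is a plain-Python sketch (no Hadoop or Spark involved). It assumes
each row is a dict and that a hypothetical `block_key` column decides the
partition. In a real job the Hive table's partition column would play that
role, and each group would be handled by a separate task near its data.

```python
from collections import defaultdict
from typing import Callable, Dict, List, TypeVar

Row = Dict[str, str]
T = TypeVar("T")


def partition_rows(rows: List[Row], key: str) -> Dict[str, List[Row]]:
    """Group rows by the value of `key`, mimicking a partitioned Hive table."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def process_partitions(
    partitions: Dict[str, List[Row]], per_partition_fn: Callable[[List[Row]], T]
) -> Dict[str, T]:
    """Run the same function independently on every partition.

    On a cluster each call would be its own task running close to its data;
    here the calls simply run one after another.
    """
    return {pid: per_partition_fn(part) for pid, part in partitions.items()}


if __name__ == "__main__":
    rows = [
        {"block_key": "A", "name": "smith"},  # hypothetical sample rows
        {"block_key": "B", "name": "jones"},
        {"block_key": "A", "name": "smyth"},
    ]
    parts = partition_rows(rows, "block_key")
    print(process_partitions(parts, len))  # {'A': 2, 'B': 1}
```

The point of the split is that nothing in `per_partition_fn` needs to see rows
from another partition, so the partitions can be processed in parallel.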

The window comparison tool uses a sliding window: all rows within a window
interval are compared with each other on several columns, and a score is
generated.
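
One possible reading of that comparison step, sketched in Python under
assumptions that are not in the original post: the rows of a partition are
already sorted on some key, the window is a fixed number of consecutive rows
(the hypothetical `window_size`), and the score is simply the number of
compared columns whose values are equal. The real scoring rule would replace
the hypothetical `score_pair`.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Row = Dict[str, str]


def score_pair(a: Row, b: Row, columns: Sequence[str]) -> int:
    """Hypothetical score: number of columns on which the two rows agree."""
    return sum(1 for c in columns if a.get(c) == b.get(c))


def sliding_window_compare(
    rows: List[Row], columns: Sequence[str], window_size: int
) -> Iterator[Tuple[int, int, int]]:
    """Score every pair of rows that share a window of consecutive rows.

    `rows` is assumed to be pre-sorted. Each row i is compared with the next
    window_size - 1 rows, so every pair inside any window of `window_size`
    consecutive rows is scored exactly once. Yields (i, j, score).
    """
    if window_size < 2:
        return  # a window of one row has nothing to compare
    n = len(rows)
    for i in range(n):
        for j in range(i + 1, min(i + window_size, n)):
            yield i, j, score_pair(rows[i], rows[j], columns)


if __name__ == "__main__":
    partition = [  # hypothetical rows of one partition, already sorted
        {"name": "smith", "zip": "10001", "dob": "1970"},
        {"name": "smith", "zip": "10002", "dob": "1970"},
        {"name": "smyth", "zip": "10001", "dob": "1970"},
        {"name": "jones", "zip": "10001", "dob": "1981"},
    ]
    for i, j, score in sliding_window_compare(partition, ["name", "zip", "dob"], 3):
        print(i, j, score)
```

This does O(n·w) comparisons per partition instead of O(n²), which is the
usual motivation for a sliding window in record matching (it resembles the
sorted-neighbourhood method). Rows near a partition boundary are never
compared with rows in the neighbouring partition.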

I am more familiar with MapReduce, and I think we can handle everything up to
the partitioning step with it. For the distributed window comparison I am
thinking of using Spark. I know Spark Streaming has sliding window
functionality. Can I use that to accomplish the task above?
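
For comparison, Spark Streaming's sliding windows are defined over time rather
than over rows. A windowed DStream created with `window(windowDuration,
slideDuration)` groups the micro-batches received during the last
`windowDuration`, recomputed every `slideDuration`, and both durations must be
multiples of the batch interval. The plain-Python model below does not use
Spark. It is a simplified illustration of those semantics with integer
timestamps, and its function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def windowed_batches(
    batches: Dict[int, List[str]],
    batch_interval: int,
    window_duration: int,
    slide_duration: int,
) -> List[Tuple[int, List[str]]]:
    """Simplified model of time-based sliding windows over micro-batches.

    `batches` maps each batch time to the records received in that batch.
    For every slide time t (starting at slide_duration), the window holds the
    records of all batches with t - window_duration < time <= t.
    """
    if window_duration % batch_interval or slide_duration % batch_interval:
        raise ValueError("durations must be multiples of the batch interval")
    if not batches:
        return []
    last = max(batches)
    windows = []
    t = slide_duration
    while t <= last:
        records = [
            rec
            for bt in sorted(batches)
            if t - window_duration < bt <= t
            for rec in batches[bt]
        ]
        windows.append((t, records))
        t += slide_duration
    return windows


if __name__ == "__main__":
    arrivals = {1: ["a"], 2: ["b"], 3: ["c"], 4: ["d"]}
    for t, recs in windowed_batches(arrivals, 1, 3, 2):
        print(t, recs)
```

In this model, which records share a window depends only on when they arrived,
not on a sort order within a Hive partition.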

Any suggestions are appreciated.

Sent from the Apache Spark User List mailing list archive at
