spark-user mailing list archives

From lmk <lakshmi.muralikrish...@gmail.com>
Subject Can this be done in map-reduce technique (in parallel)
Date Wed, 04 Jun 2014 10:49:12 GMT
Hi,
I am a new Spark user. Please let me know how to handle the following scenario:

I have a data set with the following fields:
1. DeviceId
2. latitude
3. longitude
4. ip address
5. Datetime
6. Mobile application name
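To make the layout concrete, I am picturing each record roughly like this
(the field names are just my own labels, not an existing schema):

    case class Record(
      deviceId: String,
      latitude: Double,
      longitude: Double,
      ipAddress: String,
      datetime: String,
      appName: String)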

With the above data, I would like to perform the following steps:
1. Collect all lat and lon coordinates for each IP address 
        (ip1,(lat1,lon1),(lat2,lon2))
        (ip2,(lat3,lon3),(lat4,lon4))
2. For each IP, 
        1. Find the distance between each lat and lon coordinate pair and all
the other pairs under the same IP (e.g. with a haversine helper, as sketched
below) 
        2. Select those coordinates whose distances fall under a specific
threshold (say 100m) 
        3. Find the coordinate pair with the maximum occurrences 
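For the distance in step 2.1, I was assuming a plain haversine helper along
these lines (just my own sketch, nothing Spark-specific; R is the mean Earth
radius in meters):

    // Haversine distance in meters between two (lat, lon) points
    def haversine(lat1: Double, lon1: Double,
                  lat2: Double, lon2: Double): Double = {
      val R = 6371000.0
      val dLat = math.toRadians(lat2 - lat1)
      val dLon = math.toRadians(lon2 - lon1)
      val a = math.pow(math.sin(dLat / 2), 2) +
        math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
          math.pow(math.sin(dLon / 2), 2)
      2 * R * math.asin(math.sqrt(a))
    }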

In this case, how can I iterate and compare each coordinate pair with all
the other pairs? 
Can this be done in a distributed manner, as this data set is going to have
a few million records? 
Can we do this with map/reduce operations?
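
In case it makes the question clearer, here is the rough shape I am imagining,
written against the Scala RDD API. The input path, the comma-separated layout
(in the field order listed above) and my reading of step 2.3 as "the coordinate
that is within the threshold of the most other coordinates" are all assumptions
on my part, and the haversine helper above is assumed to be in scope:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    val sc = new SparkContext("local", "geo-cluster")

    // (ip, (lat, lon)) pairs parsed from the raw records
    val points = sc.textFile("hdfs:///path/to/data")   // hypothetical path
      .map(_.split(","))
      .map(f => (f(3), (f(1).toDouble, f(2).toDouble)))

    // Step 1: collect all coordinates per IP
    val byIp = points.groupByKey()

    // Step 2: per IP, compare every coordinate with all the others,
    // keep neighbours within 100 m, and take the coordinate with the
    // most such neighbours
    val result = byIp.mapValues { coords =>
      val cs = coords.toSeq
      cs.map { c =>
        val near = cs.count { o =>
          o != c && haversine(c._1, c._2, o._1, o._2) <= 100.0
        }
        (c, near)
      }.maxBy(_._2)
    }

    result.collect().foreach(println)

I realise groupByKey pulls every point for one IP onto a single node, which may
be a problem if a single IP has a very large number of points, and the pairwise
comparison is O(n^2) per IP; that is partly why I am asking whether there is a
better map/reduce formulation.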

Thanks.



