spark-user mailing list archives

From lmk <>
Subject Re: Can this be done in map-reduce technique (in parallel)
Date Thu, 05 Jun 2014 05:11:00 GMT
Hi Oleg/Andrew,
Thanks much for the prompt response.

We expect thousands of lat/lon pairs for each IP address. And that is my
concern with the Cartesian product approach. 
Currently, for a small sample of this data (5,000 rows), I group by IP
address and then compute the distance between lat/lon coordinates using
array manipulation techniques.
But I understand this approach will not scale as the data volume grows.
My code is as follows:

// Read the CSV, split each line, and group (lat, lon) pairs by IP address
val dataset: RDD[String] = sc.textFile("x.csv")
val data = dataset.map(l => l.split(","))
val grpData = data.map(a => (a(0), (a(1).toDouble, a(2).toDouble))).groupByKey()

Now I have the data grouped by IP address as Array[(String,
Iterable[(Double, Double)])], e.g.:
 Array((ip1, ArrayBuffer((lat1,lon1), (lat2,lon2), (lat3,lon3))))

Now I have to find the distance between (lat1,lon1) and (lat2,lon2), then
between (lat1,lon1) and (lat3,lon3), and so on for all combinations within
each group.

This is where I get stuck. Please guide me on this.
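For what it's worth, here is a Spark-free sketch of the per-group step, under the assumption that great-circle (haversine) distance is what you want; the object name, the sample coordinates, and the 6371 km Earth radius are my own choices, not from your code. `combinations(2)` enumerates each unordered pair exactly once, which avoids the full Cartesian product within a group.

```scala
import scala.math._

object PairwiseDistances {
  // Haversine great-circle distance in kilometres (Earth radius ~6371 km).
  def haversineKm(p1: (Double, Double), p2: (Double, Double)): Double = {
    val R = 6371.0
    val dLat = toRadians(p2._1 - p1._1)
    val dLon = toRadians(p2._2 - p1._2)
    val a = pow(sin(dLat / 2), 2) +
      cos(toRadians(p1._1)) * cos(toRadians(p2._1)) * pow(sin(dLon / 2), 2)
    2 * R * asin(sqrt(a))
  }

  // All unordered pairs within one IP's coordinate list, with their distance.
  def pairwise(coords: Seq[(Double, Double)]): Seq[((Double, Double), (Double, Double), Double)] =
    coords.combinations(2).map { case Seq(a, b) => (a, b, haversineKm(a, b)) }.toSeq

  def main(args: Array[String]): Unit = {
    // Hypothetical sample points standing in for one IP's ArrayBuffer of (lat, lon)
    val sample = Seq((51.5, -0.1), (48.9, 2.35), (40.7, -74.0))
    pairwise(sample).foreach { case (a, b, d) => println(f"$a -> $b : $d%.1f km") }
  }
}
```

In Spark you could then apply the same per-group logic to your grouped RDD with something like `grpData.flatMapValues(coords => PairwiseDistances.pairwise(coords.toSeq))`. Be aware that with thousands of points per IP the pair count grows as n(n-1)/2, so this is still quadratic within each group.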

Thanks Again.
