spark-user mailing list archives

From Joseph Lust <jl...@mc10inc.com>
Subject Re: Pairwise Processing of a List
Date Mon, 26 Jan 2015 01:17:44 GMT
So you’ve got a point A and you want the sum of distances between it and all other points?
Or am I misunderstanding you?

// target point; could be sent to all workers via sc.broadcast
val tarPt = (10, 20)
val pts = Seq((2, 2), (3, 3), (2, 3), (10, 2))
val rdd = sc.parallelize(pts)
rdd.map { pt =>
  math.sqrt(math.pow(tarPt._1 - pt._1, 2) + math.pow(tarPt._2 - pt._2, 2))
}.reduce(_ + _)
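For a quick check outside a cluster, the same computation can be run with plain Scala collections (a local sketch only; `sc.parallelize` above is Spark API, while the snippet below needs no SparkContext):

```scala
// Local (non-Spark) check: distance from the target point to each
// point in the sequence, then summed.
val tarPt = (10.0, 20.0)
val pts   = Seq((2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (10.0, 2.0))

val total = pts.map { pt =>
  math.sqrt(math.pow(tarPt._1 - pt._1, 2) + math.pow(tarPt._2 - pt._2, 2))
}.sum

println(f"total distance = $total%.4f")  // ≈ 74.8708
```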

-Joe

From: Steve Nunez <snunez@hortonworks.com>
Date: Sunday, January 25, 2015 at 7:32 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Pairwise Processing of a List

Spark Experts,

I’ve got a list of points, List[(Float, Float)], that represent (x, y) coordinate pairs and
need to sum the distances. It’s easy enough to compute the distance between two points:

import scala.math.{pow, sqrt}

case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

(in this case I create a ‘Point’ class, but the maths are the same).

What I can’t figure out is the ‘right’ way to sum distances between all the points.
I can make this work by traversing the list with a for loop and using indices, but this doesn’t
seem right.

Anyone know a clever way to process a List[(Float, Float)] in a pairwise fashion?
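One idiomatic alternative to the index-based loop, sketched here on the assumption that the sum wanted is over successive points in list order, is `sliding(2)`, which yields each adjacent pair of elements:

```scala
import scala.math.{pow, sqrt}

case class Point(x: Float, y: Float) {
  def distance(other: Point): Float =
    sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
}

// sliding(2) walks the list pairwise: (p0,p1), (p1,p2), ...
// A list with fewer than two points yields no pairs, so the sum is 0.
def pathLength(points: List[Point]): Float =
  points.sliding(2).collect { case List(a, b) => a.distance(b) }.sum

val pts = List(Point(0, 0), Point(3, 4), Point(3, 0))
println(pathLength(pts))  // 5.0 + 4.0 = 9.0
```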

Regards,
- Steve


