spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Pairwise Processing of a List
Date Mon, 26 Jan 2015 01:28:00 GMT
(PS the Scala code I posted is a poor way to do it -- it would
materialize the entire cartesian product in memory. You can use
.iterator or .view to fix that.)

Ah, so you want sum of distances between successive points.

val points: List[(Double,Double)] = ...
points.sliding(2).map { case List(p1,p2) => distance(p1,p2) }.sum
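For example, as a self-contained sketch (the sample points are made up, and `distance` is the usual Euclidean distance; note that a list with fewer than two points would need a guard, since sliding(2) then yields a short window that won't match the pattern):

```scala
// Euclidean distance between two (x, y) pairs
def distance(p1: (Double, Double), p2: (Double, Double)): Double =
  math.sqrt(math.pow(p1._1 - p2._1, 2) + math.pow(p1._2 - p2._2, 2))

val points: List[(Double, Double)] = List((0.0, 0.0), (3.0, 4.0), (6.0, 8.0))

// sliding(2) yields each successive pair of points as a 2-element List
val total = points.sliding(2).map { case List(p1, p2) => distance(p1, p2) }.sum
// total == 10.0 (two 3-4-5 triangles, 5.0 each)
```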

If you import org.apache.spark.mllib.rdd.RDDFunctions._ you should
have access to something similar in Spark over an RDD. It gives you a
sliding() function that produces Arrays of sequential elements.

Note that RDDs don't really guarantee anything about ordering though,
so this only makes sense if you've already sorted some upstream RDD by
a timestamp or sequence number.
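To illustrate the sort-first idea with plain Scala collections standing in for an RDD (the timestamped GPS readings below are invented for the example; on a real RDD you'd use sortBy and the mllib sliding() instead):

```scala
// Hypothetical GPS readings as (timestamp, x, y) -- arrival order is scrambled
val readings = Seq((3L, 6.0, 8.0), (1L, 0.0, 0.0), (2L, 3.0, 4.0))

def distance(p1: (Double, Double), p2: (Double, Double)): Double =
  math.sqrt(math.pow(p1._1 - p2._1, 2) + math.pow(p1._2 - p2._2, 2))

// Sort by timestamp first so "successive" means successive in time,
// then drop the timestamp and sum distances over sliding pairs
val ordered = readings.sortBy(_._1).map { case (_, x, y) => (x, y) }
val total = ordered.sliding(2).map { case Seq(p1, p2) => distance(p1, p2) }.sum
// total == 10.0
```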

On Mon, Jan 26, 2015 at 1:21 AM, Steve Nunez <snunez@hortonworks.com> wrote:
> Not combinations, linear distances, e.g., given: List[ (x1,y1), (x2,y2),
> (x3,y3) ], compute the sum of:
>
> distance between (x1,y1) and (x2,y2), and
> distance between (x2,y2) and (x3,y3)
>
> Imagine that the list of coordinate points comes from a GPS and describes a
> trip.
>
> - Steve
>
> From: Joseph Lust <jlust@mc10inc.com>
> Date: Sunday, January 25, 2015 at 17:17
> To: Steve Nunez <snunez@hortonworks.com>, "user@spark.apache.org"
> <user@spark.apache.org>
> Subject: Re: Pairwise Processing of a List
>
> So you’ve got a point A and you want the sum of distances between it and all
> other points? Or am I misunderstanding you?
>
> // target point, can be Broadcast global sent to all workers
> val tarPt = (10,20)
> val pts = Seq((2,2),(3,3),(2,3),(10,2))
> val rdd = sc.parallelize(pts)
> rdd.map( pt => Math.sqrt( Math.pow(tarPt._1 - pt._1,2) + Math.pow(tarPt._2 -
> pt._2,2)) ).reduce( (d1,d2) => d1+d2 )
>
> -Joe
>
> From: Steve Nunez <snunez@hortonworks.com>
> Date: Sunday, January 25, 2015 at 7:32 PM
> To: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Pairwise Processing of a List
>
> Spark Experts,
>
> I’ve got a list of points: List[(Float, Float)] that represent (x,y)
> coordinate pairs and need to sum the distances. It’s easy enough to compute
> the distance:
>
> case class Point(x: Float, y: Float) {
>   def distance(other: Point): Float =
>     sqrt(pow(x - other.x, 2) + pow(y - other.y, 2)).toFloat
> }
>
> (in this case I create a ‘Point’ class, but the maths are the same).
>
> What I can’t figure out is the ‘right’ way to sum distances between all the
> points. I can make this work by traversing the list with a for loop and
> using indices, but this doesn’t seem right.
>
> Anyone know a clever way to process List[(Float, Float)] in a pairwise
> fashion?
>
> Regards,
> - Steve
>
>
>


