I typically do something like this:

scala> import com.twitter.algebird._
scala> import com.twitter.algebird.Operators._
scala> val rdd = sc.parallelize(1 to 10)
scala> rdd.map{ x => (
     |   1L,
     |   Min(x),
     |   Max(x),
     |   x
     | )}.reduce(_ + _)

However, for this you need the Twitter Algebird dependency. Without it, you have to write the reduce function on the tuples yourself.
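A minimal sketch of writing that reduce function by hand, without Algebird. It assumes the same (count, min, max, sum) tuple as the example above. The names `Stats`, `toStats` and `combine` are hypothetical. A local Seq stands in for the RDD so the sketch runs without Spark. With Spark, the same functions would be used as `rdd.map(toStats).reduce(combine)`.

```scala
// (count, min, max, sum) for one column, computed in a single pass.
type Stats = (Long, Int, Int, Int)

// Map one value to the statistics of a single-element collection.
def toStats(x: Int): Stats = (1L, x, x, x)

// Merge two partial results. The merge is associative and commutative,
// which is what RDD.reduce requires.
def combine(a: Stats, b: Stats): Stats =
  (a._1 + b._1, math.min(a._2, b._2), math.max(a._3, b._3), a._4 + b._4)

// Local stand-in for rdd.map(toStats).reduce(combine)
val stats: Stats = (1 to 10).map(toStats).reduce(combine)
```

With Algebird, the Min, Max and tuple semigroups do the job of `combine` for you. Writing it by hand avoids the dependency, at the cost of one merge function per tuple shape.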

Another example with two columns, where I do a conditional count for the first column and a simple sum for the second:
scala> sc.parallelize((1 to 10).zip(11 to 20)).map{ case (x, y) => (
     |   if (x > 5) 1 else 0,
     |   y
     | )}.reduce(_ + _)
res3: (Int, Int) = (5,155)
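The same two-column computation can also be written without Algebird by following the shape of Spark's `RDD.aggregate(zeroValue)(seqOp, combOp)`. Each partition is folded from a zero value, and then the partial results are merged. This is a sketch only. `zero`, `seqOp` and `combOp` are illustrative names. Splitting a local Seq into groups stands in for RDD partitions, so no Spark is needed.

```scala
// Running result: (count of rows with x > 5, sum of y).
// The zero value must be an identity for combOp, because Spark
// may apply it once per partition.
val zero: (Int, Int) = (0, 0)

// Fold one (x, y) row into a partition's running result.
def seqOp(acc: (Int, Int), row: (Int, Int)): (Int, Int) = {
  val (x, y) = row
  (acc._1 + (if (x > 5) 1 else 0), acc._2 + y)
}

// Merge the results of two partitions.
def combOp(a: (Int, Int), b: (Int, Int)): (Int, Int) =
  (a._1 + b._1, a._2 + b._2)

val rows = (1 to 10).zip(11 to 20)

// Local stand-in for rdd.aggregate(zero)(seqOp, combOp):
// fold each group of up to 4 rows, then merge the partial results.
val result: (Int, Int) =
  rows.grouped(4).map(_.foldLeft(zero)(seqOp)).reduce(combOp)
```

One reason to prefer `aggregate` over `map` followed by `reduce` is that `reduce` throws on an empty RDD, whereas `aggregate` returns the zero value.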

On Sun, Mar 23, 2014 at 2:26 PM, Richard Siebeling wrote:
Hi Koert, Patrick,

Do you already have an elegant solution for combining multiple operations on a single RDD?
Say, for example, that I want to compute a sum over one column, and a count and an average over another column.

Richard
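One way to answer the question above in a single pass is to carry (sum of column a, count of column b, sum of column b) and derive the average at the end. Averages do not merge directly, but sums and counts do. This is a sketch under assumptions. The rows of `(a, b)` are made-up data, and `Acc`, `single` and `merge` are hypothetical names. A local Seq stands in for the RDD. With Spark, this would be `rdd.map { case (a, b) => single(a, b) }.reduce(merge)`.

```scala
// Partial result: sum of column a, count and sum of column b.
case class Acc(sumA: Long, countB: Long, sumB: Double)

// Partial result for a single row.
def single(a: Int, b: Double): Acc = Acc(a.toLong, 1L, b)

// Merge two partial results field by field.
def merge(x: Acc, y: Acc): Acc =
  Acc(x.sumA + y.sumA, x.countB + y.countB, x.sumB + y.sumB)

// Hypothetical two-column rows (a, b).
val rows = Seq((1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0))

// One pass over the data, then the average is computed from sum and count.
val acc = rows.map { case (a, b) => single(a, b) }.reduce(merge)
val avgB = acc.sumB / acc.countB
```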

On Mon, Mar 17, 2014 at 8:20 AM, Richard Siebeling wrote:
Patrick, Koert,

I'm also very interested in these examples. Could you please post them if you find them?
Richard

On Thu, Mar 13, 2014 at 9:39 PM, Koert Kuipers wrote:
Not that long ago there was a nice example on this list about how to combine multiple operations on a single RDD: basically, if you want to do a count() and something else, how to roll them into a single job. I think Patrick Wendell gave the examples.

I can't find them anymore. Patrick, can you please repost? Thanks!