i typically do something like this:

scala> val rdd = sc.parallelize(1 to 10)
scala> import com.twitter.algebird.Operators._
scala> import com.twitter.algebird.{Max, Min}
scala> rdd.map{ x => (
     |   1L,
     |   Min(x),
     |   Max(x),
     |   x
     | )}.reduce(_ + _)
res0: (Long, com.twitter.algebird.Min[Int], com.twitter.algebird.Max[Int], Int) = (10,Min(1),Max(10),55)

however, this needs the twitter algebird dependency. without it you have to write the reduce function on the tuples yourself.
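
for the hand-written version, here is a minimal sketch of what the reduce function could look like for the same count/min/max/sum tuple. the names Stats, toStats and merge are made up for illustration, and a local range stands in for the RDD so the snippet runs without spark; the same two functions should plug into rdd.map(toStats).reduce(merge).

```scala
// count, min, max and sum in a single pass, with no algebird dependency.
// Stats, toStats and merge are hypothetical names, not part of spark or algebird.
type Stats = (Long, Int, Int, Int)

// lift one element into a (count, min, max, sum) tuple
def toStats(x: Int): Stats = (1L, x, x, x)

// combine two partial results; associative and commutative, so safe for reduce
def merge(a: Stats, b: Stats): Stats =
  (a._1 + b._1, math.min(a._2, b._2), math.max(a._3, b._3), a._4 + b._4)

// local stand-in for the RDD; with spark this would be sc.parallelize(1 to 10)
val data = 1 to 10
val stats = data.map(toStats).reduce(merge)
println(stats) // (10,1,10,55)
```

the tuple positions carry the meaning here, so a small case class may read better once there are more than a few fields.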

another example with 2 columns, where i do a conditional count for the first column and a simple sum for the second (the _ + _ on tuples again comes from the algebird import above):
scala> sc.parallelize((1 to 10).zip(11 to 20)).map{ case (x, y) => (
     |   if (x > 5) 1 else 0,
     |   y
     | )}.reduce(_ + _)
res3: (Int, Int) = (5,155)
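
for the average in the question below, one option is to carry a sum and a count through the job and divide at the end. a minimal sketch, assuming two Int columns: Acc and its methods are hypothetical names, and a local collection stands in for the RDD so it runs without spark. on a real RDD[(Int, Int)] the same accumulator should fit RDD.aggregate, which takes a zero value, a per-element function and a merge function.

```scala
// sum over column x, plus count and average over column y, in one pass.
// Acc, add, merge and avgY are hypothetical names for illustration.
case class Acc(sumX: Long, countY: Long, sumY: Long) {
  // fold one (x, y) row into the accumulator
  def add(row: (Int, Int)): Acc = Acc(sumX + row._1, countY + 1, sumY + row._2)
  // combine two partial accumulators (e.g. from two partitions)
  def merge(o: Acc): Acc = Acc(sumX + o.sumX, countY + o.countY, sumY + o.sumY)
  // the average is derived at the end from sum and count
  def avgY: Double = if (countY == 0) Double.NaN else sumY.toDouble / countY
}

// local stand-in for an RDD[(Int, Int)]
val rows = (1 to 10).zip(11 to 20)
val acc = rows.foldLeft(Acc(0L, 0L, 0L))((a, r) => a.add(r))
// with spark the call would look like:
//   rdd.aggregate(Acc(0L, 0L, 0L))((a, r) => a.add(r), (a, b) => a.merge(b))
println((acc.sumX, acc.countY, acc.avgY)) // (55,10,15.5)
```

averaging the per-partition averages directly would give wrong results when partitions differ in size, which is why the sketch keeps sum and count separate until the end.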



On Sun, Mar 23, 2014 at 2:26 PM, Richard Siebeling <rsiebeling@gmail.com> wrote:
Hi Koert, Patrick,

do you already have an elegant solution to combine multiple operations on a single RDD?
Say, for example, that I want to do a sum over one column, and a count and an average over another column.

thanks in advance,
Richard


On Mon, Mar 17, 2014 at 8:20 AM, Richard Siebeling <rsiebeling@gmail.com> wrote:
Patrick, Koert,

I'm also very interested in these examples. Could you please post them if you find them?
thanks in advance,
Richard


On Thu, Mar 13, 2014 at 9:39 PM, Koert Kuipers <koert@tresata.com> wrote:
not that long ago there was a nice example on here about how to combine multiple operations on a single RDD. so basically if you want to do a count() and something else, how to roll them into a single job. i think patrick wendell gave the examples.

i can't find them anymore. patrick, can you please repost? thanks!