spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Manish Amde <>
Subject reduce, transform, combine
Date Sun, 04 May 2014 08:12:34 GMT
I am currently using the RDD aggregate operation to reduce (fold) per
partition and then combine using the RDD aggregate operation.
def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U)
=> U): U

I need to perform a transform operation after the seqOp and before the
combOp. The signature would look like
def foldTransformCombine[U: ClassTag](zeroReduceValue: V, zeroCombineValue:
U)(seqOp: (V, T) => V, transformOp: (V) => U, combOp: (U, U) => U): U

This is especially useful in the scenario where the transformOp is
expensive and should be performed once per partition before combining. Is
there a way to accomplish this with existing RDD operations? If yes, great
but if not, should we consider adding such a general transformation to the
list of RDD operations?


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message