spark-user mailing list archives

From Juan Rodríguez Hortalá <juan.rodriguez.hort...@gmail.com>
Subject Implementing a spark version of Haskell's partition
Date Wed, 17 Dec 2014 16:56:42 GMT
Hi all,

I would like to be able to split an RDD in two pieces according to a
predicate. That would be equivalent to applying filter twice, with the
predicate and its complement, which is also similar to Haskell's partition
list function (
http://hackage.haskell.org/package/base-4.7.0.1/docs/Data-List.html). Is
there currently any way to do this in Spark? Or maybe someone has a
suggestion about how to implement this by modifying the Spark source. I
think this is valuable because sometimes I need to split an RDD into
several groups that are too big to fit in the memory of a single thread,
so pair RDDs are not a solution for those cases. A generalization of
Haskell's partition to n parts would do the job.
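To illustrate the idea, here is a minimal sketch using the Scala RDD API
(the names partitionRDD and multiPartition are just hypothetical; the
obvious approach of one filter pass per group traverses the input several
times unless it is cached first):

import org.apache.spark.rdd.RDD

// Two-way split: filter with the predicate and its complement.
// Note: the input is traversed twice, so caching the input RDD
// first avoids recomputing its lineage for each pass.
def partitionRDD[T](rdd: RDD[T], p: T => Boolean): (RDD[T], RDD[T]) =
  (rdd.filter(p), rdd.filter(x => !p(x)))

// n-way generalization: one filter pass per group, where the
// hypothetical index function f maps each element to a group
// number in [0, n).
def multiPartition[T](rdd: RDD[T], n: Int, f: T => Int): Seq[RDD[T]] =
  (0 until n).map(i => rdd.filter(x => f(x) == i))

What I am after is something that could do this in a single pass over the
data, which I believe would need support inside Spark itself.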

Thanks a lot for your help.

Greetings,

Juan Rodriguez
