spark-user mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: Mutable tagging RDD rows ?
Date Sat, 29 Mar 2014 02:02:32 GMT
Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to
get what you want is to transform the RDD into a new RDD that carries the
updated tags at each iteration. But you might look at MutablePair (
https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/main/scala/org/apache/spark/util/MutablePair.scala)
to see if the semantics meet your needs.
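
To make the "transform to another RDD" idea concrete, here is a minimal
sketch in plain Python rather than Spark code. A list of (row, tag) tuples
stands in for the RDD, and the list comprehension plays the role of a map
over it. The names new_tag and retag, the sample rows, and the tagging rule
are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Row = Tuple[float, ...]      # an immutable data row
Tagged = Tuple[Row, int]     # (row, tag) pair, also immutable


def new_tag(row: Row, old_tag: int, iteration: int) -> int:
    # Hypothetical rule: bump the tag while the row sum exceeds the iteration.
    return old_tag + 1 if sum(row) > iteration else old_tag


def retag(tagged: List[Tagged], iteration: int) -> List[Tagged]:
    # Build a NEW collection; the input list and its tuples are not modified.
    # In Spark this step would be a map over the previous RDD.
    return [(row, new_tag(row, tag, iteration)) for row, tag in tagged]


rows: List[Row] = [(1.0, 2.0), (0.5, 0.25), (3.0, 4.0)]
initial: List[Tagged] = [(row, 0) for row in rows]

current = initial
for iteration in range(1, 4):
    current = retag(current, iteration)   # each iteration yields a new dataset

final_tags = [tag for _, tag in current]
```

Each pass leaves the previous collection untouched, which mirrors how an
iterative job would replace one RDD with the next instead of updating rows
in place.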

Alternatively you can consider:

   1. Build & provide a fast lookup service that stores and returns the
   mutable information keyed by the RDD row IDs, or
   2. Use DDF (Distributed DataFrame) which we'll make available in the
   near future, which will give you fully mutable-row table semantics.
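
Option 1 can be sketched roughly as follows, again in plain Python and not
against any real service. TagStore is a hypothetical in-memory stand-in for
the lookup service, the row IDs are assumed to be assigned up front, and the
tagging rule is invented for the example. The point is only that the rows
stay immutable while the small per-row state lives elsewhere, keyed by ID.

```python
from typing import Dict, List, Tuple


class TagStore:
    """Hypothetical in-memory stand-in for an external lookup service."""

    def __init__(self) -> None:
        self._tags: Dict[int, int] = {}

    def get(self, row_id: int, default: int = 0) -> int:
        return self._tags.get(row_id, default)

    def put(self, row_id: int, tag: int) -> None:
        self._tags[row_id] = tag


# (row_id, row) pairs; the rows themselves are never modified.
rows: List[Tuple[int, Tuple[float, ...]]] = [
    (0, (1.0, 2.0)),
    (1, (0.5, 0.25)),
    (2, (3.0, 4.0)),
]

store = TagStore()
for iteration in range(1, 4):
    for row_id, row in rows:
        # Hypothetical rule: bump the tag while the row sum exceeds the iteration.
        if sum(row) > iteration:
            store.put(row_id, store.get(row_id) + 1)
```

Because the mutable state is much smaller than the data rows, only the store
changes between iterations. A real deployment would also have to consider
concurrent access and fault tolerance, which this sketch ignores.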


--
Christopher T. Nguyen
Co-founder & CEO, Adatao <http://adatao.com>
linkedin.com/in/ctnguyen



On Fri, Mar 28, 2014 at 5:16 PM, Sung Hwan Chung
<codedeft@cs.stanford.edu> wrote:

> Hey guys,
>
> I need to tag individual RDD lines with some values. This tag value would
> change at every iteration. Is this possible with RDDs (I suppose this is
> sort of like a mutable RDD, but it's more)?
>
> If not, what would be the best way to do something like this? Basically,
> we need to keep mutable information per data row (this would be something
> much smaller than the actual data row, however).
>
> Thanks
>
