spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Is it possible to just change the value of the items in RDD without making a full copy?
Date Tue, 02 Dec 2014 11:47:35 GMT
RDDs are immutable, so if you want to change the value of an RDD then you
have to create another RDD from it by applying some transformation.

Not sure if this is what you are looking for:

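// Assumes an existing SparkContext `sc`, e.g. the one provided by spark-shell.
val rdd = sc.parallelize(0 until 100)
// map builds a new RDD; the original rdd is left untouched.
val rdd2 = rdd.map { x =>
  println("Value : " + x)  // runs on the executors, once per element
  if (x != 0) x else 1     // replace 0 with 1, keep every other value
}
rdd2.collect()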
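
The `iter(0) = 1` attempt in the quoted message fails because Scala rewrites it to `iter.update(0, 1)`, and `Iterator` has no `update` method. If the goal is to change only the first element of the RDD rather than every 0, a sketch along these lines may be closer. It uses `mapPartitionsWithIndex`, and it assumes that the first element lives in partition 0, which holds for a non-empty range passed to `parallelize`. The object name, the helper `replaceFirst`, and the local master setting are illustrative only. Note that this still produces a new RDD and still iterates over the elements of partition 0; the other partitions are passed through as they are.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReplaceFirstElement {
  // Hypothetical helper: in partition 0, swap the first element for 1;
  // every other partition is returned unchanged.
  def replaceFirst(partitionIndex: Int, iter: Iterator[Int]): Iterator[Int] =
    if (partitionIndex == 0)
      iter.zipWithIndex.map { case (x, i) => if (i == 0) 1 else x }
    else
      iter

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just to try the sketch out.
    val conf = new SparkConf().setAppName("replace-first").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(0 until 100)
      // Returns a new RDD; rdd itself is not modified.
      val rdd2 = rdd.mapPartitionsWithIndex((idx, iter) => replaceFirst(idx, iter))
      println(rdd2.take(3).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Because the function returns a lazy iterator, no partition is copied into memory as a whole; elements are rewritten one at a time as the downstream action consumes them.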


Thanks
Best Regards

On Tue, Dec 2, 2014 at 4:48 PM, Xuelin Cao <xuelincao@yahoo.com.invalid>
wrote:

>
> Hi,
>
>      I'd like to perform an operation on an RDD that changes *ONLY* the
> values of some items, without making a full copy or a full scan of the data.
>
>      This is useful when I need to handle a large RDD and, each time, only
> need to change a small fraction of the data while keeping the rest
> unchanged. I certainly don't want to make a full copy of the data into a
> new RDD.
>
>      For example, suppose I have an RDD that contains the integers from 0
> to 99. What I want is to change the first element of the RDD from 0
> to 1 and leave the other elements untouched.
>
>      I tried this, but it doesn't work:
>
>      var rdd = parallelize(Range(0,100))
>      rdd.mapPartitions({iter=> iter(0) = 1})
>
>      The reported error is :   value update is not a member of
> Iterator[Int]
>
>
>      Does anyone know how to make it work?
>
>
