spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: RDD usage
Date Tue, 25 Mar 2014 02:23:10 GMT
No, it won't.  The return type of RDD#foreach is Unit, so it doesn't
return an RDD.  foreach is useful purely for the side effects it generates,
not for its return value, and modifying the elements of an RDD in place
via foreach is generally not a good idea.
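
A minimal sketch of the alternative, assuming an immutable stand-in for
AppDataPoint (the names Point, RelabelSketch, relabel and the sample values
are hypothetical, not from the original code): derive a new RDD with map
rather than mutating elements inside foreach.  In-place changes made by
foreach are applied to whatever copies the executors happen to hold, so they
can be lost if a cached partition is serialized, evicted, or recomputed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical immutable stand-in for AppDataPoint: no var fields.
case class Point(y: Double, x: Array[Double])

object RelabelSketch {
  // map is a transformation: it returns a new RDD and leaves `points` untouched.
  def relabel(points: RDD[Point], newY: Double): RDD[Point] =
    points.map(p => p.copy(y = newY))

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("relabel-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val points = sc.parallelize(Seq(
        Point(1.0, Array(0.5, 2.0)),
        Point(-1.0, Array(1.5, 0.1))
      )).cache()

      val relabeled = relabel(points, 0.0)

      // Print the y values of the derived RDD and of the original RDD.
      relabeled.collect().foreach(p => println(p.y))
      points.collect().foreach(p => println(p.y))
    } finally {
      sc.stop()
    }
  }
}
```

If later stages need the updated values, reassign or cache the derived RDD
(for example, points = relabel(points, v).cache() when points is a var)
instead of relying on mutation.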


On Mon, Mar 24, 2014 at 6:35 PM, hequn cheng <chenghequn@gmail.com> wrote:

> points.foreach(p=>p.y = another_value) will return a new modified RDD.
>
>
> 2014-03-24 18:13 GMT+08:00 Chieh-Yen <r01944006@csie.ntu.edu.tw>:
>
>> Dear all,
>>
>> I have a question about the usage of RDDs.
>> I implemented a class called AppDataPoint; it looks like this:
>>
>> case class AppDataPoint(input_y : Double, input_x : Array[Double])
>> extends Serializable {
>>   var y : Double = input_y
>>   var x : Array[Double] = input_x
>>   ......
>> }
>> Furthermore, I created the RDD with the following function.
>>
>> def parsePoint(line: String): AppDataPoint = {
>>   /* Some related works for parsing */
>>   ......
>> }
>>
>> Assume the RDD is called "points":
>>
>> val lines = sc.textFile(inputPath, numPartition)
>> var points = lines.map(parsePoint _).cache()
>>
>> My question: I tried to modify the values in this RDD with the
>> following operation:
>>
>> points.foreach(p=>p.y = another_value)
>>
>> The operation works.
>> The system shows no warning or error message, and
>> the results are right.
>> I wonder whether modifying an RDD this way is a correct and in fact
>> workable design.
>> The usage documentation says that RDDs are immutable. Do you have any
>> suggestions?
>>
>> Thanks a lot.
>>
>> Chieh-Yen Lin
>>
>
>
