spark-user mailing list archives

From Punit Naik <naik.puni...@gmail.com>
Subject Re: Modify the functioning of zipWithIndex function for RDDs
Date Tue, 28 Jun 2016 18:34:07 GMT
Hi Ted

So would the tuple look like (x._1, split.startIndex + x._2 + x._1.length)?
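A note on the arithmetic being discussed: adding only x._1.length gives each element the length of itself, not of the elements before it. If the goal is an offset-style index, the per-partition logic would be a running sum of preceding lengths. A minimal plain-Scala sketch (no Spark; the object and method names are illustrative, not from the Spark source) of what that computation could look like inside a partition, with scanLeft providing the running total:

```scala
// Sketch only: mirrors replacing `split.startIndex + x._2` in
// ZippedWithIndexRDD.compute() with a cumulative sum of data lengths
// instead of a positional index. Plain Scala so it runs without Spark.
object CumulativeZip {
  def zipWithCumulativeLength(data: Seq[String], startIndex: Long): Seq[(String, Long)] = {
    // scanLeft yields the running total of lengths *before* each element:
    // for startIndex 10 and data ("ab", "cde", "f") -> offsets 10, 12, 15
    val offsets = data.scanLeft(startIndex)((acc, s) => acc + s.length)
    data.zip(offsets)
  }

  def main(args: Array[String]): Unit = {
    println(zipWithCumulativeLength(Seq("ab", "cde", "f"), 10L))
  }
}
```

In a real modification, computing each partition's startIndex would also need to change: the driver-side prefix sums would have to be over total data length per partition rather than element counts, or the offsets of later partitions will be wrong.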

On Tue, Jun 28, 2016 at 11:09 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Please take a look at:
> core/src/main/scala/org/apache/spark/rdd/ZippedWithIndexRDD.scala
>
> In compute() method:
>     val split = splitIn.asInstanceOf[ZippedWithIndexRDDPartition]
>     firstParent[T].iterator(split.prev, context).zipWithIndex.map { x =>
>       (x._1, split.startIndex + x._2)
>
> You can modify the second component of the tuple to take data.length into
> account.
>
> On Tue, Jun 28, 2016 at 10:31 AM, Punit Naik <naik.punit44@gmail.com>
> wrote:
>
>> Hi
>>
>> I wanted to change the behaviour of the "zipWithIndex" function for
>> Spark RDDs so that its output is, just as an example,
>> "(data, prev_index + data.length)" instead of "(data, prev_index + 1)".
>>
>> How can I do this?
>>
>> --
>> Thank You
>>
>> Regards
>>
>> Punit Naik
>>
>
>


-- 
Thank You

Regards

Punit Naik
