spark-user mailing list archives

From Wenlei Xie <wenlei....@gmail.com>
Subject Re: Getting the partition position of cached RDD?
Date Mon, 02 Sep 2013 08:10:06 GMT
Thank you! It's a very nice improvement :).

However, my situation is a bit different -- the code there tries to give
each coalesced partition roughly the same *number of parent partitions*,
while in my case the parent partitions could be quite imbalanced, and I am
trying to give each coalesced partition roughly the same *SIZE*.

Of course, this requires the sizes of the parent partitions to be known --
which is not a problem in my case, as I always generate and cache them.
This is probably not a common case, so I am happy to write my own (hacky)
code to work around it -- but I need the location of each cached
partition...
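
A minimal sketch of the size-based grouping described above, in plain Python
rather than Spark code. It assumes the size and cached host of every parent
partition are already available as (index, host, size) tuples; how to obtain
the host is exactly the open question here. The function name group_by_size
and the target_size parameter are hypothetical. Partitions on different hosts
are never combined, and within a host a largest-first greedy heuristic stands
in for whatever balancing strategy would actually be used.

```python
import heapq
from collections import defaultdict


def group_by_size(partitions, target_size):
    """Group parent partitions into coalesced groups of roughly target_size.

    partitions: list of (partition_index, host, size) tuples (assumed input).
    target_size: desired total size per group; must be positive.
    Returns a list of (host, [partition_index, ...], total_size) tuples.
    """
    # Keep partitions from different hosts apart so no data has to move.
    by_host = defaultdict(list)
    for index, host, size in partitions:
        by_host[host].append((size, index))

    groups = []
    for host in sorted(by_host):
        parts = by_host[host]
        total = sum(size for size, _ in parts)
        # At least one group per host, at most one group per partition.
        n_groups = max(1, min(len(parts), round(total / target_size)))
        # Min-heap of (current_load, group_id); group_id breaks ties.
        heap = [(0, g) for g in range(n_groups)]
        members = [[] for _ in range(n_groups)]
        # Largest-first greedy: put each partition into the lightest group.
        for size, index in sorted(parts, reverse=True):
            load, g = heapq.heappop(heap)
            members[g].append(index)
            heapq.heappush(heap, (load + size, g))
        loads = {g: load for load, g in heap}
        for g in range(n_groups):
            if members[g]:  # skip groups that received nothing
                groups.append((host, sorted(members[g]), loads[g]))
    return groups
```

Each returned group would correspond to one coalesced partition whose
preferred location is its host.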

By the way: is it possible to assign preferred locations to a
ParallelCollectionRDD (e.g. an RDD generated by sc.parallelize)? Sorry if it
is a silly question...

Best,
Wenlei



On Mon, Sep 2, 2013 at 12:28 AM, Reynold Xin <rxin@cs.berkeley.edu> wrote:

> Does this help you? https://github.com/mesos/spark/pull/832
>
>
> --
> Reynold Xin, AMPLab, UC Berkeley
> http://rxin.org
>
>
>
> On Mon, Sep 2, 2013 at 3:24 PM, Wenlei Xie <wenlei.xie@gmail.com> wrote:
>
>> Hi,
>>
>> I am wondering if it is possible to get the partition position of a cached
>> RDD? I am asking because I am trying to avoid shuffling when
>> performing a coalesce operation. The sizes of my partitions could be quite
>> imbalanced, so CoalescedRDD would probably not be a good solution in my
>> case.
>>
>> Thank you!
>>
>> Best,
>> Wenlei
>>
>> --
>> Wenlei Xie (谢文磊)
>>
>> Department of Computer Science
>> 5132 Upson Hall, Cornell University
>> Ithaca, NY 14853, USA
>> Phone: (607) 255-5577
>> Email: wenlei.xie@gmail.com
>>
>
>


-- 
Wenlei Xie (谢文磊)

Department of Computer Science
5132 Upson Hall, Cornell University
Ithaca, NY 14853, USA
Phone: (607) 255-5577
Email: wenlei.xie@gmail.com
