spark-user mailing list archives

From Jem Tucker <jem.tuc...@gmail.com>
Subject Re: RDD from partitions
Date Fri, 28 Aug 2015 15:25:26 GMT
Hey Rishitesh,

That's perfect, thanks so much! Don't know why I didn't think of using
mapPartitions like this.

Thanks,

Jem
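For reference, the approach discussed in the quoted thread below can also be expressed with `mapPartitionsWithIndex`, which is part of Spark's stable public RDD API and passes the partition index directly into the closure, so no `TaskContext` lookup is needed. A minimal sketch, assuming a live `SparkContext` named `sc`:

```scala
val ones = sc.makeRDD(1 to 100, 10) // base RDD with 10 partitions

// Drop everything from partitions 0, 1 and 2; pass the rest through.
// The partition index arrives as the first argument, so there is no
// need to call TaskContext.get().partitionId inside the closure.
val reduced = ones.mapPartitionsWithIndex { (idx, iter) =>
  if (Seq(0, 1, 2).contains(idx)) Iterator.empty else iter
}

reduced.collect().foreach(println)
```

Note that this still schedules one task per parent partition; the excluded partitions simply produce empty iterators.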

On Fri, Aug 28, 2015 at 10:35 AM Rishitesh Mishra <rishi80.mishra@gmail.com>
wrote:

> Hi Jem,
> A simple way to get this is with mapPartitions (which produces a
> MapPartitionsRDD under the hood). Please see the code below. For this you
> need to know the indices of the parent RDD's partitions that you want to
> exclude. One drawback is that the new RDD still launches the same number
> of tasks as the parent, since both RDDs have the same number of
> partitions; we are only discarding the results from certain partitions.
> If you can live with that, then it's fine.
>
> import org.apache.spark.TaskContext
>
> val ones = sc.makeRDD(1 to 100, 10) // base RDD with 10 partitions
>
> // Reduced RDD: yields no elements for partitions 0, 1 and 2
> val reduced = ones.mapPartitions { iter =>
>   new Iterator[Int] {
>     override def hasNext: Boolean =
>       if (Seq(0, 1, 2).contains(TaskContext.get().partitionId)) false
>       else iter.hasNext
>
>     override def next(): Int = iter.next()
>   }
> }
>
> reduced.collect().foreach(println)
>
>
>
>
> On Fri, Aug 28, 2015 at 12:33 PM, Jem Tucker <jem.tucker@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to create an RDD from a selected subset of its parent's
>> partitions. My current approach is to create my own SelectedPartitionRDD
>> and implement compute and numPartitions myself. The problem is that the
>> compute method is marked @DeveloperApi, and hence unsuitable for me to
>> use in my application. Are there any alternative methods that use only
>> the stable parts of the Spark API?
>>
>> Thanks,
>>
>> Jem
>>
>
>
