spark-user mailing list archives

From Stéphane Verlet <kaweahsoluti...@gmail.com>
Subject Re: rdd split into new rdd
Date Wed, 23 Dec 2015 20:11:58 GMT
I use Scala, but I guess in Java the code would look like this:

JavaPairRDD<String, TreeMap<String, Integer>> rdd ...

JavaPairRDD<String, List<String>> rdd2 = rdd.mapPartitionsToPair(function, true);

where function implements
   PairFlatMapFunction<java.util.Iterator<scala.Tuple2<String, TreeMap<String, Integer>>>,
                       String, List<String>>

Iterable<scala.Tuple2<String, List<String>>>
call(java.util.Iterator<scala.Tuple2<String, TreeMap<String, Integer>>> iter) {

    // Each element is a (key, TreeMap) pair, because rdd is a JavaPairRDD.
    // Iterate over your TreeMaps and generate your lists.

}
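
As a rough sketch of what the body of call could do, here is the per-partition logic in plain Java. It assumes the criterion is "keys of the TreeMap that share the same value go into the same list", which is only inferred from the example in the original question and has not been confirmed. The class name SplitByValue and the helper splitPartition are hypothetical, and java.util.Map.Entry stands in for scala.Tuple2 so that the sketch runs without Spark on the classpath.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitByValue {

    // Hypothetical helper: logic one might place inside call(...).
    // Map.Entry is a stand-in for scala.Tuple2 (assumption, to avoid a Spark dependency).
    static List<Map.Entry<String, List<String>>> splitPartition(
            Iterator<Map.Entry<String, TreeMap<String, Integer>>> records) {
        List<Map.Entry<String, List<String>>> out = new ArrayList<>();
        while (records.hasNext()) {
            Map.Entry<String, TreeMap<String, Integer>> record = records.next();
            // Assumed criterion: inner keys with equal values belong to one list.
            TreeMap<Integer, List<String>> byValue = new TreeMap<>();
            for (Map.Entry<String, Integer> e : record.getValue().entrySet()) {
                byValue.computeIfAbsent(e.getValue(), v -> new ArrayList<>())
                       .add(e.getKey());
            }
            // Emit one (outerKey, list) pair per distinct value.
            for (List<String> group : byValue.values()) {
                out.add(new SimpleEntry<>(record.getKey(), group));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> m = new TreeMap<>();
        m.put("a", 1);
        m.put("b", 1);
        m.put("c", 2);
        m.put("d", 2);
        List<Map.Entry<String, TreeMap<String, Integer>>> partition = new ArrayList<>();
        partition.add(new SimpleEntry<>("1610", m));
        for (Map.Entry<String, List<String>> e : splitPartition(partition.iterator())) {
            System.out.println("(" + e.getKey() + ", " + e.getValue() + ")");
        }
    }
}
```

Inside a real Spark function the same loop would read the key and the TreeMap from each Tuple2 and build Tuple2 results instead of Map.Entry objects.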

On Wed, Dec 23, 2015 at 10:49 AM, Yasemin Kaya <godot85@gmail.com> wrote:

> How can I use mapPartitions? Could you give me an example?
>
> 2015-12-23 17:26 GMT+02:00 Stéphane Verlet <kaweahsolutions@gmail.com>:
>
>> You should be able to do that using mapPartitions
>>
>> On Wed, Dec 23, 2015 at 8:24 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> bq. {a=1, b=1, c=2, d=2}
>>>
>>> Can you elaborate your criteria a bit more ? The above seems to be a
>>> Set, not a Map.
>>>
>>> Cheers
>>>
>>> On Wed, Dec 23, 2015 at 7:11 AM, Yasemin Kaya <godot85@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have data in
>>>> JavaPairRDD<String, TreeMap<String, Integer>> format. For example:
>>>>
>>>> (1610, {a=1, b=1, c=2, d=2})
>>>>
>>>> I want to get
>>>> JavaPairRDD<String, List<String>>. For example:
>>>>
>>>>
>>>> (1610, {a, b})
>>>> (1610, {c, d})
>>>>
>>>> Is there a way to solve this problem?
>>>>
>>>> Best,
>>>> yasemin
>>>> --
>>>> hiç ender hiç
>>>>
>>>
>>>
>>
>
>
> --
> hiç ender hiç
>
