spark-user mailing list archives

From Daniel Haviv <daniel.ha...@veracity-group.com>
Subject Re: aggregateByKey on PairRDD
Date Wed, 30 Mar 2016 10:58:43 GMT
Hi,
Shouldn't groupByKey be avoided?
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/best_practices/prefer_reducebykey_over_groupbykey.html
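
For the original question, here is a sketch of how aggregateByKey can build the per-brand list directly, avoiding groupByKey's full shuffle. This is only an illustration against sample data matching Suniti's mail; the local SparkContext setup and variable names are my own additions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

val sc = new SparkContext(
  new SparkConf().setAppName("agg-example").setMaster("local[*]"))

// Sample data in the same shape as tempRDD from the original mail:
// (brand, (product, key))
val tempRDD: RDD[(String, (String, String))] = sc.parallelize(Seq(
  ("amazon", ("book1", "tech")),
  ("eBay", ("book1", "tech")),
  ("barns&noble", ("book", "tech")),
  ("amazon", ("book2", "tech"))
))

// Zero value: an empty list per key.
// seqOp: prepend each value to the partition-local list.
// combOp: concatenate the partial lists from different partitions.
val resultSetRDD: RDD[(String, List[(String, String)])] =
  tempRDD.aggregateByKey(List.empty[(String, String)])(
    (acc, v) => v :: acc,
    (a, b) => a ::: b
  )
```

Note that the order of elements within each list is not guaranteed, since it depends on partitioning. If order matters, sort the list in a final mapValues step.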


Thank you,
Daniel

On Wed, Mar 30, 2016 at 9:01 AM, Akhil Das <akhil@sigmoidanalytics.com>
wrote:

> Isn't it what tempRDD.groupByKey does?
>
> Thanks
> Best Regards
>
> On Wed, Mar 30, 2016 at 7:36 AM, Suniti Singh <suniti.singh@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I have an RDD having the data in  the following form :
>>
>> tempRDD: RDD[(String, (String, String))]
>>
>> (brand , (product, key))
>>
>> ("amazon",("book1","tech"))
>>
>> ("eBay",("book1","tech"))
>>
>> ("barns&noble",("book","tech"))
>>
>> ("amazon",("book2","tech"))
>>
>>
>> I would like to group the data by brand and get the result set in the
>> following format:
>>
>> resultSetRDD: RDD[(String, List[(String, String)])]
>>
>> I tried using aggregateByKey but can't quite see how to achieve this.
>> Or is there another way to achieve it?
>>
>> val resultSetRDD = tempRDD.aggregateByKey("")(
>>   { case (aggr, value) => aggr + String.valueOf(value) + "," },
>>   (aggr1, aggr2) => aggr1 + aggr2)
>>
>> resultSetRDD = (amazon,("book1","tech"),("book2","tech"))
>>
>> Thanks,
>>
>> Suniti
>>
>
>
