spark-user mailing list archives

From Amit <kumarami...@gmail.com>
Subject Re: RDD with a Map
Date Wed, 04 Jun 2014 15:10:43 GMT
Yes, an RDD used as a map, with String keys and Lists of Strings as values.

Amit

On Jun 4, 2014, at 2:46, Oleg Proudnikov <oleg.proudnikov@gmail.com> wrote:

> Just a thought... Are you trying to use the RDD as a Map?
> 
> 
> 
> On 3 June 2014 23:14, Doris Xin <doris.s.xin@gmail.com> wrote:
> Hey Amit,
> 
> You might want to check out PairRDDFunctions. For your use case in particular, you can
> load the file as an RDD[(String, String)] and then use the groupByKey() function in
> PairRDDFunctions to get an RDD[(String, Iterable[String])].
> 
> Doris
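
The groupByKey() suggestion above might look roughly like the sketch below. It is
only an illustration under some assumptions: the sample lines are parallelized
in a local SparkContext instead of being read from HDFS, and the names
GroupByKeySketch and parseLine are made up for this example. On Spark releases
before 1.3 the pair-RDD methods also need `import org.apache.spark.SparkContext._`
in scope.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupByKeySketch {
  // Hypothetical helper: split "1, one" into ("1", "one").
  // Throws a MatchError on lines that contain no comma.
  def parseLine(line: String): (String, String) = {
    val Array(key, value) = line.split(",", 2).map(_.trim)
    (key, value)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("multimap-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for sc.textFile(...) on the file from the question.
      val lines: RDD[String] =
        sc.parallelize(Seq("1, one", "1, uno", "2, two", "2, dos"))

      val pairs: RDD[(String, String)] = lines.map(parseLine)

      // groupByKey comes from PairRDDFunctions.
      val grouped: RDD[(String, Iterable[String])] = pairs.groupByKey()

      // collectAsMap brings the result to the driver as a Map; this only
      // makes sense when the grouped data fits in driver memory.
      val multimap: scala.collection.Map[String, List[String]] =
        grouped.mapValues(_.toList).collectAsMap()

      println(multimap)
    } finally {
      sc.stop()
    }
  }
}
```

The order of values within each key is not guaranteed after groupByKey, so the
printed lists may come out in a different order than in the file.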
> 
> 
> On Tue, Jun 3, 2014 at 2:56 PM, Amit Kumar <kumaramit01@gmail.com> wrote:
> Hi Folks,
> 
> I am new to Spark, and this is probably a basic question.
> 
> I have a file on HDFS:
> 
> 1, one
> 1, uno
> 2, two
> 2, dos
> 
> I want to create a multimap RDD, RDD[Map[String,List[String]]]:
> 
> {"1"->["one","uno"], "2"->["two","dos"]}
> 
> 
> First I read the file 
> val identityData:RDD[String] = sc.textFile($path_to_the_file, 2).cache()
> 
> val identityDataList:RDD[List[String]]=
>       identityData.map{ line =>
>         // trim each field so that " one" becomes "one"
>         val splits= line.split(",").map(_.trim)
>         splits.toList
>     }
> 
> Then I group them by the first element
> 
>  val grouped:RDD[(String,Iterable[List[String]])]=
>     identityDataList.groupBy{
>       element =>{
>         element(0)
>       }
>     }
> 
> Then I do the equivalent of mapValues from the Scala collections to get rid of the first element
> 
>  val groupedWithValues:RDD[(String,List[String])] =
>     grouped.flatMap[(String,List[String])]{ case (key,list)=>{
>       List((key,list.map{element => {
>         element(1)
>       }}.toList))
>     }
>     }
> 
> For this to actually materialize, I call collect:
> 
>  val groupedAndCollected=groupedWithValues.collect()
> 
> I get an Array[(String,List[String])].
> 
> I am trying to figure out if there is a way for me to get a Map[String,List[String]] (a
> multimap), or to create an RDD[Map[String,List[String]]].
> 
> 
> I am sure there is something simpler, I would appreciate advice.
> 
> Many thanks,
> Amit
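
On the last point in the question, once the pairs have been collected to the
driver, plain Scala's toMap is one way to get a Map[String,List[String]]. The
sketch below uses no Spark at all: the array is a hand-written stand-in for the
result of collect() on the sample file, and the object name CollectedToMap is
invented for this example.

```scala
object CollectedToMap {
  // Stand-in for groupedWithValues.collect() on the sample file.
  val groupedAndCollected: Array[(String, List[String])] =
    Array("1" -> List("one", "uno"), "2" -> List("two", "dos"))

  // toMap turns a collection of (key, value) pairs into an immutable Map.
  val multimap: Map[String, List[String]] = groupedAndCollected.toMap

  def main(args: Array[String]): Unit = {
    println(multimap("1"))                 // List(one, uno)
    println(multimap.getOrElse("3", Nil))  // List()
  }
}
```

On the RDD itself, PairRDDFunctions also offers collectAsMap(), which returns
the pairs as a Map directly. Either way the whole result is held in driver
memory.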
> 
> -- 
> Kind regards,
> 
> Oleg
> 
