spark-user mailing list archives

From: Amit Kumar <>
Subject: RDD with a Map
Date: Tue, 03 Jun 2014 21:56:29 GMT
Hi Folks,

I am new to Spark, and this is probably a basic question.

I have a file on HDFS:

1, one
1, uno
2, two
2, dos

I want to create a multimap RDD, RDD[Map[String,List[String]]], whose contents would be:

{"1"->["one","uno"], "2"->["two","dos"]}

First I read the file:

val identityData: RDD[String] = sc.textFile($path_to_the_file, 2).cache()

val identityDataList: RDD[List[String]] = identityData.map { line =>
  val splits = line.split(",")
  splits.map(_.trim).toList
}

Then I group them by the first element:

val grouped: RDD[(String, Iterable[List[String]])] =
  identityDataList.groupBy { element => element.head }

Then I do the equivalent of mapValues from the Scala collections to get rid of
the first element:

val groupedWithValues: RDD[(String, List[String])] =
  grouped.flatMap[(String, List[String])] { case (key, list) =>
    List((key, list.map { element => element.tail.head }.toList))
  }
For this to actually materialize, I call collect:

 val groupedAndCollected = groupedWithValues.collect()

I get an Array[(String, List[String])].
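(If it matters, turning that collected array into a Map on the driver seems to be just a .toMap; sketched here with hard-coded sample values standing in for the collected result:)

```scala
// Plain Scala on the driver side: an Array of pairs converts directly
// to a Map with .toMap. The sample values below are hypothetical,
// mirroring what collect() would return for my input file.
val groupedAndCollected: Array[(String, List[String])] =
  Array(("1", List("one", "uno")), ("2", List("two", "dos")))

val multiMap: Map[String, List[String]] = groupedAndCollected.toMap
```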

I am trying to figure out whether there is a way for me to get a
Map[String,List[String]] (a multimap), or to create an
RDD[Map[String,List[String]]].
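The closest simplification I can see, assuming the standard pair-RDD operations (groupByKey, mapValues, collectAsMap) behave the way I think they do, is something like the following; I have not verified that this is idiomatic:

```scala
// Hypothetical shorter pipeline, reusing identityData from above:
// parse each line to a (key, value) pair, group by key, then collect
// the result on the driver as a Map.
val pairs: RDD[(String, String)] = identityData.map { line =>
  val Array(key, value) = line.split(",").map(_.trim)
  (key, value)
}

val multiMap: Map[String, List[String]] =
  pairs.groupByKey()        // RDD[(String, Iterable[String])]
    .mapValues(_.toList)    // RDD[(String, List[String])]
    .collectAsMap()         // driver-side scala.collection.Map
    .toMap                  // immutable Map[String, List[String]]
```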

I am sure there is something simpler; I would appreciate any advice.

Many thanks,
