spark-user mailing list archives

From Sanjay Subramanian <sanjaysubraman...@yahoo.com.INVALID>
Subject Re: Extracting values from a Collection
Date Sat, 22 Nov 2014 23:17:16 GMT
I could not iterate through the Set, but I changed the code to get what I was looking for
(not elegant, but it gets me going):
package org.medicalsidefx.common.utils

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

/**
 * Created by sansub01 on 11/19/14.
 */
object TwoWayJoin2 {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: TwoWayJoin2 <file1> <file2>")
      System.exit(12)
    }

    val sconf = new SparkConf().setMaster("local").setAppName("MedicalSideFx-TwoWayJoin")

    val sc = new SparkContext(sconf)

    val file1 = args(0)
    val file2 = args(1)

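    // Parse each "id,value" line into an (id, value) pair.
    val file1Rdd = sc.textFile(file1).map { line => val f = line.split(","); (f(0), f(1)) }
    // Concatenate all values that share an id, separated by "|".
    val file2Rdd = sc.textFile(file2)
      .map { line => val f = line.split(","); (f(0), f(1)) }
      .reduceByKey((v1, v2) => v1 + "|" + v2)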

    file1Rdd.collect().foreach(println)
    file2Rdd.collect().foreach(println)

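    // Print each joined record as "id,name,songs"; destructuring avoids stripping
    // parentheses that may be part of the data itself.
    file1Rdd.join(file2Rdd).collect().foreach { case (id, (name, songs)) => println(s"$id,$name,$songs") }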

  }
}
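
For readers without a Spark setup, the reduce-then-join step above can be approximated
on plain Scala collections. This is only a sketch: the object name and the helpers
concatByKey and joinLines are hypothetical, and a local Seq keeps the input order,
whereas Spark's reduceByKey does not promise the order in which values are combined.

```scala
object PipeJoinSketch {
  // Hypothetical helper: local stand-in for reduceByKey((v1, v2) => v1 + "|" + v2).
  def concatByKey(pairs: Seq[(String, String)]): Map[String, String] =
    pairs.groupBy(_._1).map { case (id, kvs) => id -> kvs.map(_._2).mkString("|") }

  // Hypothetical helper: inner join of names with the concatenated songs,
  // rendered as one "id,name,songs" line per matching id.
  def joinLines(names: Seq[(String, String)], songs: Seq[(String, String)]): Seq[String] = {
    val songsById = concatByKey(songs)
    names.flatMap { case (id, name) =>
      songsById.get(id).map(titles => s"$id,$name,$titles").toList
    }
  }

  def main(args: Array[String]): Unit = {
    // Sample data taken from the names.txt and songs.txt listings in this thread.
    val names = Seq("1" -> "paul", "2" -> "john", "3" -> "george", "4" -> "ringo")
    val songs = Seq(
      "1" -> "Yesterday", "2" -> "Julia",
      "3" -> "While My Guitar Gently Weeps", "4" -> "With a Little Help From My Friends",
      "1" -> "Michelle", "2" -> "Nowhere Man",
      "3" -> "Norwegian Wood", "4" -> "Octopus's Garden")
    joinLines(names, songs).foreach(println) // first line: 1,paul,Yesterday|Michelle
  }
}
```

Building the output line from the destructured tuple leaves any parentheses inside a
song title untouched, which a blanket replace("(", "") would not.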

From: Jey Kottalam <jey@cs.berkeley.edu>
To: Sanjay Subramanian <sanjaysubramanian@yahoo.com>
Cc: Arun Ahuja <aahuja11@gmail.com>; Andrew Ash <andrew@andrewash.com>; user <user@spark.apache.org>
Sent: Friday, November 21, 2014 10:07 PM
Subject: Extracting values from a Collection
   
Hi Sanjay,

These are instances of the standard Scala collection type "Set", and its documentation can
be found by googling the phrase "scala set".

Hope that helps,
-Jey
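
As a small illustration of the point above, the values in a Scala Set can be read with
the usual collection methods (foreach, toList, mkString). This is a minimal sketch in
plain Scala; the object name and the render helper are hypothetical and not part of the
thread's code. In the Spark snippet from the question, the same mkString call could
presumably go inside mapValues in place of names.toSet.

```scala
object SetValuesSketch {
  // Hypothetical helper: render a Set's elements as "(a, b)" rather than "Set(a, b)".
  // A Set has no guaranteed iteration order, so the element order may vary.
  def render(values: Set[String]): String = values.mkString("(", ", ", ")")

  def main(args: Array[String]): Unit = {
    val songs = Set("Yesterday", "Michelle")

    // Visit each element in turn.
    songs.foreach(println)

    // Copy the elements into a List when positional access is needed.
    val asList: List[String] = songs.toList
    println(asList.head)

    // Print the elements without the "Set" prefix.
    println(render(songs))
  }
}
```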



On Fri, Nov 21, 2014 at 10:41 AM, Sanjay Subramanian <sanjaysubramanian@yahoo.com.invalid>
wrote:
> hey guys
>
> names.txt
> =========
> 1,paul
> 2,john
> 3,george
> 4,ringo
>
>
> songs.txt
> =========
> 1,Yesterday
> 2,Julia
> 3,While My Guitar Gently Weeps
> 4,With a Little Help From My Friends
> 1,Michelle
> 2,Nowhere Man
> 3,Norwegian Wood
> 4,Octopus's Garden
>
> What I want to do is real simple
>
> Desired Output
> ==============
> (4,(With a Little Help From My Friends, Octopus's Garden))
> (2,(Julia, Nowhere Man))
> (3,(While My Guitar Gently Weeps, Norwegian Wood))
> (1,(Yesterday, Michelle))
>
>
> My Code
> =======
> val file1Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/names.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2Rdd =
> sc.textFile("/Users/sansub01/mycode/data/songs/songs.txt").map(x =>
> (x.split(",")(0), x.split(",")(1)))
> val file2RddGrp = file2Rdd.groupByKey()
> file2RddGrp.mapValues(names => names.toSet).collect().foreach(println)
>
> Result
> =======
> (4,Set(With a Little Help From My Friends, Octopus's Garden))
> (2,Set(Julia, Nowhere Man))
> (3,Set(While My Guitar Gently Weeps, Norwegian Wood))
> (1,Set(Yesterday, Michelle))
>
>
> How can I extract the values from the Set?
>
> Thanks
>
> sanjay
>