spark-user mailing list archives

From Stephen Boesch <java...@gmail.com>
Subject Re: distributing Scala Map datatypes to RDD
Date Mon, 13 Oct 2014 21:58:58 GMT
Is the following what you are looking for?


scala> sc.parallelize(myMap.map { case (k, v) => (k, v) }.toSeq)
res2: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at parallelize at <console>:21
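
(The inner .map is just the identity here, so myMap.toSeq alone would do.) As a minimal sketch of the round trip, assuming the spark-shell's built-in sc and two hypothetical example maps, the resulting RDD of pairs picks up the key/value operations you asked about, such as join, through the implicit conversion to PairRDDFunctions:

import org.apache.spark.SparkContext._  // PairRDDFunctions implicits (already in scope in the shell)

val myMap    = Map("a" -> 1, "b" -> 2)    // hypothetical local data
val otherMap = Map("a" -> 10, "c" -> 30)  // hypothetical local data

val rdd1 = sc.parallelize(myMap.toSeq)    // RDD[(String, Int)]
val rdd2 = sc.parallelize(otherMap.toSeq)

// join matches on keys, yielding RDD[(String, (Int, Int))]
rdd1.join(rdd2).collect().foreach(println)  // prints (a,(1,10))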



2014-10-13 14:02 GMT-07:00 jon.g.massey <jon.g.massey@gmail.com>:

> Hi guys,
> Just starting out with Spark and following through a few tutorials, it
> seems the easiest way to get one's source data into an RDD is using the
> sc.parallelize function. Unfortunately, my local data is in multiple
> instances of Map<K,V> types, and the parallelize function only works on
> objects with the Seq trait, and produces an RDD which seemingly doesn't
> then have the notion of Keys and Values, which I require for joins
> (amongst other functions).
>
> Is there a way of using a SparkContext to create a distributed RDD from a
> local Map, rather than from a Hadoop or text file source?
>
> Thanks,
> Jon
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/distributing-Scala-Map-datatypes-to-RDD-tp16320.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
