spark-user mailing list archives

From Manoj Samel <manojsamelt...@gmail.com>
Subject Re: How to create RDD over hashmap?
Date Fri, 24 Jan 2014 21:11:37 GMT
Yes, that works.

But then the hashmap's fast key lookup is gone, and any search becomes a
linear scan with an iterator. Not sure if Spark internally adds any
optimizations for a Seq, but otherwise one has to assume this becomes a
List/Array without the fast key lookup of a hashmap or b-tree.

Any thoughts?
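
For instance (an untested sketch; "someKey" and the partition count are just
placeholders), would hash-partitioning the pair RDD so lookup() only touches
one partition, or broadcasting the map itself, get the fast key access back?

  // assumes cr: HashMap[String, Double] and an active SparkContext sc
  import org.apache.spark.SparkContext._  // implicit PairRDDFunctions
  import org.apache.spark.HashPartitioner

  val cr_rdd = sc.parallelize(cr.toSeq)

  // With a known partitioner, lookup() scans only the single partition
  // that owns the key instead of the whole RDD.
  val byKey = cr_rdd.partitionBy(new HashPartitioner(8)).cache()
  val hits: Seq[Double] = byKey.lookup("someKey")

  // Or, if the map fits in memory on each worker, broadcast it and keep
  // true O(1) hash lookups inside closures:
  val crBc = sc.broadcast(cr)
  // ... inside a transformation: crBc.value.get("someKey")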

On Fri, Jan 24, 2014 at 1:00 PM, Frank Austin Nothaft <fnothaft@berkeley.edu> wrote:

> Manoj,
>
> I assume you’re trying to create an RDD[(String, Double)]? Couldn’t you
> just do:
>
> val cr_rdd = sc.parallelize(cr.toSeq)
>
> The toSeq would convert the HashMap[String,Double] into a Seq[(String,
> Double)] before calling the parallelize function.
>
> Regards,
>
> Frank Austin Nothaft
> fnothaft@berkeley.edu
> fnothaft@eecs.berkeley.edu
> 202-340-0466
>
> On Jan 24, 2014, at 12:56 PM, Manoj Samel <manojsameltech@gmail.com> wrote:
>
> > Is there a way to create an RDD over a hashmap?
> >
> > If I have a hash map and try sc.parallelize, it gives
> >
> > <console>:17: error: type mismatch;
> >  found   : scala.collection.mutable.HashMap[String,Double]
> >  required: Seq[?]
> > Error occurred in an application involving default arguments.
> >        val cr_rdd = sc.parallelize(cr)
> >                                    ^
>
>
