spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: PairRDD's lookup method Performance
Date Fri, 19 Sep 2014 10:07:14 GMT
The result of each mapPartitions call can be an Iterable containing one big Map.
You still need to write some custom code, similar to what lookup() does, to
exploit this data structure.
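
A minimal sketch of that idea in plain Java, without the Spark API, so every
name here (PartitionMapLookup, buildPartitionMap, partitionFor, index) is
hypothetical. buildPartitionMap stands in for the function you would pass to
mapPartitions: it returns a collection holding a single Map rather than the
Map itself. The lookup step assumes the data is hash-partitioned by key and
that keys are unique; with duplicate keys the last value wins here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class PartitionMapLookup {

    // Stand-in for the per-partition function: consume the partition's
    // key/value pairs and return a collection holding ONE map.
    static <K, V> List<Map<K, V>> buildPartitionMap(Iterator<Map.Entry<K, V>> partition) {
        Map<K, V> map = new HashMap<>();
        while (partition.hasNext()) {
            Map.Entry<K, V> e = partition.next();
            map.put(e.getKey(), e.getValue()); // assumption: keys are unique
        }
        List<Map<K, V>> out = new ArrayList<>();
        out.add(map);
        return out;
    }

    // Assumed hash partitioning: non-negative hash modulo partition count.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Simulates a partitioned data set: split pairs by key hash, then build
    // one map per partition.
    static <K, V> List<Map<K, V>> index(List<Map.Entry<K, V>> pairs, int numPartitions) {
        List<List<Map.Entry<K, V>>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> p : pairs) {
            parts.get(partitionFor(p.getKey(), numPartitions)).add(p);
        }
        List<Map<K, V>> maps = new ArrayList<>();
        for (List<Map.Entry<K, V>> part : parts) {
            maps.addAll(buildPartitionMap(part.iterator()));
        }
        return maps;
    }

    // The "extra custom code": pick the one partition that can hold the key,
    // then do a constant-time get on that partition's map.
    static <K, V> V lookup(List<Map<K, V>> partitionMaps, K key) {
        int p = partitionFor(key, partitionMaps.size());
        return partitionMaps.get(p).get(key); // null if the key is absent
    }
}
```

Without a known partitioning you would have to probe every partition's map,
which is still one hash lookup per partition instead of a scan of every
element.
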
On Sep 18, 2014 11:07 PM, "Harsha HN" <99harsha.h.n99@gmail.com> wrote:

> Hi All,
>
> My question is about improving the performance of PairRDD's lookup
> method. I went through the link below, where Tathagata Das explains
> creating a hash map over each partition using the mapPartitions method
> to get O(1) search performance.
>
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-over-hashmap-td893.html
>
> How can this be done in Java? HashMap is not a supported return type for
> any overload of the mapPartitions method.
>
> Thanks and Regards,
> Harsha
>
