spark-user mailing list archives

From Holden Karau
Subject Re: Efficient look up in Key Pair RDD
Date Mon, 09 Jan 2017 01:17:26 GMT
To start with, caching and having a known partitioner will help a bit, and
there is also the IndexedRDD project. In general, though, Spark might not be
the best tool for the job. Have you considered having Spark output to
something like memcached?
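
As a rough illustration of why a known partitioner helps: when keys are placed by a hash partitioner, a lookup only has to scan the one partition the key maps to instead of all of them. The sketch below is plain Python, not Spark code, and every name in it (`partition_of`, `build_partitions`, the partition count of 8) is made up for the example.

```python
NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_of(key, num_partitions=NUM_PARTITIONS):
    # Stand-in for a hash partitioner: maps a key to a partition index.
    return hash(key) % num_partitions


def build_partitions(pairs, num_partitions=NUM_PARTITIONS):
    # Place each (key, value) pair in the partition its key hashes to.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def lookup_unpartitioned(partitions, key):
    # Without a known partitioner, every partition has to be scanned.
    values = []
    for part in partitions:
        values.extend(v for k, v in part if k == key)
    return values, len(partitions)  # (matches, partitions scanned)


def lookup_partitioned(partitions, key):
    # With a known partitioner, only one partition is scanned.
    part = partitions[partition_of(key, len(partitions))]
    return [v for k, v in part if k == key], 1


pairs = [(i, i * i) for i in range(1000)]
parts = build_partitions(pairs)
print(lookup_unpartitioned(parts, 42))
print(lookup_partitioned(parts, 42))
```

Even in the partitioned case the chosen partition is still scanned linearly, which is part of why a dedicated key-value store may be a better fit for point lookups.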

What's the goal you are trying to accomplish?

On Sun, Jan 8, 2017 at 5:04 PM Anil Langote wrote:

> Hi All,
> I have a requirement to build a distributed HashMap which holds 10M
> key-value pairs and provides very efficient lookups for each key.
> I tried loading the file into a JavaPairRDD and calling the lookup method,
> but it's very slow.
> How can I achieve much faster lookups by a given key?
> Thank you
> Anil Langote
