spark-user mailing list archives

From Sean Owen <>
Subject Re: Why is RDD lookup slow?
Date Thu, 19 Feb 2015 15:33:52 GMT
RDDs are not Maps. lookup() does a linear scan -- parallel by
partition, but still linear. Yes, an RDD is not supposed to be an O(1)
lookup data structure. It would be much better to broadcast the relatively
small data set as a Map and look keys up quickly, locally.
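
A minimal sketch of that suggestion, assuming the data fits comfortably in driver and executor memory (3-30 MB should). The object name, the sample pairs, the query keys, and the local[*] master are placeholders for illustration, not anything from the original job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLookupSketch {
  // Returns (result of RDD.lookup, results of broadcast-Map lookups).
  def demo(sc: SparkContext): (Seq[String], Seq[(Int, Option[String])]) = {
    // Hypothetical stand-in for the small cached RDD[(Int, String)].
    val pairs = sc.parallelize(Seq(1 -> "one", 2 -> "two", 3 -> "three")).cache()

    // RDD.lookup launches a Spark job on every call and scans for the key.
    val viaRdd: Seq[String] = pairs.lookup(2)

    // Collect once to the driver as an immutable Map, then broadcast it
    // so each executor holds its own local copy.
    val table: Map[Int, String] = pairs.collectAsMap().toMap
    val tableBc = sc.broadcast(table)

    // Inside a task, each key is resolved by a local hash lookup
    // instead of a separate job per key.
    val queries = sc.parallelize(Seq(3, 1, 4))
    val resolved = queries.map(k => k -> tableBc.value.get(k)).collect().toSeq

    (viaRdd, resolved)
  }

  def main(args: Array[String]): Unit = {
    // Placeholder configuration: local mode, arbitrary app name.
    val conf = new SparkConf().setAppName("broadcast-lookup-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (viaRdd, resolved) = demo(sc)
      println(s"lookup: $viaRdd; broadcast map: ${resolved.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

If the lookups only ever happen on the driver, the broadcast is unnecessary: the collected Map can be queried directly. Note that collectAsMap keeps a single value per key, whereas lookup() returns every value stored under the key.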

On Thu, Feb 19, 2015 at 3:29 PM, shahab <> wrote:
> Hi,
> I am doing lookups on cached RDDs of type RDD[(Int, String)], and I noticed
> that each lookup is relatively slow, 30-100 ms. I even tried this on one
> machine with a single partition, but it made no difference.
> The RDDs are not large at all, 3-30 MB.
> Is this the expected behaviour? Should I use another data structure, such as
> a HashMap, to hold the data and look it up there, and use a Broadcast to send
> a copy to all machines?
> best,
> /Shahab
