spark-user mailing list archives

From Richard Startin <>
Subject Re: ToLocalIterator vs collect
Date Thu, 05 Jan 2017 10:51:48 GMT
Why not do that with Spark SQL to utilise the executors properly, rather than running a sequential
filter on the driver?

Select * from A left join B on = where is NULL limit k

If you were sorting just so you could iterate in order, this might save you a couple of sorts.
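
The join column in the query above is not shown, so here is a minimal sketch of the same left join / IS NULL pattern, assuming a hypothetical key column named id. It uses Python's built-in sqlite3 module purely so that it runs standalone; it is not Spark code. In Spark, a query of the same shape could be passed to spark.sql once A and B are registered as temp views. The ORDER BY is also an assumption, added only so that "top k" is deterministic here.

```python
import sqlite3


def top_k_missing(a_keys, b_keys, k):
    """Return up to k keys that are in A but not in B, in ascending order.

    The table layout and the column name `id` are hypothetical; the
    original query does not say which column is joined on.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE A (id INTEGER)")
        conn.execute("CREATE TABLE B (id INTEGER)")
        conn.executemany("INSERT INTO A VALUES (?)", [(x,) for x in a_keys])
        conn.executemany("INSERT INTO B VALUES (?)", [(x,) for x in b_keys])
        # Rows of A with no match in B come back from the left join with
        # B.id set to NULL, so the WHERE clause keeps exactly those rows.
        rows = conn.execute(
            "SELECT A.id FROM A LEFT JOIN B ON A.id = B.id "
            "WHERE B.id IS NULL ORDER BY A.id LIMIT ?",
            (k,),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(top_k_missing([1, 2, 3, 4, 5, 6], [2, 4, 5], 2))  # prints [1, 3]
```

The point of the pattern is that the engine does the matching and only the k surviving rows come back to the caller, so nothing the size of A or B has to be held on the driver.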

> On 5 Jan 2017, at 10:40, Rohit Verma <> wrote:
> Hi all,
> I am aware that collect will return a list aggregated on the driver, and that this will cause an OOM
> when the list is too big.
> Is toLocalIterator safe to use with a very big list? I want to access all values one by one.
> Basically the goal is to compare two sorted RDDs (A and B) to find the top k entries missing
> from B but present in A.
> Rohit
> ---------------------------------------------------------------------
> To unsubscribe e-mail:
