spark-user mailing list archives

From Reza Zadeh <r...@databricks.com>
Subject Re: Column Similarities using DIMSUM fails with GC overhead limit exceeded
Date Mon, 02 Mar 2015 04:00:19 GMT
Hi Sab,
In this dense case, the output will contain 10000 x 10000 entries, i.e. 100
million doubles (about 800 MB of raw data at 8 bytes each), which doesn't fit
in 1GB once overheads are included.
For a dense matrix, columnSimilarities() scales quadratically in the number of
columns, so you need more memory across the cluster.
Reza
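
The arithmetic behind the estimate above can be checked with a rough sketch.
It assumes 8 bytes per double and counts only the raw payload of a fully dense
n x n result; JVM object headers, index storage and shuffle buffers are not
modeled, so real usage would be higher. The function name is hypothetical and
is not part of Spark.

```python
def dense_similarity_bytes(num_cols: int, bytes_per_entry: int = 8) -> int:
    """Raw payload in bytes of a fully dense num_cols x num_cols matrix.

    Assumption: each entry is one 8-byte double and nothing else is stored.
    """
    return num_cols * num_cols * bytes_per_entry


if __name__ == "__main__":
    # Column counts chosen for illustration; 10000 is the case from the thread.
    for n in (1_000, 10_000, 20_000):
        mb = dense_similarity_bytes(n) / 1e6
        print(f"{n} columns: {n * n} entries, {mb:.0f} MB raw")
```

Doubling the number of columns quadruples the payload, which is the quadratic
scaling described in the message: the number of rows (30 here) does not enter
the output size at all.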


On Sun, Mar 1, 2015 at 7:06 PM, Sabarish Sasidharan <
sabarish.sasidharan@manthan.com> wrote:

> Sorry, I actually meant a 30 x 10000 matrix (missed a 0)
>
>
> Regards
> Sab
>
>
