spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: how to speed the count operation
Date Thu, 15 May 2014 23:55:59 GMT
count() is an action, so it triggers materialization of the RDD: Spark
computes all the records and, if the RDD is cached, stores them in memory.
That is why it is slow. -Xiangrui
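
To make the point about materialization concrete, here is a minimal sketch in
plain Python. It is only an analogy, not the Spark API: LazyDataset,
load_records and the evaluations counter are hypothetical names invented for
this illustration. It shows that transformations only record work, that the
first count() pays for the whole pipeline, and that a cached dataset answers
later counts without recomputing.

```python
class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, source, transforms=()):
        self._source = source              # callable that yields raw records
        self._transforms = tuple(transforms)
        self._cache_enabled = False
        self._cached = None
        self.evaluations = 0               # how many times the pipeline ran

    def map(self, fn):
        # Lazy: only remember the function, do not touch any record yet.
        return LazyDataset(self._source, self._transforms + (fn,))

    def cache(self):
        # Mark the dataset so the first action keeps its records in memory.
        self._cache_enabled = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.evaluations += 1
        records = []
        for rec in self._source():
            for fn in self._transforms:
                rec = fn(rec)
            records.append(rec)
        if self._cache_enabled:
            self._cached = records
        return records

    def count(self):
        # Action: forces the whole pipeline to run (or reads the cache).
        return len(self._materialize())


def load_records():
    # Assumed stand-in for reading the input data.
    return range(1000)


cached = LazyDataset(load_records).map(lambda x: x * 2).cache()
uncached = LazyDataset(load_records).map(lambda x: x * 2)

first = cached.count()     # runs the pipeline and fills the cache
second = cached.count()    # served from the cached records
uncached.count()           # runs the pipeline
uncached.count()           # runs the pipeline again
```

In this toy model the cached dataset runs its pipeline once and the uncached
one runs it on every count(). The same reasoning suggests that in Spark the
first count() on a cached RDD carries the cost of loading and transforming
the data, and later passes over it should be cheaper.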


On Tue, May 13, 2014 at 10:25 PM, lihu <lihu723@gmail.com> wrote:

> Hi,
>     I used Spark's MLlib to run some experiments, such as lasso and
> linear regression. I just use the models provided in MLlib:
> LassoWithSGD and LinearRegressionWithSGD. But I found that the count
> operation is very slow; as shown below, it seems to get stuck.
>
>
>
>
>    My Spark version is 0.9, each node has 80 GB of memory, the dataset is
> about 8 GB, and I run on 30 nodes.
>
>    Any suggestions would be appreciated.
>
>
>
>
