spark-user mailing list archives

From lihu <lihu...@gmail.com>
Subject Re: how to speed the count operation
Date Sat, 17 May 2014 05:02:11 GMT
Thanks very much. But I found that it seems to get stuck, not just run
slowly. When this happened, the size of the serialized data was more than
10MB. This is similar to another post of mine: when the size exceeded
10MB, I increased spark.akka.frameSize, and it no longer got stuck at this
step, but it then lost an executor with the log: ERROR TaskSchedulerImpl:
Lost executor 20 on Husky002: remote Akka client disassociated, etc.


I do not know whether this is caused by an incorrect configuration,
whether it is a bug in Spark, or whether I have missed something.
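For reference, the frame-size setting described above can be raised when
building the context. This is only a sketch in the Spark 0.9 style (SparkConf
was introduced in 0.9); the app name and the exact values are illustrative
assumptions, not taken from the thread:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// spark.akka.frameSize is in MB; the 0.9 default is 10, which matches
// the ~10MB threshold where the job got stuck. Raise it above the
// largest serialized task result you expect.
val conf = new SparkConf()
  .setAppName("lasso-experiment")        // hypothetical app name
  .set("spark.akka.frameSize", "64")     // default: 10 (MB)
  .set("spark.executor.memory", "70g")   // leave headroom on the 80GB nodes

val sc = new SparkContext(conf)
```

Note that "remote Akka client disassociated" often indicates the executor
JVM died (for example from memory pressure) rather than a frame-size
problem, so checking the executor logs on Husky002 may help.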



On Fri, May 16, 2014 at 7:55 AM, Xiangrui Meng <mengxr@gmail.com> wrote:

> count() triggers materialization. It computes the records and stores data
> in memory. That is why it is slow. -Xiangrui
>
>
> On Tue, May 13, 2014 at 10:25 PM, lihu <lihu723@gmail.com> wrote:
>
>> Hi,
>>     I used MLlib in Spark to run some experiments, such as lasso
>> and linear regression. I just used the models provided in MLlib:
>> LassoWithSGD and LinearRegressionWithSGD. But I found that the count
>> operation is very slow; it seems to get stuck.
>>
>>    My Spark version is 0.9, each node has 80GB of memory, the
>> dataset is about 8GB, and I run on 30 nodes.
>>
>>    Any suggestions are appreciated.
>>
>
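The materialization Xiangrui describes is one-time work if the RDD is cached
before the first action; otherwise every SGD iteration may recompute the
input. A minimal Scala sketch, assuming a running SparkContext `sc` and the
Spark 0.9 MLlib API (the path and the CSV parsing are illustrative):

```scala
import org.apache.spark.mllib.regression.{LabeledPoint, LassoWithSGD}

// Parse "label,feature1,feature2,..." lines into LabeledPoints
// (in 0.9, LabeledPoint takes an Array[Double] of features).
val training = sc.textFile("hdfs:///data/lasso_input")  // hypothetical path
  .map { line =>
    val parts = line.split(',').map(_.toDouble)
    LabeledPoint(parts.head, parts.tail)
  }
  .cache()                    // keep the parsed records in memory

val n = training.count()      // first action: materializes and caches the RDD
val model = LassoWithSGD.train(training, 100)  // iterations reuse the cache
```

With this pattern the first count() pays the full parsing cost, and
subsequent actions read from the in-memory copy.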
