mahout-user mailing list archives

From gaurav redkar <gauravred...@gmail.com>
Subject Re: Cluster dumper crashes when run on a large dataset
Date Fri, 04 Nov 2011 05:49:31 GMT
Hi,

Yes Paritosh, I think the same. I am actually using a test dataset of 5,000
tuples with 1,000 dimensions each. The thing is, a lot of files get created in
the pointsDir folder, and I think the program opens all of those files at once
(i.e. reads every point into memory). Is my interpretation correct? Also, how
do I go about fixing it?
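
One way around holding every point in memory is to stream the clusteredPoints
part files one at a time instead of buffering them all into a map. Below is a
minimal sketch (not Mahout's own code) that only tallies how many points land
in each cluster, assuming the Mahout 0.5/0.6 layout where each record in
pointsDir is an IntWritable cluster id paired with a WeightedVectorWritable;
the class name StreamingPointCounter is made up for illustration.

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.WeightedVectorWritable;

public class StreamingPointCounter {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path pointsDir = new Path(args[0]);   // e.g. the clusteredPoints directory
    FileSystem fs = pointsDir.getFileSystem(conf);

    // Tally points per cluster without keeping the vectors themselves,
    // reading one part file at a time so the heap stays small.
    Map<Integer, Long> counts = new HashMap<Integer, Long>();
    for (FileStatus status : fs.listStatus(pointsDir)) {
      if (!status.getPath().getName().startsWith("part-")) {
        continue;   // skip _SUCCESS, _logs, etc.
      }
      SequenceFile.Reader reader =
          new SequenceFile.Reader(fs, status.getPath(), conf);
      try {
        IntWritable clusterId = new IntWritable();
        WeightedVectorWritable point = new WeightedVectorWritable();
        while (reader.next(clusterId, point)) {
          Long current = counts.get(clusterId.get());
          counts.put(clusterId.get(), current == null ? 1L : current + 1L);
        }
      } finally {
        reader.close();
      }
    }
    System.out.println("points per cluster: " + counts);
  }
}

If the vectors themselves are needed (e.g. to print them), the same loop could
write each cluster's points to its own output file as it goes, instead of
accumulating the whole clusterId -> point-list map before dumping anything.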

Thanks



On Fri, Nov 4, 2011 at 11:03 AM, Paritosh Ranjan <pranjan@xebia.com> wrote:

> Reading the points keeps everything in memory, which might be what crashed it:
> pointList.add(record.getSecond());
>
> Your dataset size is 40 MB, but the vectors might be too large. How many
> dimensions does your Vector have?
>
>
> On 04-11-2011 10:57, gaurav redkar wrote:
>
>> Hello,
>>
>> I am stuck with the ClusterDumper utility. The clusterdump utility crashes
>> while trying to output the clusters, failing with an out-of-memory
>> exception: Java heap space.
>>
>> When I checked the stack trace, it seems the program crashed in the
>> readPoints() function; I guess it is unable to build the "result" map. Any
>> idea how I can fix this?
>>
>> I am working on a dataset of about 40 MB. I have tried increasing the heap
>> space, but with no luck.
>>
>> Thanks
>>
>> Gaurav
>>
>>
>>
>
>
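
As a rough back-of-the-envelope check on the numbers in this thread (5,000
points of 1,000 dimensions each): the raw double values alone roughly match
the ~40 MB dataset size, and the in-memory copies carry extra object overhead
on top of that, so the heap can be exceeded depending on how much the JVM was
given. The sketch below is illustrative arithmetic only; the overhead factor
is an assumption, not a measurement.

public class HeapEstimate {
  public static void main(String[] args) {
    long points = 5000;
    long dims = 1000;
    long rawBytes = points * dims * 8;        // dense 8-byte doubles: ~40 MB
    // Rough multiplier for DenseVector/WeightedVectorWritable wrappers,
    // List entries, and the clusterId -> point-list map itself (assumed).
    double overheadFactor = 3.0;
    System.out.printf("raw values:      %.1f MB%n", rawBytes / 1e6);
    System.out.printf("rough heap need: %.1f MB%n",
        rawBytes * overheadFactor / 1e6);
  }
}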
