mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Cluster dumper crashes when run on a large dataset
Date Fri, 04 Nov 2011 05:57:07 GMT
Reducing the number of dimensions (drastically; try fewer than 100 if
your use case allows it) could be a solution.
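
To put rough numbers on this, here is a minimal sketch, assuming dense
vectors of 8-byte doubles and the 5000 points with 1000 dimensions quoted
below. The class and method names are made up for illustration, and the
estimate ignores JVM object headers and collection overhead, so real heap
usage will be higher.

```java
// Back-of-envelope estimate only: counts 8 bytes per double value and
// ignores JVM object headers, collection overhead, and any Mahout wrappers.
// DenseMemoryEstimate is a hypothetical helper, not part of Mahout.
final class DenseMemoryEstimate {

    /** Raw bytes needed to hold `points` dense vectors of `dimensions` doubles each. */
    static long rawDenseBytes(long points, long dimensions) {
        return points * dimensions * Double.BYTES;
    }

    public static void main(String[] args) {
        long points = 5_000; // figure quoted in the thread
        System.out.println("1000 dims: " + rawDenseBytes(points, 1_000) + " bytes");
        System.out.println(" 100 dims: " + rawDenseBytes(points, 100) + " bytes");
    }
}
```

Under those assumptions the raw values alone drop from about 40 MB to
about 4 MB when going from 1000 to 100 dimensions.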

Which vector implementation are you using? If the vectors are sparsely
populated (lots of uninitialized/unused dimensions), you can use
RandomAccessSparseVector or SequentialAccessSparseVector, which store
only the dimensions you actually use. This can also decrease memory
consumption.
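
The idea behind sparse storage can be sketched in plain Java. This is
only an illustration of the principle, not Mahout's implementation: the
SparseVectorSketch class below is hypothetical and simply keeps non-zero
entries in a HashMap, treating every other dimension as 0.0.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; NOT part of Mahout and not how
// RandomAccessSparseVector or SequentialAccessSparseVector are written.
final class SparseVectorSketch {
    private final int cardinality;                                // logical number of dimensions
    private final Map<Integer, Double> values = new HashMap<>();  // non-zero entries only

    SparseVectorSketch(int cardinality) {
        this.cardinality = cardinality;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= cardinality) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    /** Stores a value; writing 0.0 removes the entry so it costs no memory. */
    void set(int index, double value) {
        checkIndex(index);
        if (value == 0.0) {
            values.remove(index);
        } else {
            values.put(index, value);
        }
    }

    /** Returns the stored value, or 0.0 for a dimension that was never set. */
    double get(int index) {
        checkIndex(index);
        return values.getOrDefault(index, 0.0);
    }

    /** Number of entries actually held in memory. */
    int storedEntries() {
        return values.size();
    }

    public static void main(String[] args) {
        SparseVectorSketch v = new SparseVectorSketch(1000);
        v.set(3, 1.5);
        v.set(10, 2.0);
        v.set(999, 0.25);
        System.out.println("logical dimensions: 1000, stored entries: " + v.storedEntries());
    }
}
```

Note that a map of boxed values has noticeable per-entry overhead, so
this kind of layout only pays off when most dimensions really are unused.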

On 04-11-2011 11:19, gaurav redkar wrote:
> Hi,
>
> Yes Paritosh, I think the same. Actually, I am using a test data set
> that has 5000 tuples with 1000 dimensions each. The thing is, there are too
> many files created in the pointsDir folder, and I think the program tries to
> open a path to all the files (i.e. read all the files into memory at once).
> Is my interpretation correct? Also, how do I go about fixing it?
>
> Thanks
>
>
>
> On Fri, Nov 4, 2011 at 11:03 AM, Paritosh Ranjan<pranjan@xebia.com>  wrote:
>
>> Reading the points keeps everything in memory, which might have crashed it:
>> pointList.add(record.getSecond());
>>
>> Your dataset size is 40 MB, but the vectors might be too large. How many
>> dimensions do your vectors have?
>>
>>
>> On 04-11-2011 10:57, gaurav redkar wrote:
>>
>>> Hello,
>>>
>>> I am in a fix with the Clusterdumper utility. The clusterdump utility
>>> crashes when it tries to output the clusters, throwing an out-of-memory
>>> exception: Java heap space.
>>>
>>> When I checked the stack trace, it seems that the program crashed in the
>>> readPoints() function. I guess it is unable to build the "result" map. Any
>>> idea how I can fix this?
>>>
>>> I am working on a dataset of size 40 MB. I tried increasing the heap
>>> space, but with no luck.
>>>
>>> Thanks
>>>
>>> Gaurav
>>>
>>>
>>>
>>> -----
>>> No virus found in this message.
>>> Checked by AVG - www.avg.com
>>> Version: 10.0.1411 / Virus Database: 2092/3994 - Release Date: 11/03/11
>>>
>>
>

