mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Transposing a matrix is limited by how large a node is.
Date Fri, 06 May 2011 13:54:16 GMT
On Fri, May 6, 2011 at 6:01 AM, Vincent Xue <xue.vin@gmail.com> wrote:

> Dear Mahout Users,
>
> I am using Mahout-0.5-SNAPSHOT to transpose a dense matrix of 55000 x
> 31000.
> My matrix is stored on HDFS as a
> SequenceFile<IntWritable,VectorWritable>, consuming about 13 GB. When I
> run the transpose function on my matrix, the job falls over during the
> reduce phase. With closer inspection, I noticed that I was receiving the
> following error:
>
> FSError: java.io.IOException: No space left on device
>
> I thought this was not possible, considering that I was only using 15% of
> the 2.5 TB in the cluster, but when I monitored the disk space closely, it
> was true that the 40 GB hard drive on the node was running out of space.
> Unfortunately, all of my nodes are limited to 40 GB and I have not been
> successful in transposing my matrix.
>

Running HDFS on nodes with only 40GB of hard disk each is a recipe
for disaster, IMO.  Map/reduce jobs create lots of temporary files on
local disk, and with a 13GB input file you're bound to run into this.

Can you show us what your job tracker reports for
HDFS_BYTES_WRITTEN (and other similar counters) during your job?
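[One mitigation worth trying, not mentioned above: compressing intermediate
map output cuts down the temporary spill files that fill local disk. A
hedged sketch for mapred-site.xml, assuming a Hadoop 0.20-era cluster;
property names differ in later Hadoop versions:]

```
<!-- Compress intermediate map output to reduce local-disk spill. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<!-- Point local temp space at the largest disk(s) available. -->
<property>
  <name>mapred.local.dir</name>
  <value>/path/to/large/disk/mapred/local</value>
</property>
```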


> From this observation, I would like to know if there is any alternative
> method to transpose my matrix or if there is something I am missing?


Do you have a server with 26GB of RAM lying around somewhere?
You could do it on one machine without hitting disk. :)
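[The single-machine approach Jake hints at can be sketched in plain Java.
This is a toy stand-in, not Mahout's API: in practice you would fill the
input array by streaming the SequenceFile<IntWritable,VectorWritable> off
HDFS row by row, and write transposed rows out as you go rather than
holding both copies in RAM.]

```java
public class InMemoryTranspose {

    // Transpose a dense row-major matrix entirely in memory,
    // avoiding map/reduce spill files altogether.
    static double[][] transpose(double[][] m) {
        int rows = m.length, cols = m[0].length;
        double[][] t = new double[cols][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                t[j][i] = m[i][j];   // element (i,j) becomes (j,i)
            }
        }
        return t;
    }

    public static void main(String[] args) {
        double[][] m = { {1, 2, 3}, {4, 5, 6} };  // 2 x 3 toy matrix
        double[][] t = transpose(m);              // 3 x 2 result
        System.out.println(t.length + "x" + t[0].length);
    }
}
```

[For scale: a 55000 x 31000 matrix of doubles is roughly 13.6 GB, so one
copy fits comfortably in 26 GB of RAM if the transposed rows are streamed
back out instead of materialized as a second full copy.]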

  -jake
