mahout-user mailing list archives

From Yahia Zakaria <yahiawestl...@gmail.com>
Subject Re: Mahout SSVD is too slow for highly dimensional data
Date Mon, 10 Jun 2013 12:47:38 GMT
Yes, I have tuned the number of reducers; the best setting for my cluster
is 56 reducers.
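For context, the Hadoop MapReduce tutorial suggests setting the number of reduces to 0.95 or 1.75 times (number of nodes x maximum reduce slots per node): 0.95 lets every reducer start in one wave, while 1.75 lets faster nodes run a second wave for better load balancing. The sketch below only does that arithmetic. It is generic Hadoop guidance, not anything SSVD-specific, and the 4 reduce slots per node is a hypothetical value because the thread does not say how the workers were configured.

```python
import math


def suggested_reducers(nodes: int, reduce_slots_per_node: int) -> dict:
    """Reducer counts from the Hadoop tutorial heuristic (0.95x or 1.75x capacity).

    single_wave: all reducers launch at once and fetch map output as maps finish.
    two_waves:   faster nodes run a second round, which balances load better.
    """
    capacity = nodes * reduce_slots_per_node
    return {
        "single_wave": math.floor(0.95 * capacity),
        "two_waves": math.floor(1.75 * capacity),
    }


# 16 machines come from the thread; 4 reduce slots per node is HYPOTHETICAL.
print(suggested_reducers(16, 4))
```

The result is only a starting point for the kind of empirical tuning described above; the per-node slot count in a real cluster comes from its own configuration.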


On Mon, Jun 10, 2013 at 3:39 PM, Sebastian Schelter <ssc@apache.org> wrote:

> Did you tune the number of reducers? I successfully applied SSVD to a
> dataset with 3B nonzeros on 6 machines in a few hours.
> On 10.06.2013 at 14:32, "Yahia Zakaria" <yahiawestlife@gmail.com> wrote:
>
> > Hi All
> >
> > I am running Mahout SSVD (trunk version) with the pca option on the Bag of
> > Words dataset (http://archive.ics.uci.edu/ml/datasets/Bag+of+Words). This
> > dataset has 8,000,000 instances (rows) and 100,000 attributes (columns).
> > Mahout SSVD is very slow: it may take days to finish the first phase of
> > SSVD (the Q-Job). I am running the code on a cluster of 16 machines, each
> > with 8 cores and 32 GB of memory. Moreover, the CPU and memory of the
> > workers are not utilized at all. When I run Mahout SSVD on a smaller
> > dataset (12,500 rows and 5,000 columns), it is fast: the job finishes in
> > 2 minutes. Do you have any idea why Mahout SSVD is so slow for
> > high-dimensional data? And up to what size (in terms of the number of
> > rows and columns of the input matrix) can SSVD work efficiently?
> >
> > Thanks
> > Yehia
> >
>
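To put the two datasets in the question side by side, here is some rough size arithmetic. In the randomized SVD scheme that SSVD is built on, the first pass multiplies the input A (m x n) by a random n x (k + p) matrix Omega to form Y = A * Omega, and Y is dense no matter how sparse A is. The thread does not state the rank k or the oversampling p, so K = 100 and P = 15 below are hypothetical values; the dense size of A is only an upper bound, since the real input is sparse.

```python
def dense_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to store a rows x cols matrix densely as 64-bit doubles."""
    return rows * cols * bytes_per_value


ROWS, COLS = 8_000_000, 100_000   # dimensions reported in the thread
SMALL_ROWS, SMALL_COLS = 12_500, 5_000  # the fast test dataset from the thread
K, P = 100, 15                    # HYPOTHETICAL rank and oversampling

# Upper bound only: A stored densely (the actual Bag of Words input is sparse).
full_a = dense_bytes(ROWS, COLS)
# Y = A * Omega has ROWS x (K + P) entries and is dense even if A is sparse.
y = dense_bytes(ROWS, K + P)
# How many more matrix cells the large dataset has than the small one.
cell_ratio = (ROWS * COLS) // (SMALL_ROWS * SMALL_COLS)

GB = 10**9
print(f"dense A upper bound: {full_a / GB:,.0f} GB")
print(f"Y = A * Omega:       {y / GB:,.1f} GB")
print(f"cell ratio large/small: {cell_ratio:,}")
```

With these assumptions the large input has 12,800 times as many cells as the 2-minute test matrix, and Y alone is several GB of dense data, so the run time of the first phase is unlikely to scale like the small test.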
