mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Clustering performance
Date Thu, 02 Dec 2010 15:48:46 GMT
How many map tasks does Hadoop schedule? If the number is small, you need to
decrease the split size and make sure that your input file is splittable.
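
To make the split-size point concrete, here is a rough sketch in Python of how
the number of map tasks follows from the file size and the split size. It
mirrors the rule Hadoop's FileInputFormat applies as far as I understand it
(split size = max(min size, min(max size, block size)), with the last split
allowed to run about 10% over). The function names, the 64 MB block size, and
the target of 8 maps are assumptions for illustration and do not come from
this thread. The name of the setting that caps the split size depends on the
Hadoop version, so it is left out here.

```python
import math

# Assumption: the final split may be up to 10% larger than the split size,
# so a small remainder is folded into the last split instead of forming its own.
SPLIT_SLOP = 1.1


def compute_split_size(block_size, min_size, max_size):
    """Split size rule: clamp the block size between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits (and so map tasks) for one splittable file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the last split
    return splits


def max_split_size_for(file_size, target_maps):
    """Hypothetical helper: split-size cap that yields about target_maps splits."""
    return math.ceil(file_size / target_maps)


if __name__ == "__main__":
    file_size = int(3.6 * 1024 * 1024)   # the 3.6 MB input from the question
    block_size = 64 * 1024 * 1024        # assumed block size
    no_cap = 2**63 - 1                   # effectively "no maximum split size"

    default_split = compute_split_size(block_size, 1, no_cap)
    print("maps with default split size:", count_splits(file_size, default_split))

    cap = max_split_size_for(file_size, 8)
    capped_split = compute_split_size(block_size, 1, cap)
    print("split size cap in bytes:", cap)
    print("maps with capped split size:", count_splits(file_size, capped_split))
```

With a block size far larger than the file, the whole input lands in a single
split, so only one map task runs no matter how many nodes are available. The
cap only helps if the input format is splittable in the first place.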

2010/12/2 Jure Jeseničnik <Jure.Jesenicnik@planet9.si>

> I have already explained my mission here:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201011.mbox/%3C0EDE11E319B0B043B4F24E0305CABF7C80413134A4@P9MAIL.p9.internal%3E
>
> By trial and error I’ve managed to find the most appropriate input
> parameters for canopy: T1=1.4 and T2=1.2. This gives me somewhere around
> 7000 clusters from 7800 input documents, which is exactly the result I’ve
> been looking for. I’m trying to group news items from different sources
> that talk about the same story.
>
> What bothers me now is the performance. To process a 3.6 MB file on my
> pretty decent 4-core desktop machine, Mahout needs a good 14 minutes. I
> know I’m dealing with a pretty large number of clusters, but still, 14
> minutes is a huge amount of time. If I use a smaller amount of data (1700
> docs), it is all over in less than a minute.
>
> When running locally, Mahout was only using one CPU core. I’m running it
> on Windows 7 through Cygwin, but it behaved pretty much the same on some
> proper Linux machines. How can I make it use all the available CPU power?
>
> I also tried running this on a Hadoop cluster, but there seemed to be no
> significant improvement in time. It seemed like Hadoop was unable to
> properly distribute such a small task.
>
> Is it possible that I missed something here? What can I do to have this
> clustering finish in a more reasonable time?
>
> Thank you for your answers.
>
> Jure
>
