mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Cluster text docs
Date Fri, 18 Dec 2009 18:06:42 GMT
I don't know of any benchmarks other than what David H. has run.  It would be good to get some
set up (as with all the Mahout algorithms, actually).


On Dec 18, 2009, at 10:03 AM, Levy, Mark wrote:

> Hi Drew,
> 
> Below is a mail I sent to this list a while back.  Is this consistent with your experience?
> 
> Cheers,
> 
> Mark
> 
> 
> On Sep 23, 2009, at 6:05 AM, Levy, Mark wrote:
> 
>> I've started to experiment with LDA and am finding that it creates only
>> a single long-running map task for each iteration, which doesn't scale
>> well.  The map is taking 20mins for 10k of my input SparseVectors, and 5
>> hours for 100k (the vocabulary size also grows when there are more
>> vectors).
>> 
>> Is this expected or am I doing something wrong?  Are there any existing
>> performance benchmarks?
>> 
> 
> 
>> -----Original Message-----
>> From: Drew Farris [mailto:drew.farris@gmail.com]
>> Sent: 18 December 2009 13:59
>> To: mahout-user@lucene.apache.org
>> Subject: Re: Cluster text docs
>> 
>> Hi Shashi,
>> 
>> On Fri, Dec 18, 2009 at 1:36 AM, Shashikant Kore <shashikant@gmail.com>
>> wrote:
>> 
>>> (.. cluster assignment is already there. Wonder why you had to redo
>>> it.)
>> 
>> Ahh, yes. I didn't have to re-do it, but I did want to learn the
>> internal structure of the data files and to point out that it was easy
>> enough to achieve. The code is quite straightforward.
>> 
>>> Drew, are you using the latest code? Overnight sounds too long.
>> 
>> That's good to know. This was a month or two ago, before the
>> matrix/math stuff was rolled in. I'll collect exact times on the next
>> run I do.
>> 
>> Has anyone else run LDA outside of the canned Reuters example? I would
>> be interested to hear about corpus characteristics and processing
>> power required to successfully produce LDA clusters. I've had all
>> sorts of issues, though mostly Hadoop configuration nits related to
>> my environment.
>> 
>> Drew

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

