mahout-user mailing list archives

From Paul Jones <>
Subject Re: newbie question: LSA anaylsis + others
Date Thu, 18 Jun 2009 15:18:24 GMT
Okay, I have brain freeze from reading the email below :-)

I think PLSI will do (or is a great starter) for what I want. I am looking at a Hadoop install
with Mahout on top; is there any need for Lucene?

Also, is there a "dummies" guide to all these algos, i.e. which are clustering algos, which
are indexing, which are for "abc"? I am reading a ton of information and am not 100%
sure which categories they all fit into... I hope the question is not too vague.


From: Ted Dunning <>
Sent: Wednesday, 17 June, 2009 7:36:48
Subject: Re: newbie question: LSA anaylsis + others

Indeed there is.  And Prasenjit is being properly modest by not pointing out
that this was due to his efforts.

This is a great example of how terse a language like Pig can make many
problems that involve a bunch of counting.  Most EM-like algorithms fit into
this category including k-means, HMM fitting, Dirichlet Process mixture
modeling and lots of others.
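
To illustrate the "bunch of counting" point, here is a minimal sketch of one k-means iteration written as assign, then group, sum and count. It is plain Python rather than Pig, it is not taken from the patch discussed in this thread, and the function name kmeans_step is hypothetical.

```python
from collections import defaultdict


def kmeans_step(points, centroids):
    """One k-means iteration expressed as assign, group, sum and count.

    Illustrative sketch only: `kmeans_step` is a hypothetical name and
    this is not the Pig code from the Mahout patch.
    """
    dims = len(centroids[0])
    sums = defaultdict(lambda: [0.0] * dims)  # per-cluster coordinate sums
    counts = defaultdict(int)                 # per-cluster point counts

    for p in points:
        # Assignment step: index of the nearest centroid (squared distance).
        k = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        # The "counting" part: accumulate a sum and a count per cluster.
        counts[k] += 1
        for d, x in enumerate(p):
            sums[k][d] += x

    # Update step: new centroid = sum / count; empty clusters stay put.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centroids[k])
        for k in range(len(centroids))
    ]


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans_step(pts, [(0, 0), (10, 10)]))
```

The per-cluster sums and counts are the only state carried between the two steps, which is why the same computation maps naturally onto a group-by followed by an aggregate.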

The problem in my mind is that it is difficult to tie all of the little
scripts together coherently.  Prasenjit did this using Python, but there is
still no cohesive whole to the resulting program even if the result is much
smaller and probably easier to understand than a large Java program.

On Tue, Jun 16, 2009 at 11:07 PM, prasenjit mukherjee

> Well, there is a PLSI implementation using Pig (over Hadoop) as a Mahout
> patch:
