Yes, it would be possible to build a nonparametric implementation.
It is probably simpler to overprovision the number of topics and apply some
forceful regularization. This is much like what we do with our Dirichlet
Process clustering.
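
As a rough illustration of the overprovision-and-prune idea, here is a minimal
sketch in plain Python. It is not Mahout code. It assumes you already have the
number of tokens assigned to each topic from an overprovisioned run; the
function names, the counts, and the alpha and min_weight values are all made up
for the example. It only shows the pruning step under a small symmetric
Dirichlet prior, not a regularized training procedure.

```python
from typing import List


def topic_weights(token_counts: List[float], alpha: float) -> List[float]:
    """Posterior mean topic weights under a symmetric Dirichlet(alpha) prior."""
    k = len(token_counts)
    total = sum(token_counts) + k * alpha
    return [(n + alpha) / total for n in token_counts]


def meaningful_topics(token_counts: List[float], alpha: float = 0.01,
                      min_weight: float = 0.01) -> List[int]:
    """Indices of topics whose smoothed weight reaches min_weight."""
    weights = topic_weights(token_counts, alpha)
    return [i for i, w in enumerate(weights) if w >= min_weight]


# Hypothetical token counts for 8 overprovisioned topics (illustrative only).
counts = [5200.0, 3100.0, 12.0, 0.0, 1650.0, 3.0, 0.0, 35.0]

# A small alpha adds almost no mass to empty topics, so they fall below the cut.
kept = meaningful_topics(counts)
print(kept)  # [0, 1, 4]
```

With a small alpha, topics that attracted few or no tokens get a weight near
zero and drop out, so the effective number of topics can end up well below the
number you asked for.
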
On Fri, Mar 25, 2011 at 8:14 AM, Vasil Vasilev <vavasilev@gmail.com> wrote:
> Hi David,
>
> Doesn't the fact that the alpha parameters are not learned make the
> algorithm very dependent on the initial number of topics provided by the
> user (k <numTopics>)? That is, it can learn how the words are distributed
> across topics, but it cannot learn the correct number of topics. Maybe an
> approach similar to the one implemented in the Dirichlet algorithm could be
> used, which starts with a prior alpha and then refines the number of
> "meaningful" topics depending on how many words each topic has collected
> (i.e. the fewer words a topic has attracted, the less probable that topic
> becomes as a whole).
>
> Regards, Vasil
>
> On Thu, Mar 10, 2011 at 8:40 PM, David Hall <dlwh@cs.berkeley.edu> wrote:
>
> > err, Jae, sorry.
> >
> >  David
> >
> > On Thu, Mar 10, 2011 at 10:33 AM, David Hall <dlwh@cs.berkeley.edu> wrote:
> > > Hi Bae,
> > >
> > > We only try to obtain MLEs of p(word|topic) (beta), and we treat
> > > alpha and eta as fixed. As you say, those could be learned, and it
> > > might improve performance, but it's just not implemented.
> > >
> > > There's no particular reason they're not implemented, but they're not
> > > critical to getting basic LDA working, especially MAP estimation of
> > > \beta.
> > >
> > >  David
> > >
> > > On Wed, Mar 9, 2011 at 10:28 PM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
> > >> Hi
> > >>
> > > >> I am studying the LDA algorithm for my statistics project. The goal is to
> > > >> fully understand the LDA algorithm and the statistical concepts behind it,
> > > >> and to analyze an implementation. I've chosen the Mahout LDA
> > > >> implementation because it's scalable and well-documented.
> > >>
> > > >> According to the original paper by Blei, Ng, and Jordan, the
> > > >> parameters (alpha, beta) are estimated with a variational EM method,
> > > >> but I can't find any numerical method that optimizes those parameters.
> > > >> In the Mahout implementation, alpha is a topic smoothing value input by
> > > >> the user, and beta is just P(word|topic), not estimated.
> > >>
> > > >> I think this implementation makes a basic assumption. I would like to
> > > >> know whether there was a specific reason to implement it this way,
> > > >> without parameter estimation.
> > >>
> > >> Thank you
> > >>
> > >> Best, Jay
> > >>
> > >
> >
>
