Yes. It would be possible to build a non-parametric implementation.
It is probably simpler to over-provision the number of topics and apply
some forceful regularization. This is much like what we do with our
Dirichlet Process clustering.
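
To make the idea concrete, here is a rough sketch in plain Python. It is not Mahout code; the function names, the alpha value, and the min_weight cutoff are all made up for illustration. The assumption is that you have already run LDA with a deliberately large number of topics and have the number of word tokens each topic attracted. A small symmetric Dirichlet prior then pushes the weight of unused topics toward zero, and you keep only the topics above the cutoff.

```python
def topic_proportions(word_counts, alpha):
    """Posterior-mean topic weights under a symmetric Dirichlet(alpha) prior.

    word_counts[k] is the number of word tokens assigned to topic k.
    """
    k = len(word_counts)
    total = sum(word_counts) + k * alpha
    return [(n + alpha) / total for n in word_counts]


def meaningful_topics(word_counts, alpha=0.01, min_weight=0.01):
    """Indices of topics whose smoothed weight reaches min_weight.

    alpha and min_weight are illustrative defaults, not tuned values.
    """
    weights = topic_proportions(word_counts, alpha)
    return [i for i, w in enumerate(weights) if w >= min_weight]


# Eight topics were provisioned, but only three attracted many words.
counts = [5200, 0, 3100, 4, 1700, 0, 0, 1]
print(meaningful_topics(counts))
```

With a small alpha, the nearly empty topics end up with negligible weight, so the effective number of topics is picked out by the data rather than by the value passed for the number of topics.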
On Fri, Mar 25, 2011 at 8:14 AM, Vasil Vasilev wrote:
> Hi David,
>
> Doesn't the fact that the alpha parameters are not learned make the
> algorithm very dependent on the initial number of topics provided by
> the user (-k )? That is, it could learn how the words are distributed
> by topics, but it cannot learn the correct number of topics. Maybe an
> approach similar to the one implemented in the Dirichlet algorithm could
> be used, which has an initial prior alpha and then refines the number of
> "meaningful" topics depending on how many words each topic has collected
> (i.e. the fewer words a topic has attracted, the less probable the topic
> becomes as a whole).
>
> Regards, Vasil
>
> On Thu, Mar 10, 2011 at 8:40 PM, David Hall wrote:
>
> > err, Jae, sorry.
> >
> > -- David
> >
> > On Thu, Mar 10, 2011 at 10:33 AM, David Hall wrote:
> > > Hi Bae,
> > >
> > > We only try to obtain MLEs of p(word|topic) (beta), and we treat
> > > alpha and eta as fixed. As you say, those could be learned, and it
> > > might improve performance, but it's just not implemented.
> > >
> > > There's no particular reason they're not implemented, but they're not
> > > critical to getting basic LDA working, especially MAP estimation of
> > > \beta.
> > >
> > > -- David
> > >
> > > On Wed, Mar 9, 2011 at 10:28 PM, Bae, Jae Hyeon wrote:
> > >> Hi
> > >>
> > >> I am studying the LDA algorithm for my statistics project. The goal is
> > >> to fully understand the LDA algorithm and the statistical concepts
> > >> behind it, and to analyze an implementation. I've chosen the Mahout LDA
> > >> implementation because it's scalable and well-documented.
> > >>
> > >> According to the original paper by Blei, Ng, and Jordan, the
> > >> parameters (alpha, beta) would be estimated with a variational EM
> > >> method, but I can't find any numerical methods to optimize those
> > >> parameters. In the Mahout implementation, alpha is the topic smoothing
> > >> value input by the user, and beta is just P(word|topic), not estimated.
> > >>
> > >> I think that this implementation makes a basic assumption. I want to
> > >> know whether there was a specific reason to implement it like this,
> > >> without parameter estimation.
> > >>
> > >> Thank you
> > >>
> > >> Best, Jay
> > >>
> > >
> >
>