spark-dev mailing list archives

From "Yang, Yuhao"
Subject RE: MLlib: Anybody working on hierarchical topic models like HLDA?
Date Thu, 04 Jun 2015 03:13:18 GMT
Hi Lorenz,

  I’m trying to build a prototype of HDP for a customer based on the current LDA implementations.
An initial version will probably be ready within the next one or two weeks. I’ll share it
and hopefully we can join forces.

  One concern: I’m not sure how widely it would be used in industry or in the community.
I hope it will be popular enough to be accepted into Spark MLlib.


From: Joseph Bradley
Sent: Thursday, June 4, 2015 7:17 AM
To: Lorenz Fischer
Subject: Re: MLlib: Anybody working on hierarchical topic models like HLDA?

Hi Lorenz,

I'm not aware of people working on hierarchical topic models for MLlib, but that would be
cool to see.  Hopefully other devs know more!

Glad that the current LDA is helpful!


On Wed, Jun 3, 2015 at 6:43 AM, Lorenz Fischer wrote:
Hi All

I'm working on a project in which I use the current LDA implementation that has been contributed
by Databricks' Joseph Bradley et al. for the recent 1.3.0 release (thanks guys!). While this
is great, my project requires several levels of topics, as I would like to let users
drill down into subtopics.

As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) would offer such a hierarchy.
Looking at the papers and talks by Blei [1,2] and Jordan [3], I think I should be able to
implement HLDA in Spark using the Nested Chinese Restaurant Process (NCRP). However, as I
have some time constraints, I'm not sure if I will have the time to do it 'the proper way'.
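
For readers unfamiliar with the NCRP, the following is a minimal single-machine sketch in plain Python (not Spark, and not code from this thread) of how the prior assigns each document a root-to-leaf path in a tree of fixed depth. It only draws from the prior; the Gibbs sampling that HLDA needs to fit topics is not shown. The function name `sample_ncrp_paths` and the parameters `depth`, `gamma`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a nested CRP prior.

    Illustrative assumptions: `depth` is the fixed number of levels below
    the root, and `gamma` > 0 controls how often a new branch is opened.
    """
    rng = random.Random(seed)
    # For each node (identified by the tuple of child indices leading to it),
    # keep a list with the number of documents that chose each child.
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()  # the root is the empty path
        for _ in range(depth):
            counts = child_counts[node]
            total = sum(counts)
            # Existing child k is chosen with probability counts[k] / (total + gamma),
            # a new child with probability gamma / (total + gamma).
            r = rng.uniform(0, total + gamma)
            choice = len(counts)  # default: open a new child
            acc = 0.0
            for k, c in enumerate(counts):
                acc += c
                if r < acc:
                    choice = k
                    break
            if choice == len(counts):
                counts.append(0)
            counts[choice] += 1
            node = node + (choice,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=5, depth=3, gamma=1.0):
        print(path)
```

Each tuple is one path through the tree, so its first element would correspond to a top-level topic and later elements to successively narrower subtopics, which is the kind of drill-down described above.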

In any case, I wanted to quickly ask around if anybody is already working on this or on some
other form of a hierarchical topic model. Maybe I could contribute to these efforts instead
of starting from scratch.


