spark-dev mailing list archives

From Lorenz Fischer <lorenz.fisc...@gmail.com>
Subject MLlib: Anybody working on hierarchical topic models like HLDA?
Date Wed, 03 Jun 2015 13:43:13 GMT
Hi All

I'm working on a project in which I use the current LDA implementation that
has been contributed by Databricks' Joseph Bradley et al. for the recent
1.3.0 release (thanks guys!). While this is great, my project requires
several levels of topics, as I would like to let users drill down into
subtopics.

As I understand it, Hierarchical Latent Dirichlet Allocation (HLDA) would
offer such a hierarchy. Looking at the papers and talks by Blei [1,2] and
Jordan [3], I think I should be able to implement HLDA in Spark using the
Nested Chinese Restaurant Process (NCRP). However, as I am short on time,
I'm not sure I will be able to do it 'the proper way'.
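
To make the NCRP idea concrete, here is a minimal sketch of how the prior assigns each document a root-to-leaf path in a topic tree. It is plain Python rather than Spark code, it covers only the prior over paths (no word likelihoods and no Gibbs sampling), and the names `Node`, `sample_path`, `build_tree` and the concentration parameter `gamma` are hypothetical, chosen for illustration only.

```python
import random


class Node:
    """One 'restaurant' (topic) in the nested CRP tree."""

    def __init__(self):
        self.count = 0       # documents whose path passes through this node
        self.children = []   # child topics one level deeper


def sample_path(root, depth, gamma, rng):
    """Draw one root-to-leaf path from the nCRP prior and update the counts."""
    root.count += 1
    path = [root]
    node = root
    for _ in range(depth - 1):
        # An existing child is picked with weight equal to its document count;
        # a brand-new child is opened with weight gamma.
        weights = [child.count for child in node.children] + [gamma]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(node.children):
            node.children.append(Node())
        node = node.children[k]
        node.count += 1
        path.append(node)
    return path


def build_tree(num_docs, depth=3, gamma=1.0, seed=0):
    """Assign num_docs documents to paths, one after another (gamma must be > 0)."""
    rng = random.Random(seed)
    root = Node()
    paths = [sample_path(root, depth, gamma, rng) for _ in range(num_docs)]
    return root, paths
```

Calling `build_tree(50)` returns the root of a three-level tree together with one path per document; a larger `gamma` tends to produce more branches at every level.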

In any case, I wanted to ask whether anybody is already working on this or
on some other form of hierarchical topic model. If so, maybe I could
contribute to these efforts instead of starting from scratch.

Best,
Lorenz

[1] http://www.cs.princeton.edu/~blei/papers/BleiGriffithsJordan2009.pdf
[2] http://papers.nips.cc/paper/2466-hierarchical-topic-models-and-the-nested-chinese-restaurant-process.pdf
[3] https://www.youtube.com/watch?v=PxgW3lOrj60
