spark-user mailing list archives

From Cody Buntain <cbunt...@cs.umd.edu>
Subject LDA and evaluating topic number
Date Thu, 28 Sep 2017 17:50:43 GMT
Hi, all!

	Is there an example somewhere of using LDA’s logPerplexity()/logLikelihood() functions
to evaluate topic counts? The existing MLlib LDA examples show how to call them, but I can’t
find any documentation on how to interpret the outputs. When I graph log perplexity and log
likelihood, the results aren’t consistent with what I expected (perplexity increases and
likelihood decreases as the number of topics increases, which seems odd to me).
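
For reference, the kind of sweep described above could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the code from the linked notebook: the names TopicScore, fit_spark_lda, sweep_topic_counts and best_k are hypothetical, the input DataFrames are assumed to have a "features" column of term-count vectors, and maxIter=20 and seed=1 are arbitrary choices. The Spark API documentation describes logLikelihood() as a lower bound on the corpus log likelihood (higher is better) and logPerplexity() as an upper bound on perplexity (lower is better). As far as I can tell, the implementation derives the latter as the negated log likelihood divided by the corpus token count, so the two curves would be expected to move in opposite directions.

```python
from typing import Any, Callable, Iterable, List, NamedTuple


class TopicScore(NamedTuple):
    k: int
    log_likelihood: float  # lower bound on log likelihood; higher is better
    log_perplexity: float  # upper bound on perplexity; lower is better


def fit_spark_lda(k: int, train_df: Any) -> Any:
    """Hypothetical helper: fit a pyspark.ml LDA model with k topics.

    Assumes train_df has a 'features' column of term-count vectors.
    """
    # Imported lazily so the sweep logic itself has no Spark dependency.
    from pyspark.ml.clustering import LDA
    return LDA(k=k, maxIter=20, seed=1).fit(train_df)


def sweep_topic_counts(
    fit: Callable[[int, Any], Any],
    ks: Iterable[int],
    train: Any,
    heldout: Any,
) -> List[TopicScore]:
    """Fit one model per candidate k and score each on the held-out data."""
    scores = []
    for k in ks:
        model = fit(k, train)
        scores.append(
            TopicScore(
                k=k,
                log_likelihood=model.logLikelihood(heldout),
                log_perplexity=model.logPerplexity(heldout),
            )
        )
    return scores


def best_k(scores: List[TopicScore]) -> int:
    """Return the k with the lowest held-out log perplexity."""
    return min(scores, key=lambda s: s.log_perplexity).k
```

With a Spark session and two DataFrames (here called train_df and heldout_df, both assumed names), this would be used as scores = sweep_topic_counts(fit_spark_lda, range(5, 55, 5), train_df, heldout_df), followed by best_k(scores). Scoring on a held-out split instead of the training data is a design choice in this sketch; whether it changes the shape of the curves described above is not something the sketch can settle.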

	An example of what I’m doing is here: http://www.cs.umd.edu/~cbuntain/FindTopicK-pyspark-regex.html

	Thanks very much in advance! If I can figure this out, I can post example code online, so
others can see how this process is done.

-Best regards,
Cody
_________________
Cody Buntain, PhD
Postdoc, @UMD_CS
Intelligence Community Postdoctoral Fellow
cbuntain@cs.umd.edu
http://www.cs.umd.edu/~cbuntain
