spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephen Boesch <java...@gmail.com>
Subject Re: LDA and evaluating topic number
Date Thu, 07 Dec 2017 08:15:59 GMT
I have been testing on the 20 NewsGroups dataset - which the Spark docs
themselves reference.  I can confirm that perplexity increases and
likelihood decreases as topics increase - and am similarly confused by
these results.

2017-09-28 10:50 GMT-07:00 Cody Buntain <cbuntain@cs.umd.edu>:

> Hi, all!
>
> Is there an example somewhere on using LDA’s logPerplexity()/logLikelihood()
> functions to evaluate topic counts? The existing MLLib LDA examples show
> calling them, but I can’t find any documentation about how to interpret the
> outputs. Graphing the outputs for logs of perplexity and likelihood aren’t
> consistent with what I expected (perplexity increases and likelihood
> decreases as topics increase, which seem odd to me).
>
> An example of what I’m doing is here: http://www.cs.umd.edu/~
> cbuntain/FindTopicK-pyspark-regex.html
>
> Thanks very much in advance! If I can figure this out, I can post example
> code online, so others can see how this process is done.
>
> -Best regards,
> Cody
> _________________
> Cody Buntain, PhD
> Postdoc, @UMD_CS
> Intelligence Community Postdoctoral Fellow
> cbuntain@cs.umd.edu
> www.cs.umd.edu/~cbuntain
>
>

Mime
View raw message