Our LDA implementation is not completely integrated with the other clustering applications.
In particular, it does not support the clustering (classification) step which you require.
Most of the other clustering apps can do this and produce output that is similar to the gamma
matrix you noted. It is not in Mahout DRM format (this could be done) but is a sequence
file [key=topicId; value={WeightedVectorWritable}]. In this output, the weight of the WVW
is the pdf() value for the vector belonging to the topic. For maximum-likelihood clustering, this
value is 1, but for Dirichlet and FuzzyK it is a double.
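As a rough illustration of how output shaped like this could be used the way a .gamma matrix is, here is a minimal sketch in plain Java. It is not Mahout code: WeightedDoc and TopicWeights are hypothetical names standing in for the (topicId, weighted vector) records, and it assumes each vector can be traced back to a document id.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one output record: key = topicId,
// value = (weight, vector). Only the vector's document id is kept here.
final class WeightedDoc {
    final int topicId;
    final String docId;
    final double weight;

    WeightedDoc(int topicId, String docId, double weight) {
        this.topicId = topicId;
        this.docId = docId;
        this.weight = weight;
    }
}

final class TopicWeights {
    // Build document -> (topic -> weight), i.e. a sparse document x topic matrix.
    static Map<String, Map<Integer, Double>> byDocument(List<WeightedDoc> records) {
        Map<String, Map<Integer, Double>> matrix = new HashMap<>();
        for (WeightedDoc r : records) {
            matrix.computeIfAbsent(r.docId, k -> new HashMap<>()).put(r.topicId, r.weight);
        }
        return matrix;
    }

    // Main topic of a document: the topic with the largest weight,
    // or -1 if the document does not appear in the matrix.
    static int mainTopic(Map<String, Map<Integer, Double>> matrix, String docId) {
        Map<Integer, Double> row = matrix.get(docId);
        if (row == null) {
            return -1;
        }
        int best = -1;
        double bestWeight = Double.NEGATIVE_INFINITY;
        for (Map.Entry<Integer, Double> e : row.entrySet()) {
            if (e.getValue() > bestWeight) {
                bestWeight = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    // Up to n documents most strongly associated with a topic, by descending weight.
    static List<String> topDocuments(Map<String, Map<Integer, Double>> matrix, int topicId, int n) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, Map<Integer, Double>> e : matrix.entrySet()) {
            Double w = e.getValue().get(topicId);
            if (w != null) {
                scored.add(new AbstractMap.SimpleEntry<>(e.getKey(), w));
            }
        }
        scored.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, scored.size()); i++) {
            out.add(scored.get(i).getKey());
        }
        return out;
    }
}
```

With records collected from the sequence file, byDocument() gives the document-by-topic view, and mainTopic() and topDocuments() cover the two lookups the .gamma description mentions. How the document id is recovered from each vector is left open here.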
If you want to offer a patch for LDA that adds this classification step, it would be most appreciated.
-----Original Message-----
From: Bae, Jae Hyeon [mailto:metacret@gmail.com]
Sent: Sunday, March 06, 2011 10:35 AM
To: user@mahout.apache.org
Subject: Question about LDA output file
Hi
A few days ago, I asked how to recover document IDs from LDA topic
clusters, but nobody answered. :(
While studying LDA, I found that some LDA implementations can output a .gamma
file describing which documents are mainly about specific topics.
I've quoted the explanation of the .gamma file below:
".gamma file: This file includes document × topic matrix. It contains
variational posterior Dirichlets distributions. This file can be used to
find document-based results, such as finding main topics of a document, or
finding the top documents that are most related to a specific topic. In the
experiments of this thesis, .gamma file is used to find out the distribution
of documents to topics."
Unfortunately, the author didn't mention its implementation.
Is there any way to generate output similar to the .gamma file described above?
Actually, to find out the relationship between K-means clustering and LDA, I
applied LDA with the number of topics set to 1 to the clustered document set
generated by K-means clustering, and drew a containment diagram between the
K-means clustering LDA topical word list and the topical word list generated by
LDA applied to the whole corpus. But if we could get a .gamma file, it would make
Mahout LDA much stronger.
Best, Jay
