mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Interpreting the results of LDA CVB
Date Thu, 31 Jan 2013 14:37:02 GMT
Hi Thilina,

  The flag you missed on your vectordump command line is the "--sort"
option, which sorts the entries before taking the top k. Could you try that
and send us what the output looks like? It should be much easier to interpret.
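To make the difference concrete, here is a small shell sketch. It is only an illustration of "sort, then take the top k", not how vectordump is implemented. It treats the ten term:weight pairs from the first row of the dump in the quoted message as if they were a whole topic vector, and compares taking the first k entries in stored order (roughly what an unsorted dump shows, assuming entries come out in index order) with sorting by weight first. The `first_k` and `top_k` helpers are hypothetical names.

```shell
#!/bin/sh
# Toy data: the ten term:weight pairs from the first dumped row, one
# "term weight" pair per line, in the order vectordump printed them.
# Treated here as if it were a complete vector (an assumption for illustration).
vector='"Fluxgate 0.12492744375758073
& 0.03875953927132082
(140.220.1.1) 0.1228639250669511
(Babak 0.15074522974495433
(Bill 0.10512715697420276
(Gerrit 0.10130565323653766
(Michael 0.061169131590630275
(Scott 0.14501579630233746
(Usenet 0.07872957132697946
(continued) 0.07135655272850545'

# First k entries in stored order: no sorting, just truncation.
first_k() {
  printf '%s\n' "$vector" | head -n "$1" | cut -d ' ' -f 1
}

# Top k entries by weight: sort numerically on the weight column,
# largest first (C locale so '.' is the decimal point), then truncate.
top_k() {
  printf '%s\n' "$vector" | LC_ALL=C sort -k2,2nr | head -n "$1" | cut -d ' ' -f 1
}

echo "first 3: $(first_k 3 | tr '\n' ' ')"
echo "top 3:   $(top_k 3 | tr '\n' ' ')"
```

With a real vocabulary of tens of thousands of terms, truncating before sorting mostly returns whatever happens to sit at the lowest indices, which is why the unsorted rows look alike.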


On Mon, Jan 7, 2013 at 7:19 AM, Thilina Gunarathne <csethil@gmail.com> wrote:

> Dear All,
> I'm trying to run Mahout LDA (the CVB version) on a subset of the 20news
> data set, as a sample for a Hadoop publication we are working on. I need
> some help understanding the Mahout output to figure out the topics.
>
> I ran the following commands on the TF vectors generated with the
> seq2sparse command.
> >bin/mahout rowid -i 20news-tf/tf-vectors -o 20news-tf-int
> >bin/mahout cvb -i 20news-tf-int/matrix -o lda-out -k 10 -x 20 -dict 20news-tf/dictionary.file-0 -dt lda-topics -mt lda-topic-model
>
> After that I dumped the results using vectordump as follows.
>
> >bin/mahout vectordump -i lda-topics/part-m-00000 --dictionary 20news-tf/dictionary.file-0 --vectorSize 10 -dt sequencefile
> ......
>
>
> {"Fluxgate:0.12492744375758073,&:0.03875953927132082,(140.220.1.1):0.1228639250669511,(Babak:0.15074522974495433,(Bill:0.10512715697420276,(Gerrit:0.10130565323653766,(Michael:0.061169131590630275,(Scott:0.14501579630233746,(Usenet:0.07872957132697946,(continued):0.07135655272850545}
>
> {"Fluxgate:0.13130952097888746,&:0.05207587369196414,(140.220.1.1):0.12533225607394424,(Babak:0.08607740024552457,(Bill:0.20218284543514245,(Gerrit:0.07318295757631627,(Michael:0.08766888242201039,(Scott:0.08858421220476514,(Usenet:0.09201906604666685,(continued):0.06156698532477829}
> .......
>
> It would be great if someone could help me interpret the above results.
> The probability values seem to be more or less similar in all cases.
> Is that due to the small size of the dataset?
>
> thanks,
> Thilina
>
> --
> https://www.cs.indiana.edu/~tgunarat/
> http://www.linkedin.com/in/thilina
> http://thilina.gunarathne.org
>



-- 

  -jake
