mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: LDA from Lucene Indexes
Date Tue, 03 May 2011 16:22:15 GMT
Hi Chris,

  That's what I thought.  The line below needs to store term vectors when it
adds the field (see this article
<http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/>
for more details):

On Tue, May 3, 2011 at 8:32 AM, Chris McConnell
<c.t.mcconnell.ge@gmail.com> wrote:
>
> if (elementName.equals("doc")) {
>     if (title && content) {
>         doc.add(new Field("title", titleStr, Field.Store.YES, Field.Index.ANALYZED));
>         doc.add(new Field("content", contentStr, Field.Store.YES, Field.Index.ANALYZED));


You want this to be:

new Field("content", contentStr, Field.Store.YES, Field.Index.ANALYZED,
Field.TermVector.YES);

Although technically, we could add the capability to take a Store.YES field,
re-tokenize its stored text, and build vectors from that as well.
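
To make the re-tokenizing idea concrete, here is a rough sketch in plain Java
of what building a term-frequency vector from a stored string amounts to.
This is not Mahout or Lucene code: the class and method names are made up for
illustration, and the tokenizer (lowercase, split on non-alphanumerics) is an
assumption standing in for whatever Analyzer the index was built with.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class TermVectorSketch {
    // Hypothetical helper: counts how often each token occurs in the stored text.
    // Assumption: a naive tokenizer replaces the real Lucene Analyzer here.
    static Map<String, Integer> termFrequencies(String storedText) {
        Map<String, Integer> freqs = new TreeMap<>();
        String lowered = storedText.toLowerCase(Locale.ROOT);
        for (String token : lowered.split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if the text starts with a separator
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // Prints the terms in sorted order with their counts.
        System.out.println(termFrequencies("The cat saw the other cat"));
    }
}
```

Storing term vectors at index time, as above, avoids this second pass, and it
guarantees the counts come from the same analysis chain that produced the index.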

  -jake
