lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: diversity in results
Date Tue, 05 Aug 2008 00:33:15 GMT
Hi Jason,


Yes, term vectors (TV) will store additional data in the index.  Using fields with TV=true
simply lets MLT get to a document's most significant terms more easily.  Yes, in the end
those terms are used to perform a normal query and get the most similar docs.  This is based
on my use of MLT a whiiiiiiile back, but I don't think things have changed that much in the
last few years.
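
As a rough illustration of the "pick the significant terms, then run a normal query" idea,
here is a minimal Python sketch.  It is not the MLT handler's actual code: the function
names, the tf * idf weighting and the doc_freq / num_docs inputs are all assumptions made
for the example.  The per-document term counts it starts from are the kind of data a term
vector keeps.

```python
import math
from collections import Counter


def interesting_terms(doc_tokens, doc_freq, num_docs, max_terms=5):
    """Rank one document's terms by tf * idf and keep the top few.

    doc_tokens: the analyzed tokens of one document (hypothetical input).
    doc_freq:   term -> number of documents containing it (hypothetical input).
    num_docs:   total number of documents in the index.
    """
    # Per-document term counts: the information a term vector provides.
    term_freq = Counter(doc_tokens)

    scored = []
    for term, tf in term_freq.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term unknown to the index; nothing to match against
        idf = math.log(num_docs / df) + 1.0
        scored.append((tf * idf, term))

    # Highest score first; break ties alphabetically so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


def build_query(field, terms):
    """Join the selected terms into a plain OR query on one field."""
    return " OR ".join(f"{field}:{term}" for term in terms)
```

For example, build_query("title", interesting_terms(tokens, doc_freq, num_docs)) yields an
ordinary query string that can be sent to the index like any other search.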

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



----- Original Message ----
> From: Jason Rennie <jrennie@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Monday, August 4, 2008 6:17:28 PM
> Subject: Re: diversity in results
> 
> Does the MLT handler simply select a few high tfidf terms from the doc and
> use them as a query?  Sounds like a useful tool.  Do you know anything about
> relevant performance issues?  I noticed that the Solr MoreLikeThis wiki page
> recommends turning on TermVectors for corresponding fields.  Can Lucene not
> easily return term counts for a document with the standard indexing b/c it's
> term-based (i.e. "inverted")?  Does TermVectors=true cause Solr/Lucene to
> store an additional doc-based index?
> 
> Thanks,
> 
> Jason
> 
> On Mon, Aug 4, 2008 at 5:06 PM, Brian Whitman wrote:
> 
> > not out of the box, but I would use the mlt handler on the first result and
> > remove all the ones that appear in both the MLT and query response.
> >
> > B
> >
> >
> -- 
> Jason Rennie
> Head of Machine Learning Technologies, StyleFeeder
> http://www.stylefeeder.com/
> Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
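
Brian's suggestion quoted above (run MLT on the first result, then drop whatever shows up
in both the MLT response and the query response) can be sketched as follows.  This is only
an illustration: it works on plain lists of document ids, and the function name and inputs
are hypothetical rather than part of any Solr API.

```python
def diversify(query_hits, mlt_hits):
    """Keep the top hit, then drop later hits that MLT says resemble it.

    query_hits: ranked document ids from the original query.
    mlt_hits:   document ids returned by MLT for the top-ranked hit.
    """
    if not query_hits:
        return []

    similar_to_top = set(mlt_hits)
    top = query_hits[0]

    # Preserve the original ranking for everything that survives the filter.
    rest = [doc_id for doc_id in query_hits[1:] if doc_id not in similar_to_top]
    return [top] + rest
```

The same filter could be repeated on the next surviving hit to thin out further clusters
of near-identical results.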

