lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: MoreLikeThis supporting multiple document IDs as input?
Date Thu, 03 Jan 2013 13:45:54 GMT
The MLT search component is enabled using &mlt=true and works on any normal 
Solr query. It gives a batch of similar documents for each search result of 
the original query, one batch per original query result. It uses the 
&mlt.count=n parameter to control how many similar results to return for 
each original query result.
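
As a rough sketch of such a request, the Python below assembles a
search-component URL. The field names passed to mlt.fl and the default
counts are placeholders chosen for illustration; the host and the /select
path are simply the ones quoted elsewhere in this thread and may differ on
your installation.

```python
from urllib.parse import urlencode


def mlt_component_url(base, query, rows=3, mlt_count=5,
                      fields=("title", "description")):
    """Build a /select URL that also asks the MLT search component to run.

    `fields` is a hypothetical list of fields to mine for similar terms.
    """
    params = [
        ("q", query),                  # the normal Solr query
        ("rows", rows),                # how many original results
        ("mlt", "true"),               # switch the MLT component on
        ("mlt.count", mlt_count),      # similar docs per original result
        ("mlt.fl", ",".join(fields)),  # fields used to find similar docs
    ]
    return base + "/select?" + urlencode(params)


url = mlt_component_url("http://10.0.0.1:8080/solr", "shoes")
print(url)
```

With rows=3 and mlt.count=5 you would expect up to three original results
and up to five similar documents for each of them.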

The MLT request handler is a standalone request handler that does a query, 
takes the first result, and then returns one batch of documents that are 
similar to that one document. You have to configure the handler yourself, 
but typically it would have the name "/mlt", so you would write:

http://10.0.0.1:8080/solr/mlt/?q=shoes&rows=3

It will show you both the single document from the original query and then 
the batch of documents that are most similar to the top terms from that one 
original document.
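
To make the difference between the two concrete, here is a toy model in
Python. It does not talk to Solr at all: the corpus, the document IDs and
the "count the shared terms" similarity are invented for illustration, and
real MLT scoring is more involved. Only the shape of the two results
mirrors the description above.

```python
def similar_to(doc_id, corpus, k):
    """Toy similarity: rank the other documents by number of shared terms."""
    base = corpus[doc_id]
    scored = [(len(base & terms), other)
              for other, terms in corpus.items() if other != doc_id]
    scored = [(score, other) for score, other in scored if score > 0]
    # Highest overlap first; ties broken by document ID for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def component_style(results, corpus, k):
    """One batch of similar documents per original result."""
    return {doc_id: similar_to(doc_id, corpus, k) for doc_id in results}


def handler_style(results, corpus, k):
    """Only the first result is examined; one batch in total."""
    if not results:
        return None, []
    first = results[0]
    return first, similar_to(first, corpus, k)


corpus = {
    "a": {"red", "running", "shoes"},
    "b": {"blue", "running", "shoes"},
    "c": {"red", "leather", "boots"},
    "d": {"blue", "suede", "shoes"},
}
results = ["a", "b", "c"]  # pretend these came back from q=shoes&rows=3

print(component_style(results, corpus, 2))
print(handler_style(results, corpus, 2))
```

The first call yields three batches keyed by the original document, the
second a single batch for document "a" only.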

Add &debugQuery=true or &debug=query or &debug=results to see the terms that 
are used in the secondary queries that find the similar documents.

There are a bunch of parameters that you have to tune for either approach.

-- Jack Krupansky

-----Original Message----- 
From: David Parks
Sent: Thursday, January 03, 2013 4:11 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

I'm not seeing the results I would expect. In the previous email below it's
stated that the "MLT search component" returns N results and K similar
documents per EACH of the N results.

If I'm not mistaken I access the "MLT search component" via a query to
/solr/select/?qt=mlt, such as this:

http://10.0.0.1:8080/solr/select/?qt=mlt&terms=true&q=shoes&rows=3

The query above for a simple term such as "shoes" can return many documents.
But I limited the results to 3, and I see 3 results, and the results don't
appear to me to be any different from those of this query:

http://107.23.102.164:8080/solr/select/?q=shoes&rows=3

So that suggests to me that Solr maybe isn't handing things off to the MLT
component as expected (I don't know what results to expect, so it's hard for
me to know where I'm trying to get to).

So I added a debugQuery=on parameter and I see this, which may be a useful
reference:

<str name="QParser">LuceneQParser</str>

It also appears that the MoreLikeThisComponent did indeed run:

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">

So maybe I should ask exactly what results I should be expecting here?

Thanks very much!
David


-----Original Message-----
From: Jack Krupansky [mailto:jack@basetechnology.com]
Sent: Friday, December 28, 2012 8:13 PM
To: solr-user@lucene.apache.org
Subject: Re: MoreLikeThis supporting multiple document IDs as input?

Try a query that returns multiple results and you will see the difference.

MLT search component: n results, k similar documents per EACH of the n
results

MLT request handler: only FIRST result is examined, so only k similar
documents for that ONE (first) TOP search result.

Are you really saying that you don't comprehend what the difference is, or
simply that you don't LIKE the difference?! Or, maybe that you are wondering
WHY they are different? That latter question I don't have the answer to.

-- Jack Krupansky

-----Original Message-----
From: David Parks
Sent: Friday, December 28, 2012 2:48 AM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

So the Search Components are executed in series on _every_ request. I
presume then that they look at the request parameters and decide whether to
act, and how.

So in the case of the MLT component this was said:

> The MLT search component returns similar documents for each of the
> documents in the search results, but processes each search result base
> document one at a time and keeps its similar documents segregated by
> each of the base documents.

So what I think I understand is that the Query Component (presumably this
guy: org.apache.solr.handler.component.QueryComponent) takes the input from
the "q" parameter and returns a result (the "q=id:123456" ensures that the
Query Component will return just this one document).

The MoreLikeThisComponent then looks at the result from the QueryComponent
and generates its results.

The part that is still confusing is understanding the difference between
these two comments:

- The MLT search component returns similar documents for each of the
documents in the search results
- The MLT handler returns similar documents only for the first document that
the query matches.



-----Original Message-----
From: Otis Gospodnetic [mailto:otis.gospodnetic@gmail.com]
Sent: Friday, December 28, 2012 1:26 PM
To: solr-user@lucene.apache.org
Subject: RE: MoreLikeThis supporting multiple document IDs as input?

Hi Dave,

Think of search components as a chain of Java classes that get executed
during each search request. If you open solrconfig.xml you will see how they
are defined and used.

HTH

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Dec 28, 2012 12:06 AM, "David Parks" <davidparks21@yahoo.com> wrote:

> I'm somewhat new to Solr (it's running, I've been through the books,
> but I'm no master). What I hear you say is that MLT *can* accept, say,
> 5 documents and provide results, but the results would essentially be
> the same as running the query 5 times for each document?
>
> If that's the case, I might accept it. I would just have to merge them
> together at the end (perhaps I'd take the top 2 of each result, for
> example).
>
> Being somewhat new I'm a little confused by the difference between a
> "Search Component" and a "Handler". I've got the /mlt handler working
> and I'm using that. But how's that different from a "Search
> Component"? Is that referring to the default /solr/select?q="..."
> style query?
>
> And if what I said about multiple documents above is correct, what's
> the syntax to try that out?
>
> Thanks very much for the great help!
> Dave
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack@basetechnology.com]
> Sent: Wednesday, December 26, 2012 12:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: MoreLikeThis supporting multiple document IDs as input?
>
> MLT has both a request handler and a search component.
>
> The MLT handler returns similar documents only for the first document
> that the query matches.
>
> The MLT search component returns similar documents for each of the
> documents in the search results, but processes each search result base
> document one at a time and keeps its similar documents segregated by
> each of the base documents.
>
> It sounds like you wanted to merge the base search results and then
> find documents similar to that merged super-document. Is that what you
> were really seeking, as opposed to what the MLT component does?
> Unfortunately, you can't do that with the components as they are.
>
> You would have to manually merge the values from the base documents
> and then you could POST that text back to the MLT handler and find
> similar documents using the posted text rather than a query. Kind of
> messy, but in theory that should work.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: David Parks
> Sent: Tuesday, December 25, 2012 5:04 AM
> To: solr-user@lucene.apache.org
> Subject: MoreLikeThis supporting multiple document IDs as input?
>
> I'm unclear on this point from the documentation. Is it possible to
> give Solr X # of document IDs and tell it that I want documents
> similar to those X documents?
>
> Example:
>
>   - The user is browsing 5 different articles
>   - I send Solr the IDs of these 5 articles so I can present the user
> other similar articles
>
> I see this example for sending it 1 document ID:
> http://localhost:8080/solr/select/?qt=mlt&q=id:[document
> id]&mlt.fl=[field1],[field2],[field3]&fl=id&rows=10
>
> But can I send it 2+ document IDs as the query?
>
> 
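
For the manual merge suggested in the 26 December reply, a minimal sketch
of the first half (combining the values of the base documents into one
text) might look like the Python below. The document dictionaries and the
field names are invented for illustration. Posting the merged text to the
MLT handler is deliberately left out, since the exact request depends on
how the handler is configured.

```python
def merge_field_values(docs, fields):
    """Concatenate the chosen fields of several documents into one text."""
    parts = []
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue  # this document has no value for the field
            # Multi-valued fields arrive as lists; treat both cases alike.
            values = value if isinstance(value, list) else [value]
            parts.extend(str(v) for v in values)
    return " ".join(parts)


# Hypothetical base documents, e.g. the articles the user is browsing.
docs = [
    {"id": "1", "title": "Red running shoes", "tags": ["sport", "red"]},
    {"id": "2", "title": "Blue suede shoes"},
]
merged = merge_field_values(docs, ["title", "tags"])
print(merged)
```

The resulting string is the "super-document" text that would then be
submitted in place of a query.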
