lucene-solr-user mailing list archives

From Raghuveer Kancherla <raghuveer.kanche...@aplopio.com>
Subject Re: Payloads with Phrase queries
Date Tue, 15 Dec 2009 11:31:06 GMT
The interesting thing I am noticing is that the scoring works fine for a
phrase query like "solr rocks".
This led me to look at which query I am using in the single-term case.
It turns out that I am using PayloadTermQuery, taking a cue from the
SOLR-1485 patch.

I changed this to BoostingTermQuery (I read somewhere that it is
deprecated, but I was just experimenting), and the scoring now seems to
work as expected for a single term.

Now, the important question is what is the Payload version of a TermQuery?

Regards
Raghu


On Tue, Dec 15, 2009 at 12:45 PM, Raghuveer Kancherla <
raghuveer.kancherla@aplopio.com> wrote:

> Hi,
> Thanks everyone for the responses, I am now able to get both phrase queries
> and term queries to use payloads.
>
> However, the score value for each document (and consequently the
> ordering of documents) is coming out wrong.
>
> In the Solr output appended below, document 4 has a higher score than
> document 2 (see the debug part). The results section shows a wrong score
> (which is the payload value I am returning from my custom similarity class),
> and the ordering is also wrong because of this. Can someone explain this?
>
> My custom query parser is pasted here http://pastebin.com/m9f21565
>
> In the similarity class, I return 10.0 if payload is 1 and 20.0 if payload
> is 2. For everything else I return 1.0.
>
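As an illustration of the mapping described above, here is a minimal standalone sketch in plain Java. It deliberately leaves out Lucene: the class and method names are hypothetical, and it assumes the payload bytes have already been decoded to an integer (how that decoding works depends on the payload encoder used at index time, which is not shown in this thread). In the real setup this logic would sit behind the scorePayload hook that appears in the explain output.

```java
// Hypothetical helper for illustration only; not part of Lucene or Solr.
final class PayloadScoreSketch {

    private PayloadScoreSketch() {}

    /**
     * Maps an already-decoded payload value to a score factor:
     * 1 -> 10.0, 2 -> 20.0, anything else -> 1.0.
     */
    static float scoreForPayloadValue(int payloadValue) {
        switch (payloadValue) {
            case 1:
                return 10.0f;
            case 2:
                return 20.0f;
            default:
                return 1.0f; // every other value gets the neutral factor
        }
    }

    public static void main(String[] args) {
        // Print the factor for the two special values and one arbitrary other value.
        System.out.println(scoreForPayloadValue(1));
        System.out.println(scoreForPayloadValue(2));
        System.out.println(scoreForPayloadValue(7));
    }
}
```

Keeping the mapping in a small pure function like this makes it easy to test separately from the similarity class that calls it.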
> {
>  'responseHeader':{
>   'status':0,
>   'QTime':2,
>   'params':{
> 	'fl':'*,score',
> 	'debugQuery':'on',
> 	'indent':'on',
> 	'start':'0',
> 	'q':'solr',
> 	'qt':'aplopio',
> 	'wt':'python',
> 	'fq':'',
> 	'rows':'10'}},
>  'response':{'numFound':5,'start':0,'maxScore':20.0,'docs':[
> 	{
> 	 'payloadTest':'solr|2 rocks|1',
> 	 'id':'2',
> 	 'score':20.0},
> 	{
> 	 'payloadTest':'solr|2',
> 	 'id':'4',
> 	 'score':20.0},
> 	{
> 	 'payloadTest':'solr|1 rocks|2',
> 	 'id':'1',
> 	 'score':10.0},
> 	{
> 	 'payloadTest':'solr|1 rocks|1',
> 	 'id':'3',
> 	 'score':10.0},
> 	{
> 	 'payloadTest':'solr',
> 	 'id':'5',
> 	 'score':1.0}]
>  },
>  'debug':{
>   'rawquerystring':'solr',
>   'querystring':'solr',
>   'parsedquery':'PayloadTermQuery(payloadTest:solr)',
>   'parsedquery_toString':'payloadTest:solr',
>   'explain':{
> 	'2':'\n7.227325 = (MATCH) fieldWeight(payloadTest:solr in 1), product of:\n  14.142136 = (MATCH) btq, product of:\n    0.70710677 = tf(phraseFreq=0.5)\n    20.0 = scorePayload(...)\n  0.81767845 = idf(payloadTest:  solr=5)\n  0.625 = fieldNorm(field=payloadTest, doc=1)\n',
> 	'4':'\n11.56372 = (MATCH) fieldWeight(payloadTest:solr in 3), product of:\n  14.142136 = (MATCH) btq, product of:\n    0.70710677 = tf(phraseFreq=0.5)\n    20.0 = scorePayload(...)\n  0.81767845 = idf(payloadTest:  solr=5)\n  1.0 = fieldNorm(field=payloadTest, doc=3)\n',
> 	'1':'\n3.6136625 = (MATCH) fieldWeight(payloadTest:solr in 0), product of:\n  7.071068 = (MATCH) btq, product of:\n    0.70710677 = tf(phraseFreq=0.5)\n    10.0 = scorePayload(...)\n  0.81767845 = idf(payloadTest:  solr=5)\n  0.625 = fieldNorm(field=payloadTest, doc=0)\n',
> 	'3':'\n3.6136625 = (MATCH) fieldWeight(payloadTest:solr in 2), product of:\n  7.071068 = (MATCH) btq, product of:\n    0.70710677 = tf(phraseFreq=0.5)\n    10.0 = scorePayload(...)\n  0.81767845 = idf(payloadTest:  solr=5)\n  0.625 = fieldNorm(field=payloadTest, doc=2)\n',
> 	'5':'\n0.578186 = (MATCH) fieldWeight(payloadTest:solr in 4), product of:\n  0.70710677 = (MATCH) btq, product of:\n    0.70710677 = tf(phraseFreq=0.5)\n    1.0 = scorePayload(...)\n  0.81767845 = idf(payloadTest:  solr=5)\n  1.0 = fieldNorm(field=payloadTest, doc=4)\n'},
>   'QParser':'BoostingTermQParser',
>   'filter_queries':[''],
>   'parsed_filter_queries':[],
>   'timing':{
> 	'time':2.0,
> 	'prepare':{
> 	 'time':1.0,
> 	 'org.apache.solr.handler.component.QueryComponent':{
> 	  'time':1.0},
> 	 'org.apache.solr.handler.component.FacetComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.MoreLikeThisComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.HighlightComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.StatsComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.DebugComponent':{
> 	  'time':0.0}},
> 	'process':{
> 	 'time':1.0,
> 	 'org.apache.solr.handler.component.QueryComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.FacetComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.MoreLikeThisComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.HighlightComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.StatsComponent':{
> 	  'time':0.0},
> 	 'org.apache.solr.handler.component.DebugComponent':{
> 	  'time':1.0}}}}}
>
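The explain entries above can be checked by hand: each score is the product of the tf, scorePayload, idf and fieldNorm factors listed in the output. The sketch below (plain Java, with hypothetical class and method names) recomputes documents 2 and 4 from those printed factors. Both have the same payload score of 20.0, and the only factor that differs is fieldNorm (0.625 versus 1.0), which would account for document 4 ranking above document 2 in the debug section while the results section reports 20.0 for both.

```java
// Hypothetical helper for illustration only; it just multiplies the
// factors that the explain output prints.
final class ExplainProductCheck {

    private ExplainProductCheck() {}

    /** Recomputes a fieldWeight value from the factors shown in the explain text. */
    static double fieldWeight(double tf, double payloadScore, double idf, double fieldNorm) {
        double btq = tf * payloadScore; // the "btq, product of" line
        return btq * idf * fieldNorm;   // the outer "fieldWeight, product of" line
    }

    public static void main(String[] args) {
        double tf = 0.70710677;  // tf(phraseFreq=0.5)
        double idf = 0.81767845; // idf(payloadTest: solr=5)

        // Document 2: fieldNorm 0.625; the explain output reports 7.227325.
        System.out.println("doc 2: " + fieldWeight(tf, 20.0, idf, 0.625));

        // Document 4: fieldNorm 1.0; the explain output reports 11.56372.
        System.out.println("doc 4: " + fieldWeight(tf, 20.0, idf, 1.0));
    }
}
```

The recomputed values agree with the explain output to within rounding, so the debug scores are internally consistent; the discrepancy is between the debug section and the scores in the results section.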
> On Thu, Dec 10, 2009 at 5:48 PM, AHMET ARSLAN <iorixxx@yahoo.com> wrote:
>
>>
>> > I was looking through the Lucene source code and found the
>> > following class:
>> > org.apache.lucene.search.payloads.PayloadSpanUtil
>> >
>> > There is a function named queryToSpanQuery in this class.
>> > Is this the preferred way to convert a PhraseQuery to a
>> > PayloadNearQuery?
>>
>> The queryToSpanQuery method does not return a PayloadNearQuery.
>>
>> You need to override getFieldQuery(String field, String queryText, int
>> slop) of SolrQueryParser or QueryParser.
>>
>> This code is adapted from the book Lucene in Action (2nd edition),
>> Section 6.3.4, "Allowing ordered phrase queries":
>>
>> protected Query getFieldQuery(String field, String queryText, int slop)
>>         throws ParseException {
>>
>>     Query orig = super.getFieldQuery(field, queryText, slop);
>>
>>     // Only phrase queries are rewritten; everything else passes through.
>>     if (!(orig instanceof PhraseQuery)) return orig;
>>
>>     PhraseQuery pq = (PhraseQuery) orig;
>>     Term[] terms = pq.getTerms();
>>     SpanQuery[] clauses = new SpanQuery[terms.length];
>>
>>     // Wrap each phrase term in a payload-aware span query.
>>     for (int i = 0; i < terms.length; i++) {
>>         clauses[i] = new PayloadTermQuery(terms[i], new AveragePayloadFunction());
>>     }
>>
>>     // inOrder = true keeps the terms in phrase order.
>>     return new PayloadNearQuery(clauses, slop, true);
>> }
>>
>>
>> > Also, are there any performance considerations while using
>> > a PayloadNearQuery instead of a PhraseQuery?
>>
>> I don't think there will be a significant performance difference.
>>
>
