lucene-dev mailing list archives

From "Peter Keegan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5831) Scale score PostFilter
Date Thu, 13 Mar 2014 14:56:48 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933339#comment-13933339 ]

Peter Keegan commented on SOLR-5831:
------------------------------------

Here is a comparison of the ScaleScorePostFilter and a Configurable Collector that performs the same
score scaling with the function hard-wired into the collector. I had to hand-merge part of
the patch from SOLR-4465 into the 4.6.1 branch. My tests show that the Collector is faster:

1. SolrMeter test @ 20 QPS:

Custom Collector with maxscalehits=10000:
Median response time: 25 ms
Average response time: 135 ms
Load average: 2.3

PostFilter with maxscalehits=10000:
Median response time: 30 ms
Average response time: 190 ms
Load average: 3.3

2. Typical response times (ms) as a function of hit count:

# hits   Collector   PostFilter
------   ---------   ----------
80K      12          35
230K     20          80
330K     25          123
720K     35          275
1.1M     32          390
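Dividing the PostFilter column by the Collector column shows the gap widening with hit count, from roughly 2.9x at 80K hits to roughly 12x at 1.1M hits. The small program below only does that arithmetic on the numbers in the table; nothing in it is re-measured.

```java
// Ratio of PostFilter to Collector response time for each row of the table.
// The numbers are copied from the table above; nothing here is re-measured.
public class ResponseTimeRatios {
    // Hit counts, as written in the table.
    static final String[] HITS = {"80K", "230K", "330K", "720K", "1.1M"};
    // Typical response times in ms.
    static final int[] COLLECTOR_MS = {12, 20, 25, 35, 32};
    static final int[] POSTFILTER_MS = {35, 80, 123, 275, 390};

    // How many times slower the PostFilter was than the Collector for a given row.
    static double ratio(int row) {
        return (double) POSTFILTER_MS[row] / COLLECTOR_MS[row];
    }

    public static void main(String[] args) {
        for (int i = 0; i < HITS.length; i++) {
            System.out.printf("%-5s %5.1fx%n", HITS[i], ratio(i));
        }
    }
}
```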

This difference in response times is likely due to the PostFilter collecting the hits twice
(once in the PostFilter and once in the delegate collector), whereas the Custom Collector
collects them only once.
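To make the "collected twice" point concrete, here is a toy model that only counts collect() calls. The HitCollector interface and all the classes are hypothetical stand-ins, not Lucene's Collector API or the code in either patch; it assumes, as described above, that the PostFilter buffers every hit and then replays each one to the delegate.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectionPasses {
    // Hypothetical minimal collector interface (NOT Lucene's Collector API).
    interface HitCollector {
        void collect(int doc, float score);
    }

    // Stand-in for the delegate (e.g. a top-docs collector); it only counts calls.
    static class CountingCollector implements HitCollector {
        int calls = 0;

        @Override
        public void collect(int doc, float score) {
            calls++;
        }
    }

    // PostFilter-style: every hit is buffered first, then replayed to the delegate.
    static class BufferingPostFilter implements HitCollector {
        final HitCollector delegate;
        final List<Integer> docs = new ArrayList<>();
        final List<Float> scores = new ArrayList<>();
        int calls = 0;

        BufferingPostFilter(HitCollector delegate) {
            this.delegate = delegate;
        }

        @Override
        public void collect(int doc, float score) {
            calls++;
            docs.add(doc);
            scores.add(score);
        }

        // Second pass: replay every buffered hit (after any rescoring) to the delegate.
        void finish() {
            for (int i = 0; i < docs.size(); i++) {
                delegate.collect(docs.get(i), scores.get(i));
            }
        }
    }

    // Total collect() calls when hits go through the PostFilter and then the delegate.
    static int postFilterCalls(int hits) {
        CountingCollector delegate = new CountingCollector();
        BufferingPostFilter filter = new BufferingPostFilter(delegate);
        for (int doc = 0; doc < hits; doc++) {
            filter.collect(doc, 1.0f);
        }
        filter.finish();
        return filter.calls + delegate.calls;
    }

    // Total collect() calls when a single custom collector sees each hit once.
    static int customCollectorCalls(int hits) {
        CountingCollector collector = new CountingCollector();
        for (int doc = 0; doc < hits; doc++) {
            collector.collect(doc, 1.0f);
        }
        return collector.calls;
    }

    public static void main(String[] args) {
        int hits = 80_000;
        System.out.println("PostFilter path:       " + postFilterCalls(hits) + " collect() calls");
        System.out.println("Custom collector path: " + customCollectorCalls(hits) + " collect() calls");
    }
}
```

For N hits the PostFilter path makes 2N collect() calls against N for the single collector, plus the cost of buffering; that extra work grows with the hit count, which fits the widening gap in the table.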

> Scale score PostFilter
> ----------------------
>
>                 Key: SOLR-5831
>                 URL: https://issues.apache.org/jira/browse/SOLR-5831
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 4.7
>            Reporter: Peter Keegan
>            Priority: Minor
>         Attachments: SOLR-5831.patch
>
>
> The ScaleScoreQParserPlugin is a PostFilter that performs score scaling.
> This is an alternative to using a function query wrapping a scale() wrapping a query(). For example:
> select?qq={!edismax v='news' qf='title^2 body'}&scaledQ=scale(product(query($qq),1),0,1)&q={!func}sum(product(0.75,$scaledQ),product(0.25,field(myfield)))&fq={!query v=$qq}
> The problem with this query is that it has to scale every hit. Usually, only the returned hits need to be scaled,
> but there may be use cases where the number of hits to be scaled is greater than the returned hit count
> while still being less than or equal to the total hit count.
> Sample syntax:
> fq={!scalescore+l=0.0 u=1.0 maxscalehits=10000 func=sum(product(sscore(),0.75),product(field(myfield),0.25))}
> l=0.0 u=1.0         //Scale scores to values between 0 and 1, inclusive
> maxscalehits=10000  //The maximum number of result scores to scale (-1 = all hits, 0 = results 'page' size)
> func=...            //Apply the composite function to each hit. The scaled score value is accessed by the 'score()' value source
> All parameters are optional. The defaults are:
> l=0.0 u=1.0
> maxscalehits=0 (result window size)
> func=(null)
>  
> Note: this patch is not complete, as it contains no test cases and may not conform 
> to all the guidelines in http://wiki.apache.org/solr/HowToContribute. 
>  
> I would appreciate any feedback on the usability and implementation.
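For reference, here is a rough sketch of the scaling described in the quoted issue: keep the top maxscalehits raw scores, scale them linearly into [l, u], then apply func to each scaled score. Everything in it (the Hit class, the hypothetical windowSize parameter, passing func as a DoubleBinaryOperator, dropping hits below the cutoff) is an assumption for illustration and not the patch's implementation. With maxscalehits=-1 it scales every hit, which is the cost the scale()/query() function-query approach always pays.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.DoubleBinaryOperator;

// Hypothetical sketch of the scaling described in SOLR-5831; not the patch's code.
public class ScaleScoreSketch {
    // One hit: doc id, raw query score, and the value of a numeric field such as "myfield".
    static final class Hit {
        final int doc;
        final double rawScore;
        final double fieldValue;
        double finalScore;

        Hit(int doc, double rawScore, double fieldValue) {
            this.doc = doc;
            this.rawScore = rawScore;
            this.fieldValue = fieldValue;
        }
    }

    // maxscalehits: -1 = all hits, 0 = result window size, n > 0 = at most n hits.
    // windowSize is a hypothetical stand-in for the number of rows the request returns.
    static int effectiveScaleHits(int maxScaleHits, int windowSize, int totalHits) {
        if (maxScaleHits < 0) {
            return totalHits;
        }
        int limit = (maxScaleHits == 0) ? windowSize : maxScaleHits;
        return Math.min(limit, totalHits);
    }

    // Scale the top-k raw scores linearly into [l, u], then apply func(scaled, fieldValue).
    // func == null mirrors the "func=(null)" default: the scaled score is the final score.
    // Hits below the top k are simply dropped here (assumption).
    static Hit[] scaleScores(Hit[] hits, double l, double u, int maxScaleHits,
                             int windowSize, DoubleBinaryOperator func) {
        int k = effectiveScaleHits(maxScaleHits, windowSize, hits.length);
        Hit[] top = hits.clone();
        Arrays.sort(top, Comparator.comparingDouble((Hit h) -> h.rawScore).reversed());
        top = Arrays.copyOf(top, k);
        if (k == 0) {
            return top;
        }
        double max = top[0].rawScore;
        double min = top[k - 1].rawScore;
        // If all kept scores are equal, map them all to l (assumed convention).
        double factor = (max == min) ? 0.0 : (u - l) / (max - min);
        for (Hit h : top) {
            double scaled = (h.rawScore - min) * factor + l;
            h.finalScore = (func == null) ? scaled : func.applyAsDouble(scaled, h.fieldValue);
        }
        Arrays.sort(top, Comparator.comparingDouble((Hit h) -> h.finalScore).reversed());
        return top;
    }

    public static void main(String[] args) {
        Hit[] hits = {
            new Hit(0, 2.0, 0.0), new Hit(1, 6.0, 1.0), new Hit(2, 4.0, 0.0), new Hit(3, 1.0, 1.0)
        };
        // Roughly l=0.0 u=1.0 maxscalehits=3 with func = 0.75 * scaled + 0.25 * myfield.
        Hit[] ranked = scaleScores(hits, 0.0, 1.0, 3, 10, (s, f) -> 0.75 * s + 0.25 * f);
        for (Hit h : ranked) {
            System.out.printf("doc=%d score=%.3f%n", h.doc, h.finalScore);
        }
    }
}
```

Because min and max come only from the kept hits, a smaller maxscalehits both reduces the work and changes the scaling range, so results can differ from scaling over every hit.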



