lucene-solr-user mailing list archives

From escher2k <esche...@yahoo.com>
Subject Re: Question about similarity manipulation...
Date Wed, 03 Jan 2007 20:51:38 GMT


Chris Hostetter wrote:
> 
> 
> : The DisjunctionMaxQuery seems to yield the maximum score only. From my
> 
> NOTE: by setting the "tiebreaker" value of a DisjunctionMaxQuery to "1.0"
> it generates the sum of the sub-query scores instead of the maximum.
> 
> : understanding, I would
> : need to do the following -
> : (1) Create a new similarity function
> : (2) Write a new Query class extension
> : (3) Need to write a new linear function ??
> 
> You'll definitely need a new Similarity class with custom tf and
> queryNorm functions. I don't think you'd need a new Query class; what
> you are looking for should be fairly straightforward to implement using
> BooleanQueries, TermQueries, and FunctionQueries. You shouldn't need to
> write a new linear function ValueSource; I can't think of why the
> current one wouldn't work for you.
> 
> The java-user@lucene list is a good place to ask general questions about
> customizing scoring by writing your own Similarity, and it has a larger
> user base than the Solr lists.
> 
> 
> 
> -Hoss
> 
> 
> 
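Hoss's note about the tiebreaker follows from how DisjunctionMaxQuery combines its clauses: the score is the best sub-query score plus tieBreaker times the scores of the other matching sub-queries, so a tiebreaker of 0.0 gives a pure maximum and 1.0 gives a plain sum. The sketch below is plain Java with no Lucene dependency; the class and method names are hypothetical, and it assumes non-negative sub-scores, as Lucene scores normally are.

```java
/**
 * Plain-Java sketch of DisjunctionMaxQuery-style score combination:
 * the highest sub-score plus tieBreaker times the remaining sub-scores.
 * No Lucene classes are used; the sub-scores are passed in directly.
 */
public class DisMaxScore {

    /** Assumes non-negative sub-scores; returns 0 for an empty array. */
    public static float combine(float[] subScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
            max = Math.max(max, s);
        }
        // max + tieBreaker * (sum of all the non-maximal scores)
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        float[] scores = {0.5f, 2.0f, 1.5f};
        System.out.println(combine(scores, 0.0f)); // 2.0 (pure max)
        System.out.println(combine(scores, 1.0f)); // 4.0 (plain sum)
    }
}
```

Values between 0 and 1 interpolate between the two behaviours, which is why a small tiebreaker is often used to favour documents that match in several fields without letting the sum dominate.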

Thanks Hoss. I have written the new similarity class. There are two problems
with the existing linear function:
(a) the input doesn't seem to be the score returned for the field by the
similarity computation; instead it depends on the field's data type.
(b) the function I want is a slight variation of the linear function.
Essentially it is a step function: if term freq = 1, return a particular
value, and if term freq > 1, apply a linear function.
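The step function in (b) would naturally live in the tf() override of a custom Similarity, next to the queryNorm() change Hoss mentions. A minimal standalone sketch, written without Lucene so it compiles on its own: defaultTf and defaultQueryNorm reproduce DefaultSimilarity's sqrt(freq) and 1/sqrt(sumOfSquaredWeights) for comparison, while stepTf, flatQueryNorm and the constants SINGLE_OCCURRENCE_SCORE, SLOPE and INTERCEPT are hypothetical placeholders, not values from this thread.

```java
/**
 * Standalone sketch of the two Similarity hooks discussed in this thread.
 * In a real custom Similarity, stepTf and flatQueryNorm would become
 * overrides of tf(float) and queryNorm(float). Constants are hypothetical.
 */
public class StepSimilaritySketch {

    // Hypothetical tuning values, not taken from the thread.
    static final float SINGLE_OCCURRENCE_SCORE = 1.0f;
    static final float SLOPE = 0.5f;
    static final float INTERCEPT = 1.5f;

    /** DefaultSimilarity's tf, for comparison: sqrt(freq). */
    public static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    /** Step function: fixed value for one occurrence, linear above that. */
    public static float stepTf(float freq) {
        if (freq <= 0f) {
            return 0f;                        // term absent from the field
        }
        if (freq <= 1f) {
            return SINGLE_OCCURRENCE_SCORE;   // a single occurrence
        }
        return SLOPE * freq + INTERCEPT;      // freq > 1: linear in freq
    }

    /** DefaultSimilarity's queryNorm, for comparison. */
    public static float defaultQueryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    /** One possible custom queryNorm: no normalization at all. */
    public static float flatQueryNorm(float sumOfSquaredWeights) {
        return 1.0f;
    }

    public static void main(String[] args) {
        for (int freq = 1; freq <= 3; freq++) {
            System.out.println(freq + ": default=" + defaultTf(freq)
                    + " step=" + stepTf(freq));
        }
    }
}
```

Returning a constant from queryNorm keeps the raw per-clause scores comparable across queries; whether that is the right choice depends on what the original scoring requirements were.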

But I think (a) is the bigger problem.
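Problem (a) is consistent with the stack trace further down: for a string field the function's input comes from OrdFieldSource, i.e. the ordinal of the field's term in sorted order (numbered from 1), and linear(x,m,c) then returns m*x + c. As far as the trace shows, neither the stored value nor the field's relevance score enters the formula. A sketch, assuming `id` is indexed as a single untokenized string, as the <str> in the sample docs suggests; the class and method names are illustrative, not Solr's.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/**
 * Sketch of linear() applied to a string field: the input x is the term's
 * 1-based ordinal in lexicographic order, and the result is m * x + c.
 * Mirrors the idea of OrdFieldSource + LinearFloatFunction, not their code.
 */
public class LinearOverOrd {

    /** 1-based ordinal of each distinct value in sorted (lexicographic) order. */
    public static Map<String, Integer> ordinals(String[] values) {
        Map<String, Integer> ords = new HashMap<>();
        int ord = 1;
        for (String v : new TreeSet<>(Arrays.asList(values))) {
            ords.put(v, ord++);
        }
        return ords;
    }

    /** linear(x, m, c) = m * x + c */
    public static float linear(float x, float m, float c) {
        return m * x + c;
    }

    public static void main(String[] args) {
        String[] ids = {"40", "30"};          // id values of the two sample docs
        Map<String, Integer> ords = ordinals(ids);
        for (String id : ids) {
            int ord = ords.get(id);
            System.out.println("id=" + id + " ord=" + ord
                    + " linear(id,1,10)=" + linear(ord, 1f, 10f));
        }
    }
}
```

With ids "40" and "30" the sorted order is "30", "40", so linear(id,1,10) gives 11 for doc 30 and 12 for doc 40, regardless of how well either document matches the query.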

For instance, on this data set:
<doc>
  <str name="desc">ABCDE XYZ</str>
  <str name="id">40</str>
  <str name="name">abcde XYZ GHI</str>
  <float name="profile_score">55</float>
</doc>
<doc>
  <str name="desc">ABCDE ABCDE XYZ</str>
  <str name="id">30</str>
  <str name="name">ABCDE XYZ GHI</str>
  <float name="profile_score">45</float>
</doc>

The following URL returns results:
http://dev01:8983/solr/select/?qt=dismax&q=abcde+ghi&qf=name&bf=linear(id,1,10)&debugQuery=1

whereas
http://dev01:8983/solr/select/?qt=dismax&q=abcde+ghi&qf=name&bf=linear(name,1,10)&debugQuery=1
throws an exception:
java.lang.RuntimeException: there are more terms than documents in field "name", but it's impossible to sort on tokenized fields
	at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:274)
	at org.apache.solr.search.function.OrdFieldSource.getValues(OrdFieldSource.java:55)
	at org.apache.solr.search.function.LinearFloatFunction.getValues(LinearFloatFunction.java:49)
	at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:100)

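The message comes from the per-document ordinal cache that OrdFieldSource relies on: it has room for at most as many distinct terms as there are documents (plus a reserved slot 0 for documents with no value), so it only works for fields with at most one term per document. The sketch below is a simplified, standalone reconstruction of that guard, not Lucene's code, and it assumes, hypothetically, that `name` is analyzed by splitting on whitespace and lowercasing. With two documents there is room for two terms: `id` fits, but `name` yields three distinct terms (abcde, ghi, xyz) and is rejected. Without lowercasing there would be four distinct terms, so the outcome is the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/**
 * Simplified sketch of the "more terms than documents" guard in an
 * ordinal cache: one slot per document plus slot 0 for "no value".
 */
public class StringIndexSketch {

    /** Returns the sorted term table (slot 0 = null); throws if it cannot fit. */
    public static List<String> buildTermTable(String field, List<String> docValues,
                                              boolean tokenize) {
        TreeSet<String> terms = new TreeSet<>();
        for (String value : docValues) {
            if (tokenize) {
                // Hypothetical analyzer: whitespace split + lowercase.
                for (String tok : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                    terms.add(tok);
                }
            } else {
                terms.add(value);             // untokenized: one term per doc
            }
        }
        int slots = docValues.size() + 1;     // maxDoc + 1, slot 0 reserved
        List<String> table = new ArrayList<>();
        table.add(null);                      // ordinal 0 = no value
        for (String term : terms) {
            if (table.size() >= slots) {
                throw new RuntimeException("there are more terms than documents in field \""
                        + field + "\", but it's impossible to sort on tokenized fields");
            }
            table.add(term);
        }
        return table;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("40", "30");
        List<String> names = List.of("abcde XYZ GHI", "ABCDE XYZ GHI");
        System.out.println(buildTermTable("id", ids, false)); // prints [null, 30, 40]
        try {
            buildTermTable("name", names, true);              // 3 terms, 2 docs
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In practice this means ord-based functions such as linear() can be applied to single-valued, untokenized fields like `id`, but not to a tokenized text field like `name`.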
Once again, thanks for your help.
-- 
View this message in context: http://www.nabble.com/Question-about-similarity-manipulation...-tf2910171.html#a8148415
Sent from the Solr - User mailing list archive at Nabble.com.

