lucene-java-user mailing list archives

From baris.ka...@oracle.com
Subject Re: Index-time boosting: Deprecated setBoost method
Date Mon, 21 Oct 2019 15:05:36 GMT
Hi,-

I would like to ask the following to make this clearer (for me at least):

Document doc = new Document();

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
f1.setBoost(2.0f);

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
f2.setBoost(1.0f);




This becomes the following, where the _boost1 field is associated with field1
and the _boost2 field is associated with field2:


In the indexing code:

Field f1 = new TextField("field1", "string1", Field.Store.YES);
doc.add(f1);
Field _boost1 = new NumericDocValuesField("_boost1", 2L);
doc.add(_boost1);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well … (I will post this soon)

Field f2 = new TextField("field2", "string2", Field.Store.YES);
doc.add(f2);
Field _boost2 = new NumericDocValuesField("_boost2", 1L);
doc.add(_boost2);
// If this boost value needs to be stored, a separate StoredField
// instance needs to be added as well … (I will post this soon)
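One detail worth checking in the indexing code above: setBoost took a float, whereas NumericDocValuesField stores a long, so a fractional boost such as 1.5f would be truncated to 1 by a plain cast. As far as I know, Lucene's FloatDocValuesField (a NumericDocValuesField subclass) avoids this by storing the float's raw bits, which can then be bound with SortField.Type.FLOAT. The plain-Java sketch below only mimics that round trip with Float.floatToRawIntBits; it uses no Lucene classes, and the class and method names are hypothetical.

```java
// Plain-Java sketch (no Lucene): how a float boost can survive being stored
// in a long-valued doc values slot by keeping its raw bits, similar in spirit
// to FloatDocValuesField. Class and method names are hypothetical.
final class BoostEncoding {

    // Naive approach: cast the float boost to long, dropping any fraction.
    static long encodeAsLong(float boost) {
        return (long) boost;
    }

    // Raw-bits approach: keep all 32 bits of the float inside the long.
    static long encodeFloatBits(float boost) {
        return Float.floatToRawIntBits(boost);
    }

    // Inverse of encodeFloatBits: recover the exact float value.
    static float decodeFloatBits(long stored) {
        return Float.intBitsToFloat((int) stored);
    }

    public static void main(String[] args) {
        float boost = 1.5f;
        System.out.println("cast to long:    " + encodeAsLong(boost));
        System.out.println("raw-bits result: " + decodeFloatBits(encodeFloatBits(boost)));
    }
}
```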


Now, in the searching code (i.e., at query time), do I still need the
FunctionScoreQuery, given that in this case the boost is just a constant value
rather than a function? Then again, a constant value can be seen as a function
that always returns the same value.


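import org.apache.lucene.expressions.Expression;
import org.apache.lucene.expressions.SimpleBindings;
import org.apache.lucene.expressions.js.JavascriptCompiler;
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.function.FunctionScoreQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TermQuery;

// multiply the query score by the per-document value of _boost1
Expression expr = JavascriptCompiler.compile("_score * _boost1");

// SimpleBindings just maps variables to SortField instances
SimpleBindings bindings = new SimpleBindings();
bindings.add(new SortField("_score", SortField.Type.SCORE));
bindings.add(new SortField("_boost1", SortField.Type.LONG));

// create a query that matches based on field1:term_to_look_for but
// scores using expr
Query query = new FunctionScoreQuery(
    new TermQuery(new Term("field1", "term_to_look_for")),
    expr.getDoubleValuesSource(bindings));

searcher.search(query, 10);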
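For reference, a plain-Java model (no Lucene involved) of the per-document arithmetic that an expression such as "_score * _boost1" performs: each matching document's query score is multiplied by the value stored in its boost docvalues field, and the results are ranked again. The scores and boost values are made-up inputs; in Lucene the scores would come from the inner TermQuery and the boosts from the _boost1 field. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of scoring with "_score * _boost1": the inner query score
// of each document is multiplied by that document's stored boost value.
// Scores and boosts are illustrative inputs, not Lucene output.
final class PerDocBoost {

    // Returns document ids ordered by score * boost, highest first.
    static List<Integer> rank(double[] scores, long[] boosts) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            docs.add(doc);
        }
        docs.sort(Comparator.comparingDouble((Integer doc) -> scores[doc] * boosts[doc]).reversed());
        return docs;
    }

    public static void main(String[] args) {
        double[] scores = {1.2, 1.0, 0.7}; // inner query scores for docs 0, 1, 2
        long[] boosts = {1L, 2L, 1L};      // per-document values of "_boost1"
        System.out.println(rank(scores, boosts)); // prints [1, 0, 2]
    }
}
```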


So, if the boost is a single constant value, do we need the JavaScript part
above?
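One way to look at the question: a boost that is identical for every document (e.g., wrapping the whole query in a BoostQuery) scales all scores by the same factor and so cannot change the ranking, which is why a per-document value such as _boost1 still has to be read from the index at query time. If plain multiplication by that value is all that is needed, FunctionScoreQuery.boostByValue(query, DoubleValuesSource.fromLongField("_boost1")) should work without the expressions module, assuming a Lucene version that provides this factory method (it should be present in the 7.x line from 7.3 on; please check the Javadocs of your version). The plain-Java sketch below only checks the ranking argument; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java check: multiplying every score by the same positive constant
// (what a single query-wide boost does) never changes the ranking.
final class ConstantBoost {

    // Document ids sorted by descending score.
    static List<Integer> order(double[] scores) {
        List<Integer> docs = new ArrayList<>();
        for (int doc = 0; doc < scores.length; doc++) {
            docs.add(doc);
        }
        docs.sort(Comparator.comparingDouble((Integer doc) -> scores[doc]).reversed());
        return docs;
    }

    // Applies one boost factor to all scores.
    static double[] boostAll(double[] scores, double boost) {
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] * boost;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] scores = {0.7, 1.2, 1.0};
        System.out.println(order(scores));                // [1, 2, 0]
        System.out.println(order(boostAll(scores, 2.0))); // [1, 2, 0]
    }
}
```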

Best regards


On 10/18/19 4:07 PM, baris.kazar@oracle.com wrote:
> Uwe,-
>
> Can this
> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
> doc example that you also gave be extended with the NumericDocValuesField
> part that needs to be done for index-time boosting, too?
>
> I see now why you said this is a mixed type of boosting (i.e.,
> both index time and search time).
>
> I then need to include the query from this example, built on the
> _score field (I would call it the _boost field in my case), in my overall
> BooleanQuery.
>
> I will now try to combine these and post the result here for future reference.
>
> Best regards
>
>
> On 10/18/19 3:18 PM, Uwe Schindler wrote:
>> Hi,
>>
>> Read my original email! The index time values are written using 
>> NumericDocValuesField. The expressions docs also refer to that when 
>> the bindings are documented.
>>
>> It's separate from the indexed data (TextField). Think of it like an 
>> additional numeric field in your database table with a factor in each 
>> row.
>>
>> Uwe
>>
>> On October 18, 2019 7:14:03 PM UTC, baris.kazar@oracle.com wrote:
>>> Uwe,-
>>>
>>> Two questions there:
>>>
>>> I guess this is applicable to TextField, too.
>>>
>>> And I was expecting an IndexWriter object in the example for index-time
>>> boosting.
>>>
>>> Best regards
>>>
>>>
>>> On 10/18/19 2:57 PM, Uwe Schindler wrote:
>>>> Sorry, I was imprecise. It's a mix of both. The factors are stored per
>>>> document in the index (this is why I called it index time). At query
>>>> time, the expression uses the index-time values to fold them into the
>>>> query boost.
>>>> What's your problem with that approach?
>>>>
>>>> Uwe
>>>>
>>>> On October 18, 2019 6:50:40 PM UTC, baris.kazar@oracle.com wrote:
>>>>> Uwe,-
>>>>>
>>>>> Thanks. If possible, I am looking for a pure Java methodology to do
>>>>> index-time boosting.
>>>>>
>>>>> This example looks like a search time boosting example:
>>>>>
>>>>>
>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>
>>>>>
>>>>> Best regards
>>>>>
>>>>> On 10/18/19 2:31 PM, Uwe Schindler wrote:
>>>>>> Hi,
>>>>>>
>>>>>>> Is there a working example for this? Is this mentioned in the Lucene
>>>>>>> Javadocs or any other docs so that I can look at it?
>>>>>> To index the docvalues, see NumericDocValuesField (it can be added to
>>>>>> documents like indexed or stored fields). You may have used them for
>>>>>> sorting already.
>>>>>>> This methodology seems to sort of discourage using index-time
>>>>>>> boosting.
>>>>>> Not really. Many use this all the time. It's one of the killer
>>>>>> features of both Solr and Elasticsearch. The problem was how
>>>>>> Document.setBoost() worked (it did not work correctly, see below).
>>>>>>> The previous setBoost method call was fine and easy to use.
>>>>>>> Did it have some performance issues, and is that why it was
>>>>>>> deprecated?
>>>>>> No, it was deprecated for several reasons: setBoost was not doing
>>>>>> what users expected. Internally, the boost value was just multiplied
>>>>>> into the document's norm factor (which is internally also a docvalues
>>>>>> field). The norm factors are very imprecise floats stored in a byte,
>>>>>> so precision is poor. If you put some values into it and the length
>>>>>> norm was already consuming all the bits, the boosting was very coarse.
>>>>>> It was also only ever multiplied in, while most users want to do
>>>>>> things like record click counts in the index and then boost with, for
>>>>>> example, the logarithm or some other function. If the boost is just
>>>>>> multiplied into the length norm, you have no flexibility at all.
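The precision problem described above can be illustrated with a simplified plain-Java sketch that truncates a float to 3 mantissa bits, roughly the precision of the one-byte norm encoding used by older Lucene versions. This is not Lucene's actual SmallFloat code (which also limits the exponent range); the class name is hypothetical. Under this truncation, boosts of 1.0 and 1.1 become the same value and 1.2 becomes 1.125.

```java
// Simplified illustration (not Lucene's SmallFloat): truncate a float to
// 3 mantissa bits to show how coarse a one-byte encoding of a boost is.
final class CoarseNorm {

    // Clears the low 20 of the 23 mantissa bits, keeping the sign,
    // the 8 exponent bits, and the top 3 mantissa bits.
    static float keepThreeMantissaBits(float value) {
        int bits = Float.floatToIntBits(value);
        return Float.intBitsToFloat(bits & ~0xFFFFF);
    }

    public static void main(String[] args) {
        for (float boost : new float[] {1.0f, 1.1f, 1.2f, 1.3f, 2.0f}) {
            System.out.println(boost + " -> " + keepThreeMantissaBits(boost));
        }
    }
}
```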
>>>>>> In addition, you can have several docvalues fields and use their
>>>>>> values in a function (e.g., one field with the click count and another
>>>>>> with the product price). You can then combine click count and price
>>>>>> (which can be modified independently during index updates) and shape
>>>>>> the boost so that lower prices and higher click counts are boosted up.
>>>>>> This is what you can do with the expressions module. You just give it
>>>>>> a function.
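As a rough illustration of combining two docvalues fields, here is one hypothetical scoring function (not taken from this thread): multiply the query score by 1 + ln(1 + clicks) and divide by sqrt(price), so more clicks and lower prices both push a document up. With the expressions module it might be written as "_score * (1 + ln(1 + clicks)) / sqrt(price)", with clicks and price bound to numeric docvalues fields (both field names are assumptions). The sketch computes the same formula in plain Java and assumes price > 0.

```java
// Plain-Java version of one hypothetical boosting formula:
//   finalScore = score * (1 + ln(1 + clicks)) / sqrt(price)
// More clicks raise the score and a higher price lowers it. Assumes price > 0.
final class ClickPriceBoost {

    static double combine(double score, long clicks, double price) {
        // Math.log1p(x) computes ln(1 + x), so zero clicks leaves the factor at 1.
        return score * (1.0 + Math.log1p(clicks)) / Math.sqrt(price);
    }

    public static void main(String[] args) {
        // Same text score, different click counts and prices.
        System.out.println(combine(1.0, 0, 4.0)); // 0.5
        System.out.println(combine(1.0, 100, 4.0));
        System.out.println(combine(1.0, 100, 16.0));
    }
}
```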
>>>>>> Here is an example; the second example uses a FunctionScoreQuery
>>>>>> that modifies the score based on the function and the given docvalues:
>>>>>> https://lucene.apache.org/core/7_7_2/expressions/org/apache/lucene/expressions/Expression.html
>>>>>>> An example of FunctionScoreQuery usage with MultiFieldQueryParser
>>>>>>> would also be nice, since MultiFieldQueryParser already has a boosts
>>>>>>> parameter for this in its constructor.
>>>>>> The boosts in the query parser are applied to fields at query time
>>>>>> (to give each field a different weight). Index-time boosting is per
>>>>>> document, so you can combine both.
>>>>>>> Maybe it is not needed with MultiFieldQueryParser.
>>>>>> You use MultiFieldQueryParser to adjust the weights of the fields
>>>>>> (e.g., title versus body). The parsed query is then wrapped with an
>>>>>> expression that modifies the score per document according to the
>>>>>> docvalues.
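A plain-Java model of combining the two kinds of boosting: query-time per-field weights (as MultiFieldQueryParser applies them, e.g., title weighted above body) and a per-document index-time factor applied afterwards by the wrapping FunctionScoreQuery. It assumes the per-field clauses are optional clauses of a BooleanQuery, whose scores are summed for matching clauses; field names, weights, and scores are illustrative only and the class name is hypothetical.

```java
import java.util.Map;

// Plain-Java model: each field's score is multiplied by its query-time weight,
// the weighted scores are summed, and the document's index-time factor then
// multiplies the sum. Field names, weights, and scores are illustrative.
final class FieldAndDocBoost {

    static double score(Map<String, Double> fieldScores,
                        Map<String, Double> fieldWeights,
                        double docBoost) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : fieldScores.entrySet()) {
            // Fields without an explicit weight default to 1.0.
            sum += fieldWeights.getOrDefault(e.getKey(), 1.0) * e.getValue();
        }
        return sum * docBoost;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("title", 2.0, "body", 1.0);
        Map<String, Double> scores = Map.of("title", 0.5, "body", 1.5);
        System.out.println(score(scores, weights, 2.0)); // (2.0*0.5 + 1.0*1.5) * 2.0 = 5.0
    }
}
```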
>>>>>> Uwe
>>>>>>
>>>>>>> On 10/18/19 1:28 PM, Uwe Schindler wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> That's not true. You can do index-time boosting, but you need to do
>>>>>>>> that using a separate field. You just index a numeric docvalues field
>>>>>>>> (which may contain a long or float value per document). Later you wrap
>>>>>>>> your query with a FunctionScoreQuery (e.g., using the JavaScript
>>>>>>>> function query syntax in the expressions module). This allows you to
>>>>>>>> compile a JavaScript function that calculates the final score based on
>>>>>>>> the score returned by the inner query and combines it with the
>>>>>>>> docvalues that were indexed per document.
>>>>>>>> Uwe
>>>>>>>>
>>>>>>>> -----
>>>>>>>> Uwe Schindler
>>>>>>>> Achterdiek 19, D-28357 Bremen
>>>>>>>> https://www.thetaphi.de
>>>>>>>> eMail: uwe@thetaphi.de
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: baris.kazar@oracle.com <baris.kazar@oracle.com>
>>>>>>>>> Sent: Friday, October 18, 2019 5:28 PM
>>>>>>>>> To: java-user@lucene.apache.org
>>>>>>>>> Cc: baris.kazar@oracle.com
>>>>>>>>> Subject: Re: Index-time boosting: Deprecated setBoost method
>>>>>>>>>
>>>>>>>>> It looks like index-time (field) boosting is no longer possible as of
>>>>>>>>> Lucene version 7.7.2. In another case I was previously using
>>>>>>>>> BoostQuery at search time for boosting, and this seems to be the only
>>>>>>>>> boosting option now in Lucene.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10/18/19 10:01 AM, baris.kazar@oracle.com wrote:
>>>>>>>>>> Hi,-
>>>>>>>>>>
>>>>>>>>>> I saw this in the Field class docs and am trying to figure out the
>>>>>>>>>> following note:
>>>>>>>>>>
>>>>>>>>>> setBoost(float boost)
>>>>>>>>>> Deprecated.
>>>>>>>>>> Index-time boosts are deprecated, please index index-time scoring
>>>>>>>>>> factors into a doc value field and combine them with the score at
>>>>>>>>>> query time using eg. FunctionScoreQuery.
>>>>>>>>>>
>>>>>>>>>> I appreciate this note. Is there an example of this? I wish the
>>>>>>>>>> docs gave a simple example to help further.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/document/Field.html
>>>>>>>>>> vs
>>>>>>>>>> https://lucene.apache.org/core/7_7_2/core/org/apache/lucene/document/Field.html
>>>>>>>>>> Best regards
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org