lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: synonym payload boosting
Date Mon, 09 Nov 2009 12:10:01 GMT

On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:

> I have found this
> https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> patch, but I don't want to use any function, just the normal scoring and the
> similarity class I have written.
> Can you point me to the modifications I need (if any)?
>
>

Ahmet's point is that you need a query that actually invokes the
payload in scoring. PayloadTermQuery and PayloadNearQuery are the two
queries in Lucene that do this. You can certainly write your own, as well.
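To make the point concrete, here is a toy Java model (all class and method names are hypothetical, not Lucene's API) contrasting a scorer that never reads payloads with one that feeds each matched position's payload through scorePayload. The averaging step is only loosely modeled on Lucene's AveragePayloadFunction; real PayloadTermQuery scoring involves more factors than shown here.

```java
// Toy model, NOT Lucene's API: all names here are hypothetical.
public class PayloadScoringModel {

    // Stand-in for the Similarity hook; counts how often it is called.
    static class CountingSimilarity {
        int payloadCalls = 0;

        float scorePayload(float storedWeight) {
            payloadCalls++;
            return storedWeight; // return the decoded synonym weight as-is
        }
    }

    // Ordinary term scoring: it has access to the similarity but only asks
    // whether the term matched, so scorePayload() is never reached.
    static float plainTermScore(float[] payloadsAtMatches, CountingSimilarity sim) {
        return payloadsAtMatches.length == 0 ? 0f : 1.0f;
    }

    // Payload-aware scoring: averages scorePayload() over every matched
    // position and multiplies that average into the plain score.
    static float payloadTermScore(float[] payloadsAtMatches, CountingSimilarity sim) {
        if (payloadsAtMatches.length == 0) {
            return 0f;
        }
        float sum = 0f;
        for (float weight : payloadsAtMatches) {
            sum += sim.scorePayload(weight);
        }
        return plainTermScore(payloadsAtMatches, sim) * (sum / payloadsAtMatches.length);
    }

    public static void main(String[] args) {
        float[] payloads = {0.5f, 1.5f}; // the term matched at two positions
        CountingSimilarity sim = new CountingSimilarity();
        System.out.println(plainTermScore(payloads, sim) + " calls=" + sim.payloadCalls);
        System.out.println(payloadTermScore(payloads, sim) + " calls=" + sim.payloadCalls);
    }
}
```

The first call leaves the counter at zero, which mirrors the breakpoint that never fires; only the payload-aware path reaches the hook.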

-Grant

>
> On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN <iorixxx@yahoo.com> wrote:
>
>> Additionally, you need to modify your query parser to return
>> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery, etc.
>>
>> With these types of queries, the scorePayload method is invoked.
>>
>> Hope this helps.
>>
>> --- On Sun, 11/8/09, David Ginzburg <david@digitaltrowel.com> wrote:
>>
>>> From: David Ginzburg <david@digitaltrowel.com>
>>> Subject: synonym payload boosting
>>> To: solr-user@lucene.apache.org
>>> Date: Sunday, November 8, 2009, 4:06 PM
>>> Hi,
>>> I have a field and a weighted synonym map.
>>> I have indexed the synonyms with their weights as payloads.
>>> My code snippet from my filter:
>>>
>>> public Token next(final Token reusableToken) throws IOException {
>>>     ...
>>>     Payload boostPayload;
>>>
>>>     for (Synonym synonym : syns) {
>>>         Token newTok = new Token(nToken.startOffset(), nToken.endOffset(), "SYNONYM");
>>>         newTok.setTermBuffer(synonym.getToken().toCharArray(), 0,
>>>                 synonym.getToken().length());
>>>         // Set the position increment to zero: this tells Lucene the
>>>         // synonym is in the exact same position as the originating word.
>>>         newTok.setPositionIncrement(0);
>>>         boostPayload = new Payload(PayloadHelper.encodeFloat(synonym.getWieght()));
>>>         newTok.setPayload(boostPayload);
>>>
>>> I have put it in the index-time analyzer. This is my field
>>> definition:
>>>
>>> <fieldType name="PersonName" class="solr.TextField" positionIncrementGap="100">
>>>   <analyzer type="index">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <filter class="com.digitaltrowel.solr.DTSynonymFactory"
>>>             FreskoFunction="names_with_scoresPipe23Columns.txt"
>>>             ignoreCase="true" expand="false"/>
>>>
>>>     <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
>>>     <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
>>>   </analyzer>
>>>   <analyzer type="query">
>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>     <!--<filter class="com.digitaltrowel.solr.DTSynonymFactory"
>>>             synonyms="synonyms.txt" ignoreCase="true" expand="false"/>-->
>>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>     <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
>>>     <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
>>>   </analyzer>
>>> </fieldType>
>>>
>>>
>>> My similarity class is:
>>>
>>> import org.apache.lucene.analysis.payloads.PayloadHelper;
>>> import org.apache.lucene.search.DefaultSimilarity;
>>>
>>> public class BoostingSymilarity extends DefaultSimilarity {
>>>
>>>     public BoostingSymilarity() {
>>>         super();
>>>     }
>>>
>>>     @Override
>>>     public float scorePayload(String field, byte[] payload, int offset, int length) {
>>>         // Decode the weight stored at index time, reading from `offset`
>>>         // rather than 0: the bytes need not start at the beginning of the array.
>>>         return PayloadHelper.decodeFloat(payload, offset);
>>>     }
>>>
>>>     // The remaining scoring factors are flattened to a constant 1.0.
>>>     @Override
>>>     public float coord(int overlap, int maxOverlap) {
>>>         return 1.0f;
>>>     }
>>>
>>>     @Override
>>>     public float idf(int docFreq, int numDocs) {
>>>         return 1.0f;
>>>     }
>>>
>>>     @Override
>>>     public float lengthNorm(String fieldName, int numTerms) {
>>>         return 1.0f;
>>>     }
>>>
>>>     @Override
>>>     public float tf(float freq) {
>>>         return 1.0f;
>>>     }
>>> }
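Why the `offset` argument of scorePayload matters can be shown with a small stdlib-only sketch. FloatPayloadCodec is a hypothetical class; it stores a float as 4 big-endian bytes, which I believe matches the layout PayloadHelper.encodeFloat/decodeFloat use, but treat that as an assumption. When more than one payload sits in a shared buffer, decoding from index 0 instead of the supplied offset returns the wrong weight.

```java
// Hypothetical codec; the big-endian layout is assumed to mirror
// PayloadHelper.encodeFloat/decodeFloat, not taken from Lucene's source.
public class FloatPayloadCodec {

    // Encode a float into 4 big-endian bytes starting at the given offset.
    static void encodeFloat(float value, byte[] dest, int offset) {
        int bits = Float.floatToIntBits(value);
        dest[offset] = (byte) (bits >>> 24);
        dest[offset + 1] = (byte) (bits >>> 16);
        dest[offset + 2] = (byte) (bits >>> 8);
        dest[offset + 3] = (byte) bits;
    }

    // Decode 4 big-endian bytes starting at the given offset back into a float.
    static float decodeFloat(byte[] src, int offset) {
        int bits = ((src[offset] & 0xFF) << 24)
                | ((src[offset + 1] & 0xFF) << 16)
                | ((src[offset + 2] & 0xFF) << 8)
                | (src[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // One shared buffer holding two payloads back to back.
        byte[] buffer = new byte[8];
        encodeFloat(0.8f, buffer, 0);
        encodeFloat(2.5f, buffer, 4);

        System.out.println(decodeFloat(buffer, 4)); // prints 2.5 (honors the offset)
        System.out.println(decodeFloat(buffer, 0)); // prints 0.8 (the other payload)
    }
}
```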
>>>
>>> My problem is that the scorePayload method does not get called
>>> at search time, unlike the other methods in my similarity class.
>>> I tested and verified this with breakpoints.
>>> What am I doing wrong?
>>> I am using Solr 1.3 and am considering the payload boosting support in
>>> Solr 1.4.
>>>
>>>
>>>
>>
>>
>
>
>
> -- 
> Regards
>
> _____________________
> David Ginzburg
> Developer, Digital Trowel
> 1 Hayarden St., Airport City
> [POB 169, NATBAG]
> Lod, 70151, Israel
> http://www.digitaltrowel.com/
> Office: +972 73 240 522
> Mobile: +972 50 496 0595
>
> CHECK OUT OUR NEW TEXT MINING BLOG:
> http://mineyourbusiness.wordpress.com/

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

