lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: synonym payload boosting
Date Tue, 10 Nov 2009 02:11:49 GMT
David, when you get this working, would you consider writing a case
study on the wiki? Nothing complex, just something that describes how
you made several customizations to create a new feature.

On Mon, Nov 9, 2009 at 4:10 AM, Grant Ingersoll <gsingers@apache.org> wrote:
>
> On Nov 9, 2009, at 4:41 AM, David Ginzburg wrote:
>
>> I have found this patch:
>>
>> https://issues.apache.org/jira/browse/SOLR-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>
>> But I don't want to use any function, just the normal scoring and the
>> similarity class I have written.
>> Can you point me to the modifications I need (if any)?
>>
>>
>
> Ahmet's point is that you need some query that will actually invoke the
> payload in scoring. PayloadTermQuery and PayloadNearQuery are the two that
> do this in Lucene. You can certainly write your own, as well.
>
> -Grant
>
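To illustrate Grant's point, here is a simplified, stdlib-only model of what a payload-aware term query adds on top of a plain term match: for each position where the term occurs in a document, it asks the similarity to score that position's payload, folds those scores into a single value, and weights the ordinary term score by it. All names here (PayloadScorer, PayloadAggregation, PayloadAwareTermScorer) are hypothetical. The AVERAGE/MAX choice is only loosely modeled on Lucene 2.9's AveragePayloadFunction and MaxPayloadFunction, and the real classes work on raw payload bytes and span scores rather than on pre-decoded floats.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook standing in for Similarity.scorePayload: maps one
// (already decoded) payload value to a score.
interface PayloadScorer {
    float scorePayload(float payloadValue);
}

// Hypothetical aggregation strategies, loosely modeled on Lucene 2.9's
// AveragePayloadFunction and MaxPayloadFunction.
enum PayloadAggregation {
    AVERAGE, MAX;

    float combine(List<Float> payloadScores) {
        // No payloads seen: return a neutral 1.0 so the term score is unchanged.
        if (payloadScores.isEmpty()) {
            return 1.0f;
        }
        float result = (this == MAX) ? Float.NEGATIVE_INFINITY : 0.0f;
        for (float s : payloadScores) {
            result = (this == MAX) ? Math.max(result, s) : result + s;
        }
        return (this == MAX) ? result : result / payloadScores.size();
    }
}

// Hypothetical scorer for one term in one document.
final class PayloadAwareTermScorer {
    private PayloadAwareTermScorer() {}

    static float score(float termScore, List<Float> payloadsAtPositions,
                       PayloadScorer similarity, PayloadAggregation aggregation) {
        List<Float> payloadScores = new ArrayList<>();
        for (float payload : payloadsAtPositions) {
            // This per-position call is the step a plain term query never makes.
            payloadScores.add(similarity.scorePayload(payload));
        }
        // Weight the ordinary term score by the aggregated payload score.
        return termScore * aggregation.combine(payloadScores);
    }
}
```

A plain term query has no equivalent of that per-position loop, which is why a scorePayload override in a custom Similarity is never reached through it, however the rest of the Similarity is configured.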
>>
>> On Sun, Nov 8, 2009 at 16:33, AHMET ARSLAN <iorixxx@yahoo.com> wrote:
>>
>>> Additionally, you need to modify your query parser to return
>>> BoostingTermQuery, PayloadTermQuery, PayloadNearQuery, etc.
>>>
>>> With these types of queries, the scorePayload method is invoked.
>>>
>>> Hope this helps.
>>>
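Ahmet's suggestion amounts to changing the factory method the parser uses to build term queries. The sketch below models this with hypothetical, stdlib-only types (Query, PlainTermQuery, PayloadQuery, MiniQueryParser and PayloadQueryParser are illustrations, not Lucene or Solr classes). In Lucene itself the analogous hook is, as far as we know, QueryParser's protected newTermQuery(Term) method, which a subclass could override to return something like new PayloadTermQuery(term, new AveragePayloadFunction()); wiring such a parser into Solr would presumably go through a custom QParserPlugin, which is an assumption not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical query abstraction; NOT Lucene's Query class.
interface Query {
    boolean usesPayloads();
    String describe();
}

// Hypothetical plain term match: never consults payloads.
class PlainTermQuery implements Query {
    final String field;
    final String text;

    PlainTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    public boolean usesPayloads() { return false; }
    public String describe() { return field + ":" + text; }
}

// Hypothetical stand-in for a payload-aware query such as PayloadTermQuery.
class PayloadQuery extends PlainTermQuery {
    PayloadQuery(String field, String text) { super(field, text); }

    @Override public boolean usesPayloads() { return true; }
    @Override public String describe() { return "payload(" + super.describe() + ")"; }
}

// Hypothetical parser: whitespace-splits the input and builds each clause
// through an overridable factory method.
class MiniQueryParser {
    final String defaultField;

    MiniQueryParser(String defaultField) { this.defaultField = defaultField; }

    List<Query> parse(String input) {
        List<Query> clauses = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                clauses.add(newTermQuery(defaultField, token.toLowerCase(Locale.ROOT)));
            }
        }
        return clauses;
    }

    // Factory hook: the default builds a query that ignores payloads.
    protected Query newTermQuery(String field, String text) {
        return new PlainTermQuery(field, text);
    }
}

// Subclass that swaps in payload-aware queries for fields indexed with payloads.
class PayloadQueryParser extends MiniQueryParser {
    private final Set<String> payloadFields;

    PayloadQueryParser(String defaultField, Set<String> payloadFields) {
        super(defaultField);
        this.payloadFields = payloadFields;
    }

    @Override
    protected Query newTermQuery(String field, String text) {
        return payloadFields.contains(field)
                ? new PayloadQuery(field, text)
                : super.newTermQuery(field, text);
    }
}
```

Overriding a single factory hook, rather than post-processing the parsed query tree, leaves the rest of the parser's syntax handling untouched and limits the change to fields that actually carry payloads.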
>>> --- On Sun, 11/8/09, David Ginzburg <david@digitaltrowel.com> wrote:
>>>
>>>> From: David Ginzburg <david@digitaltrowel.com>
>>>> Subject: synonym payload boosting
>>>> To: solr-user@lucene.apache.org
>>>> Date: Sunday, November 8, 2009, 4:06 PM
>>>> Hi,
>>>> I have a field and a weighted synonym map.
>>>> I have indexed the synonyms with the weight as a payload.
>>>> My code snippet from my filter:
>>>>
>>>> public Token next(final Token reusableToken) throws IOException {
>>>>     ...
>>>>     Payload boostPayload;
>>>>
>>>>     for (Synonym synonym : syns) {
>>>>         Token newTok = new Token(nToken.startOffset(), nToken.endOffset(), "SYNONYM");
>>>>         newTok.setTermBuffer(synonym.getToken().toCharArray(), 0,
>>>>                 synonym.getToken().length());
>>>>         // set the position increment to zero;
>>>>         // this tells Lucene the synonym is
>>>>         // in the exact same location as the originating word
>>>>         newTok.setPositionIncrement(0);
>>>>         boostPayload = new Payload(PayloadHelper.encodeFloat(synonym.getWieght()));
>>>>         newTok.setPayload(boostPayload);
>>>> I have put it in the index-time analyzer. This is my field
>>>> definition:
>>>>
>>>> <fieldType name="PersonName" class="solr.TextField" positionIncrementGap="100">
>>>>     <analyzer type="index">
>>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>>         <filter class="com.digitaltrowel.solr.DTSynonymFactory"
>>>>                 FreskoFunction="names_with_scoresPipe23Columns.txt"
>>>>                 ignoreCase="true" expand="false"/>
>>>>
>>>>         <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
>>>>         <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
>>>>     </analyzer>
>>>>     <analyzer type="query">
>>>>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>         <filter class="solr.LowerCaseFilterFactory"/>
>>>>         <!--<filter class="com.digitaltrowel.solr.DTSynonymFactory"
>>>>                 synonyms="synonyms.txt" ignoreCase="true" expand="false"/>-->
>>>>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>>>>         <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
>>>>         <!--<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>-->
>>>>     </analyzer>
>>>> </fieldType>
>>>>
>>>>
>>>> My similarity class is:
>>>>
>>>> public class BoostingSymilarity extends DefaultSimilarity {
>>>>
>>>>     public BoostingSymilarity() {
>>>>         super();
>>>>     }
>>>>
>>>>     @Override
>>>>     public float scorePayload(String field, byte[] payload, int offset,
>>>>             int length) {
>>>>         // use the synonym weight stored in the payload as the score;
>>>>         // read from the supplied offset rather than assuming 0
>>>>         return PayloadHelper.decodeFloat(payload, offset);
>>>>     }
>>>>
>>>>     // the remaining overrides flatten the usual factors to 1.0
>>>>     @Override
>>>>     public float coord(int overlap, int maxoverlap) {
>>>>         return 1.0f;
>>>>     }
>>>>
>>>>     @Override
>>>>     public float idf(int docFreq, int numDocs) {
>>>>         return 1.0f;
>>>>     }
>>>>
>>>>     @Override
>>>>     public float lengthNorm(String fieldName, int numTerms) {
>>>>         return 1.0f;
>>>>     }
>>>>
>>>>     @Override
>>>>     public float tf(float freq) {
>>>>         return 1.0f;
>>>>     }
>>>> }
>>>>
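The filter stores each synonym weight with PayloadHelper.encodeFloat, and scorePayload reads it back with PayloadHelper.decodeFloat. The helper below, FloatPayloadCodec, is a hypothetical stdlib-only stand-in that, to our understanding, uses the same byte layout as Lucene's PayloadHelper (the float's IEEE 754 bits written as a 4-byte big-endian int). It is meant only to show the encode/decode round trip, not to replace the Lucene class.

```java
// Hypothetical helper mirroring (as we understand it) the layout used by
// Lucene's PayloadHelper: IEEE 754 float bits as a 4-byte big-endian int.
final class FloatPayloadCodec {
    private FloatPayloadCodec() {}

    // Encode a weight into a fresh 4-byte payload.
    static byte[] encodeFloat(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) (bits >>> 24),
            (byte) (bits >>> 16),
            (byte) (bits >>> 8),
            (byte) bits
        };
    }

    // Decode 4 bytes starting at offset; the payload may sit inside a
    // larger buffer, so the caller's offset must be honored.
    static float decodeFloat(byte[] data, int offset) {
        int bits = ((data[offset] & 0xFF) << 24)
                | ((data[offset + 1] & 0xFF) << 16)
                | ((data[offset + 2] & 0xFF) << 8)
                | (data[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }
}
```

Because decodeFloat honors its offset argument, the same weight is recovered whether the four bytes start at index 0 of their own array or somewhere inside a larger shared buffer; a hard-coded offset of 0 would only be correct in the first case.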
>>>> My problem is that the scorePayload method does not get called at
>>>> search time, unlike the other methods in my similarity class.
>>>> I tested and verified this with breakpoints.
>>>> What am I doing wrong?
>>>> I used Solr 1.3 and am considering the payload boost support in
>>>> Solr 1.4.
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Regards
>>
>> _____________________
>> David Ginzburg
>> Developer, Digital Trowel
>> 1 Hayarden St., Airport City
>> [POB 169, NATBAG]
>> Lod, 70151, Israel
>> http://www.digitaltrowel.com/
>> Office: +972 73 240 522
>> Mobile: +972 50 496 0595
>>
>> CHECK OUT OUR NEW TEXT MINING BLOG:
>> http://mineyourbusiness.wordpress.com/
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>



-- 
Lance Norskog
goksron@gmail.com
