lucene-solr-user mailing list archives

From xavier jmlucjav <jmluc...@gmail.com>
Subject Re: custom similary on a field not working
Date Thu, 21 Mar 2013 13:44:17 GMT
Hi Felipe,

I need to keep positions, which is why I cannot just use
omitTermFreqAndPositions.


On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti <flahti@thoughtworks.com> wrote:

> Do you really need a custom similarity?
> Did you try to put the attribute "omitTermFreqAndPositions" in your field?
>
> It could be:
>
> <field name="description" omitTermFreqAndPositions="true"    type="text"
> indexed="true" stored="true"  multiValued="false" omitNorms="true" />
>
> http://wiki.apache.org/solr/SchemaXml
>
>
> On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav <jmlucjav@gmail.com>
> wrote:
>
> > I have the following setup:
> >
> >         <fieldType name="text" class="solr.TextField"
> > positionIncrementGap="100">
> >             <analyzer>
> >                 <tokenizer class="solr.StandardTokenizerFactory"/>
> >                 <filter class="solr.LowerCaseFilterFactory"/>
> >             </analyzer>
> >         </fieldType>
> >         <field name="description"    type="text"   indexed="true"
> > stored="true"   multiValued="false" omitNorms="true" />
> >
> > I index my corpus, and I can see tf is computed as usual; in this doc the
> > term occurs 14 times in this field:
> > 4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440)
> > [DefaultSimilarity], result of:
> >       4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> >         0.14165252 = queryWeight, product of:
> >           10.0 = boost
> >           8.5082035 = idf(docFreq=30, maxDocs=56511)
> >           0.0016648936 = queryNorm
> >         31.834784 = fieldWeight in 440, product of:
> >           3.7416575 = tf(freq=14.0), with freq of:
> >             14.0 = termFreq=14.0
> >           8.5082035 = idf(docFreq=30, maxDocs=56511)
> >           1.0 = fieldNorm(doc=440)
> >
> >
> > Then I modify my schema:
> >
> >     <similarity class="solr.SchemaSimilarityFactory"/>
> >         <fieldType name="text" class="solr.TextField"
> > positionIncrementGap="100">
> >             <analyzer>
> >                 <tokenizer class="solr.StandardTokenizerFactory"/>
> >                 <filter class="solr.LowerCaseFilterFactory"/>
> >             </analyzer>
> >             <similarity class="com.customsolr.NoTfSimilarityFactory"/>
> >         </fieldType>
> >
> > I just want to disable term freq > 1, so a term is either present or not.
> >
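> > // Assumes Lucene 4.x, where DefaultSimilarity lives in this package.
> > import org.apache.lucene.search.similarities.DefaultSimilarity;
> >
> > public class NoTfSimilarity extends DefaultSimilarity {
> >         // Treat a term as present (1) or absent (0), ignoring its count.
> >         @Override
> >         public float tf(float freq) {
> >                 return freq > 0 ? 1.0f : 0.0f;
> >         }
> > }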
> >
> > But I still see tf=14 in the explain output for my query?
> > 723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
> >         723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> >           85.08203 = queryWeight, product of:
> >             10.0 = boost
> >             8.5082035 = idf(docFreq=30, maxDocs=56511)
> >             1.0 = queryNorm
> >           8.5082035 = fieldWeight in 440, product of:
> >             1.0 = tf(freq=14.0), with freq of:
> >               14.0 = termFreq=14.0
> >             8.5082035 = idf(docFreq=30, maxDocs=56511)
> >             1.0 = fieldNorm(doc=440)
> >
> > Does anyone see what I am missing?
> > I am on Solr 4.0.
> >
> > Thanks,
> > xavier
> >
>
>
>
> --
> Felipe Lahti
> Consultant Developer - ThoughtWorks Porto Alegre
>
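
The two explain outputs quoted above can be checked by hand. The sketch below is an illustration rather than Lucene code: it assumes the DefaultSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and the class and method names (ExplainArithmetic, defaultTf, binaryTf, score) are made up for this example. Note that in the second explain output the tf line already reads 1.0, while termFreq=14.0 is only the raw count passed into tf().

```java
// Standalone sketch: reproduces the arithmetic of the explain output.
// It does not depend on Lucene or Solr; all names here are hypothetical.
public class ExplainArithmetic {

    // Assumed DefaultSimilarity behaviour: square root of the raw frequency.
    public static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // The override from the thread: a term is either present or absent.
    public static float binaryTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Assumed DefaultSimilarity behaviour: 1 + ln(maxDocs / (docFreq + 1)).
    public static float idf(long docFreq, long maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // score = queryWeight * fieldWeight, following the explain tree.
    public static float score(float tf, float idf, float boost,
                              float queryNorm, float fieldNorm) {
        float queryWeight = boost * idf * queryNorm;
        float fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        float idf = idf(30, 56511);
        // First explain: default tf, queryNorm taken from the output.
        float withDefaultTf = score(defaultTf(14f), idf, 10f, 0.0016648936f, 1f);
        // Second explain: binary tf, queryNorm reported as 1.0.
        float withBinaryTf = score(binaryTf(14f), idf, 10f, 1f, 1f);
        System.out.println("idf              = " + idf);
        System.out.println("score, sqrt tf   = " + withDefaultTf);
        System.out.println("score, binary tf = " + withBinaryTf);
    }
}
```

Run with the values from the thread, the two scores should come out close to 4.509 and 723.9, matching the quoted explain output up to float rounding. The jump in the second score comes from queryNorm being reported as 1.0 instead of 0.0016648936, not from the tf factor.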
