lucene-solr-user mailing list archives

From xavier jmlucjav <jmluc...@gmail.com>
Subject Re: custom similarity on a field not working
Date Thu, 21 Mar 2013 14:18:26 GMT
Steve,

Yes, as I mentioned earlier (though it may not have been very visible), I have
this just before the <types> element:
<similarity class="solr.SchemaSimilarityFactory"/>

I can see the explain info is indeed different; for example, I now get []
instead of [DefaultSimilarity].

thanks



On Thu, Mar 21, 2013 at 3:08 PM, Steve Rowe <sarowe@gmail.com> wrote:

> Hi xavier,
>
> Have you set the global similarity to solr.SchemaSimilarityFactory?
>
> See <http://wiki.apache.org/solr/SchemaXml#Similarity>.
>
> Steve
>
> On Mar 21, 2013, at 9:44 AM, xavier jmlucjav <jmlucjav@gmail.com> wrote:
>
> > Hi Felipe,
> >
> > I need to keep positions, which is why I cannot just use
> > omitTermFreqAndPositions.
> >
> >
> > On Thu, Mar 21, 2013 at 2:36 PM, Felipe Lahti <flahti@thoughtworks.com> wrote:
> >
> >> Do you really need a custom similarity?
> >> Did you try to put the attribute "omitTermFreqAndPositions" in your field?
> >>
> >> It could be:
> >>
> >> <field name="description" omitTermFreqAndPositions="true" type="text"
> >> indexed="true" stored="true" multiValued="false" omitNorms="true" />
> >>
> >> http://wiki.apache.org/solr/SchemaXml
> >>
> >>
> >> On Thu, Mar 21, 2013 at 7:35 AM, xavier jmlucjav <jmlucjav@gmail.com>
> >> wrote:
> >>
> >>> I have the following setup:
> >>>
> >>>        <fieldType name="text" class="solr.TextField"
> >>> positionIncrementGap="100">
> >>>            <analyzer>
> >>>                <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>                <filter class="solr.LowerCaseFilterFactory"/>
> >>>            </analyzer>
> >>>        </fieldType>
> >>>        <field name="description"    type="text"   indexed="true"
> >>> stored="true"   multiValued="false" omitNorms="true" />
> >>>
> >>> I index my corpus, and I can see tf is as usual; in this doc the term
> >>> occurs 14 times in this field:
> >>> 4.5094776 = (MATCH) weight(description:galaxy^10.0 in 440) [DefaultSimilarity], result of:
> >>>      4.5094776 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> >>>        0.14165252 = queryWeight, product of:
> >>>          10.0 = boost
> >>>          8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>          0.0016648936 = queryNorm
> >>>        31.834784 = fieldWeight in 440, product of:
> >>>          3.7416575 = tf(freq=14.0), with freq of:
> >>>            14.0 = termFreq=14.0
> >>>          8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>          1.0 = fieldNorm(doc=440)
> >>>
> >>>
> >>> Then I modify my schema:
> >>>
> >>>    <similarity class="solr.SchemaSimilarityFactory"/>
> >>>        <fieldType name="text" class="solr.TextField"
> >>> positionIncrementGap="100">
> >>>            <analyzer>
> >>>                <tokenizer class="solr.StandardTokenizerFactory"/>
> >>>                <filter class="solr.LowerCaseFilterFactory"/>
> >>>            </analyzer>
> >>>            <similarity class="com.customsolr.NoTfSimilarityFactory"/>
> >>>        </fieldType>
> >>>
> >>> I just want to disable term freq > 1, so a term is either present or not.
> >>>
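> >>> import org.apache.lucene.search.similarities.DefaultSimilarity;
> >>>
> >>> // Binary tf: a term either matches (1.0) or does not (0.0).
> >>> public class NoTfSimilarity extends DefaultSimilarity {
> >>>        @Override
> >>>        public float tf(float freq) {
> >>>                return freq > 0 ? 1.0f : 0.0f;
> >>>        }
> >>> }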
> >>>
> >>> But I still see tf=14 in my query??
> >>> 723.89526 = (MATCH) weight(description:galaxy^10.0 in 440) [], result of:
> >>>        723.89526 = score(doc=440,freq=14.0 = termFreq=14.0), product of:
> >>>          85.08203 = queryWeight, product of:
> >>>            10.0 = boost
> >>>            8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>            1.0 = queryNorm
> >>>          8.5082035 = fieldWeight in 440, product of:
> >>>            1.0 = tf(freq=14.0), with freq of:
> >>>              14.0 = termFreq=14.0
> >>>            8.5082035 = idf(docFreq=30, maxDocs=56511)
> >>>            1.0 = fieldNorm(doc=440)
> >>>
> >>> Does anyone see what I am missing?
> >>> I am on Solr 4.0.
> >>>
> >>> thanks
> >>> xavier
> >>>
> >>
> >>
> >>
> >> --
> >> Felipe Lahti
> >> Consultant Developer - ThoughtWorks Porto Alegre
> >>
>
>
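
A note on reading the two explain outputs quoted above: in both, the "termFreq=14.0" line is the raw occurrence count, while the "tf(freq=14.0)" line is the value returned by the similarity's tf(). The first output reports tf = 3.7416575 (the square root of 14) and the second reports tf = 1.0, which suggests the custom tf() is being applied even though the raw count is still printed as 14. The sketch below redoes that arithmetic in plain Java with no Lucene dependency. The class and method names are hypothetical, and the formulas are only the products shown in the explain output, not Lucene's actual implementation.

```java
// Hypothetical helper that mirrors the arithmetic shown in the explain output.
final class TfExplainCheck {

    // DefaultSimilarity-style tf as reported in the first explain: sqrt(freq).
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // tf as defined by the NoTfSimilarity class in the thread: present or absent.
    static float binaryTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // fieldWeight as listed in the explain output: tf * idf * fieldNorm.
    static float fieldWeight(float tf, float idf, float fieldNorm) {
        return tf * idf * fieldNorm;
    }

    public static void main(String[] args) {
        float idf = 8.5082035f;   // idf(docFreq=30, maxDocs=56511) from the thread
        float freq = 14.0f;       // termFreq from the thread
        float fieldNorm = 1.0f;   // fieldNorm(doc=440) from the thread

        // Compare with "31.834784 = fieldWeight" in the first explain.
        System.out.println(fieldWeight(defaultTf(freq), idf, fieldNorm));
        // Compare with "8.5082035 = fieldWeight" in the second explain.
        System.out.println(fieldWeight(binaryTf(freq), idf, fieldNorm));
    }
}
```

The much larger total score in the second output (723.89526 versus 4.5094776) comes from the queryWeight side, where queryNorm is reported as 1.0 instead of 0.0016648936; the fieldWeight itself dropped from 31.834784 to 8.5082035.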
