lucene-solr-user mailing list archives

From Ahmet Arslan <iori...@yahoo.com.INVALID>
Subject Re: Length norm not functioning in solr queries.
Date Tue, 09 Dec 2014 11:41:17 GMT
Hi,

The default length norm is not the best option for differentiating very short documents, such as product
names.
Please see: http://find.searchhub.org/document/b3f776512ab640ec#b3f776512ab640ec
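Why short fields are hard to tell apart (this is also the precision issue Mikhail mentions further down) can be checked with a little arithmetic. The sketch below re-implements the one-byte small-float scheme of Lucene 4.x's `SmallFloat.floatToByte315` / `byte315ToFloat`, which is what `DefaultSimilarity.encodeNormValue` relies on, so it runs without Lucene on the classpath. Treat it as an approximation of Lucene's behaviour, not Lucene's code. It assumes `lengthNorm = 1 / sqrt(numTerms)` with no index-time boost.

```java
// Sketch of how a Lucene 4.x-style length norm loses precision when stored in one byte.
// encode/decode re-implement the small-float scheme of SmallFloat.floatToByte315 /
// byte315ToFloat (used by DefaultSimilarity) so this runs without Lucene; treat as an approximation.
public class NormPrecision {

    // Assumes lengthNorm = 1 / sqrt(numTerms): no index-time boost, overlaps already discounted.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Float -> byte: drop the low mantissa bits (truncation, not rounding) and rebase the exponent.
    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallFloat = bits >> (24 - 3);
        int zero = (63 - 15) << 3;
        if (smallFloat <= zero) {
            // zero or negative -> 0, tiny positive -> smallest positive value
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (smallFloat >= zero + 0x100) {
            return (byte) -1; // too large: clamp to the largest representable value
        }
        return (byte) (smallFloat - zero);
    }

    // Byte -> float: inverse of encode, restoring the exponent bias.
    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 16; n++) {
            float raw = lengthNorm(n);
            float stored = decode(encode(raw));
            System.out.printf("%2d terms: raw norm=%.4f  stored norm=%.4f (byte %d)%n",
                    n, raw, stored, encode(raw) & 0xff);
        }
    }
}
```

With this encoding, fields of 8, 9 and 10 terms all come back as 0.3125, and 11 to 16 terms all come back as 0.25. Two product names whose analyzed lengths differ by a token or two can therefore easily get the same stored norm and, all else being equal, identical scores.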

I suggest you create an additional integer field that holds the number of tokens. You can
populate it via an update processor, and then penalise documents (using function queries) according to that
field. This way you have more fine-grained and flexible control over it.
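A minimal sketch of the token-count idea, assuming a hypothetical integer field `productName_tokens` and the arithmetic of Solr's `recip(x,m,a,b)` function, which computes `a/(m*x+b)`. On the Solr side this might look something like `boost=recip(productName_tokens,1,10,10)` with edismax; the field name and the constants are illustrative only. The whitespace split below is a crude stand-in: in Solr the count would be filled in at index time by an update processor, as suggested above, and the real analysis chain would produce different numbers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: store a token count per document and turn it into a
// score multiplier using the same formula as Solr's recip(x, m, a, b) = a / (m * x + b).
public class TokenCountBoost {

    // Crude stand-in for the analyzer: whitespace split only. The real count would come
    // from the field's analysis chain (stop words, WordDelimiterFilter, ...), so it will differ.
    static int tokenCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Same arithmetic as Solr's recip(x, m, a, b) function: a / (m * x + b).
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "Details about Apple iPhone 4s 16GB Smartphone AT&T Factory Unlocked");
        docs.put("doc2", "Apple iPhone 4S 16GB for Net10, No Contract, White");

        double baseScore = 10.649221; // the identical relevance score from S.L.'s explain output
        for (Map.Entry<String, String> e : docs.entrySet()) {
            int count = tokenCount(e.getValue());
            double boost = recip(count, 1, 10, 10); // hypothetical constants: m=1, a=10, b=10
            System.out.printf("%s: tokens=%d boost=%.4f final=%.4f%n",
                    e.getKey(), count, boost, baseScore * boost);
        }
    }
}
```

With whitespace counting, doc1 gets 10 tokens (boost 0.5) and doc2 gets 9 tokens (boost about 0.526), so the shorter name now ranks first even though the underlying relevance scores were identical. Unlike the byte-encoded norm, an integer field keeps every count distinct. A multiplicative boost (edismax `boost`) scales with the relevance score, while an additive `bf` term shifts it by a fixed amount; which fits better is a tuning decision.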

Ahmet



On Tuesday, December 9, 2014 12:22 PM, S.L <simpleliving016@gmail.com> wrote:
Hi,

Thanks, Mikhail. I looked at the explain output, and this is what I see for the two
documents in question. They have identical scores even though
document 2 has a shorter productName field, and I do not see any lengthNorm-related
information in the explain.

Also, I am not exactly clear on what I need to look at in the API.

*Search Query*: q=iphone+4s+16gb&qf=productName&mm=1&pf=productName&ps=1&pf2=productName&pf3=productName&stopwords=true&lowercaseOperators=true

*productName Details about Apple iPhone 4s 16GB Smartphone AT&T Factory Unlocked*


   - *100%* 10.649221 sum of the following:
      - *10.58%* 1.1270299 sum of the following:
         - *2.1%* 0.22383358 productName:iphon
         - *3.47%* 0.36922288 productName:"4 s"
         - *5.01%* 0.53397346 productName:"16 gb"
      - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
      - *27.79%* 2.959255 sum of the following:
         - *10.97%* 1.1680154 productName:"iphon 4 s"~1
         - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
      - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1


*productName Apple iPhone 4S 16GB for Net10, No Contract, White*


   - *100%* 10.649221 sum of the following:
      - *10.58%* 1.1270299 sum of the following:
         - *2.1%* 0.22383358 productName:iphon
         - *3.47%* 0.36922288 productName:"4 s"
         - *5.01%* 0.53397346 productName:"16 gb"
      - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1
      - *27.79%* 2.959255 sum of the following:
         - *10.97%* 1.1680154 productName:"iphon 4 s"~1
         - *16.82%* 1.7912396 productName:"4 s 16 gb"~1
      - *30.81%* 3.2814684 productName:"iphon 4 s 16 gb"~1





On Mon, Dec 8, 2014 at 10:25 AM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:

> It's worth looking into the <explain> output to check the particular scoring values. But
> the most likely suspect is the loss of precision when float norms are stored as
> byte values. See the javadoc for DefaultSimilarity.encodeNormValue(float).
>
>
> On Mon, Dec 8, 2014 at 5:49 PM, S.L <simpleliving016@gmail.com> wrote:
>
> > I have two documents, doc1 and doc2, and each one of them has a field
> > called phoneName.
> >
> > doc1:phoneName:"Details about  Apple iPhone 4s - 16GB - White (Verizon)
> > Smartphone Factory Unlocked"
> >
> > doc2:phoneName:"Apple iPhone 4S 16GB for Net10, No Contract, White"
> >
> > Here, if I search for
> >
> > q=iphone+4s+16gb&qf=phoneName&mm=1&pf=phoneName&ps=1&pf2=phoneName&pf3=phoneName&stopwords=true&lowercaseOperators=true
> >
> > Doc1 and Doc2 both get the same score. Since the phoneName field in doc2
> > is shorter, I would expect doc2 to have a higher score, but both have an
> > identical score of 9.961212.
> >
> > The phoneName field is defined as follows. As you can see, nowhere am I
> > specifying omitNorms="true", yet the length norm does not seem to be
> > working at all. Can someone let me know what the issue is here?
> >
> >         <field name="phoneName" type="text_en_splitting" indexed="true"
> >             stored="true" required="true" />
> >         <fieldType name="text_en_splitting" class="solr.TextField"
> >             positionIncrementGap="100" autoGeneratePhraseQueries="true">
> >             <analyzer type="index">
> >                 <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >                 <!-- in this example, we will only use synonyms at query time
> >                 <filter class="solr.SynonymFilterFactory"
> >                     synonyms="index_synonyms.txt" ignoreCase="true"
> >                     expand="false"/> -->
> >                 <!-- Case insensitive stop word removal. add
> >                     enablePositionIncrements=true in both the index and query
> >                     analyzers to leave a 'gap' for more accurate phrase queries. -->
> >                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> >                     words="lang/stopwords_en.txt" enablePositionIncrements="true" />
> >                 <filter class="solr.WordDelimiterFilterFactory"
> >                     generateWordParts="1" generateNumberParts="1" catenateWords="1"
> >                     catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
> >                 <filter class="solr.LowerCaseFilterFactory" />
> >                 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
> >                 <filter class="solr.PorterStemFilterFactory" />
> >             </analyzer>
> >             <analyzer type="query">
> >                 <tokenizer class="solr.WhitespaceTokenizerFactory" />
> >                 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >                     ignoreCase="true" expand="true" />
> >                 <filter class="solr.StopFilterFactory" ignoreCase="true"
> >                     words="lang/stopwords_en.txt" enablePositionIncrements="true" />
> >                 <filter class="solr.WordDelimiterFilterFactory"
> >                     generateWordParts="1" generateNumberParts="1" catenateWords="0"
> >                     catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
> >                 <filter class="solr.LowerCaseFilterFactory" />
> >                 <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" />
> >                 <filter class="solr.PorterStemFilterFactory" />
> >             </analyzer>
> >         </fieldType>
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhludnev@griddynamics.com>
>
