lucene-solr-user mailing list archives

From Francisco Sanmartin <francis...@olx.com>
Subject Re: solr is highlighting wrong words
Date Thu, 04 Sep 2008 20:58:35 GMT
After researching further, I found that this was already a known issue. Sorry for the inconvenience.

http://issues.apache.org/jira/browse/SOLR-42

Pako


Francisco Sanmartin wrote:
> Highlighting in Solr behaves strangely for some items. I attach an 
> example in case anyone can shed some light on it. Basically, Solr 
> is highlighting the wrong words. I'm searching for the word "car" and 
> I tell Solr to highlight it with <strong> and </strong>. The 
> response is fine in most cases, but some items come back with the 
> wrong words highlighted. The example is at the 
> bottom.
>
>
> The problem in this example is that Solr highlights the word "his", 
> but the search word is "car".
> This is the scenario:
>
> Solr 1.2
> The url:
> http://solr-server:8983/solr/select/?q=id:11439968%20AND%20description%3Acar&hl=on&hl.fl=description&hl.simple.pre=%3Cstrong%3E&hl.simple.post=%20%3C%2Fstrong%3E

>
>
> The same query, in readable form:
> <lst name="params">
> <str name="hl.simple.pre"><strong></str>
> <str name="hl.simple.post"> </strong></str>
> <str name="hl.fl">description</str>
> <str name="hl">on</str>
> <str name="q">id:11439968 AND description:car</str>
> </lst>
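For anyone who wants to reproduce the request, here is a minimal Python sketch that builds the same URL from the parameters listed above. The host name `solr-server` is the one from the original URL; the variable names `params` and `url` are only illustrative. Note that `urlencode` encodes spaces as `+` rather than `%20`, which decodes to the same query.

```python
from urllib.parse import urlencode, parse_qs

# Parameters copied from the <lst name="params"> block above.
params = {
    "q": "id:11439968 AND description:car",
    "hl": "on",
    "hl.fl": "description",
    "hl.simple.pre": "<strong>",
    "hl.simple.post": " </strong>",  # note the leading space, as in the original URL
}

# urlencode percent-escapes the reserved characters in each value.
url = "http://solr-server:8983/solr/select/?" + urlencode(params)
print(url)
```

Decoding the query string with `parse_qs` gives back the same values, which is a quick way to check that the markup in `hl.simple.pre` and `hl.simple.post` survived the encoding.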
>
> (I query by id to retrieve only the item whose highlighting fails, 
> so the example is clearer.)
>
> The response:
> <result name="response" numFound="1" start="0">
>  <doc>
>    ...
>    <int name="id">11439968</int>
>     ...
>     <str name="description">
>      This is a one of a kind all custom &#39;95 Integra LS with 2005 
> TSX headlight and tailight conversion. It has GSR all black interior, 
> 18 inch rims,     strut bars, cd changer, coil overs, HID headlights, 
> catback exhaust, intake, new clutch and brakes. Motor has 130,000 
> miles. No smoke or leaks.     Runs great. This car is completly 
> shaved. Paint is a two toned black/white with white ice flake. It is 
> flawless and ready to show. This car has not     even seen winter 
> after being built! It is stored in a garage all year. Serious inquires 
> only (203)994-0085. OR Email GUNITGN@yahoo.com. $8,500     OR BEST 
> OFFER!!!!!
>    </str>
>    ...
>  </doc>
> <lst name="highlighting">
>    <lst name="11439968">
>        <arr name="description">
>            <str>
>                back exhaust, intake, new clutch and brakes. Motor has 
> 130,000 miles. No smoke or leaks. Runs great. T<strong>his </strong>
>            </str>
>        </arr>
>    </lst>
> </lst>
> </response>
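The shift in the response above is consistent with the match offsets being computed on entity-decoded text and then applied to the raw stored text. `&#39;` is five characters but decodes to one, a difference of four, which is exactly the distance from "car" back to "his". The Python sketch below reproduces that failure mode on a shortened sample. It is an assumption about the cause, not Solr's actual code, and the function name and sample string are made up for illustration.

```python
import html

def highlight_with_stripped_offsets(raw, term, pre="<strong>", post=" </strong>"):
    """Locate `term` in the entity-decoded text, then (wrongly) apply
    those offsets to the raw text. Hypothetical helper, for illustration."""
    stripped = html.unescape(raw)      # "&#39;" becomes "'"
    start = stripped.find(term)
    if start < 0:
        return raw                     # term not found, nothing to highlight
    end = start + len(term)
    # Bug being illustrated: offsets from `stripped` are used to slice `raw`.
    return raw[:start] + pre + raw[start:end] + post + raw[end:]

# Shortened version of the stored description (assumed sample).
raw = "custom &#39;95 Integra. Runs great. This car is shaved."
print(highlight_with_stripped_offsets(raw, "car"))
```

With this sample the markup lands on "his" inside "This", four characters before "car", which matches the fragment returned by Solr.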
>
> The schema (relevant parts):
>
> <field name="description"            type="text_html"   indexed="true" 
> stored="true"/>
>
> ...
>
>     <fieldtype name="text_html" class="solr.TextField" 
> positionIncrementGap="100">
>      <analyzer type="index">
>          <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"/>
>          <filter class="solr.WordDelimiterFilterFactory" 
> generateWordParts="1" generateNumberParts="1" catenateWords="1" 
> catenateNumbers="1" catenateAll="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.EnglishPorterFilterFactory" 
> protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>          <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory" 
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"/>
>          <filter class="solr.WordDelimiterFilterFactory" 
> generateWordParts="1" generateNumberParts="1" catenateWords="0" 
> catenateNumbers="0" catenateAll="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.EnglishPorterFilterFactory" 
> protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>      </analyzer>
>    </fieldtype>
>
>
> Thanks in advance.
>
> Pako
>
>
>
>

