lucene-solr-user mailing list archives

From Jake Brownell <ja...@benetech.org>
Subject Solr 3.1 returning entire highlighted field
Date Thu, 05 May 2011 23:59:10 GMT
Hi,

After upgrading from Solr 1.4.0 to 3.1, our highlighting has gone from returning short
snippets of text to returning what appears to be the entire contents of the highlighted field.


The request, built with SolrJ, sets the following:

params.setHighlight(true);
params.setHighlightSnippets(3);
params.set("hl.fl", "content_highlight");
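
For reference, the calls above should amount to the raw request parameters below. This is only a minimal sketch using the plain JDK rather than SolrJ; the class name HighlightParams and its helper methods are made up for illustration, and the mapping of setHighlight to hl and setHighlightSnippets to hl.snippets is my understanding of SolrQuery, not something checked against the 3.1 source. Note that nothing in the request sets hl.fragsize or a fragmenter for content_highlight, so those come from whatever defaults the config supplies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of SolrJ: spells out the highlighting
// parameters that the SolrJ calls in this message are assumed to send.
final class HighlightParams {

    // Parameter names are assumptions based on the usual SolrQuery mapping.
    public static Map<String, String> params() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("hl", "true");                 // params.setHighlight(true)
        p.put("hl.snippets", "3");           // params.setHighlightSnippets(3)
        p.put("hl.fl", "content_highlight"); // params.set("hl.fl", "content_highlight")
        return p;
    }

    // Joins the parameters into a URL query string, preserving insertion order.
    public static String toQueryString(Map<String, String> p) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : p.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params()));
    }
}
```

Appending that string to a query against the dismax handler should reproduce the same highlighting request without SolrJ in the picture.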

From solrconfig.xml:


  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- Use the regex highlight fragmenter because it seems to return better results. -->
      <str name="f.text.hl.fragmenter">regex</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

  <highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float> 
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>
   
   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<strong>]]></str>
     <str name="hl.simple.post"><![CDATA[</strong>]]></str>
    </lst>
   </formatter>
  </highlighting>


From schema.xml:

<field name="content_highlight" type="text_highlight" indexed="true" stored="true" required="false"
       compressed="true" termVectors="true" termPositions="true"/>

        <fieldType name="text_highlight" class="solr.TextField" positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>


Any pointers anybody can provide would be greatly appreciated.

Jake