lucene-solr-user mailing list archives

From Gregg Hoshovsky <>
Subject Highlight question
Date Wed, 23 Jun 2010 19:09:37 GMT
I just started working with highlighting and am using the default configuration. I have a field for which I can get a single highlight marking the data.

What I would like to do is this:

Given a word, say 'tumor', and the sentence

"the lower tumor grew 1.5 cm. blah blah blah we need to remove the tumor in the next surgery"

I would like to get "...<em> the lower tumor grew 1.5 cm </em>..... blah blah blah we need to ...<em> remove the tumor in the next </em>..... surgery"

That is, I want to find multiple occurrences of the word and grab only a few words around each one.

In solrconfig.xml I have been able to change the hl.simple.pre/post values, but when I try to change hl.regex.pattern or hl.snippets, the changes have no effect. I thought hl.snippets would allow me to find more than one occurrence and highlight each of them, and I tried a number of regex patterns, but none of them changed anything.

Here is a snippet of the config file:

Any help is appreciated.


   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.snippets">4</int>
      <int name="hl.fragsize">70</int>
      <!-- allow 20% slop on fragment sizes -->
      <float name="hl.regex.slop">0.2</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{1,1}</str>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <int name="hl.snippets">4</int>
      <int name="hl.fragsize">100</int>
      <str name="hl.simple.pre"><![CDATA[...<em>]]></str>
      <str name=""><![CDATA[</em>....]]></str>
    </lst>
   </formatter>
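
One possible explanation, offered as an assumption based on the Solr 1.4 example configuration rather than anything confirmed here: the hl.regex.* settings only take effect when the regex fragmenter is actually selected with hl.fragmenter=regex; otherwise the fragmenter marked default="true" (gap in the stock example config) is used. Parameters such as hl.snippets appear to be read from the request, not from a fragmenter's or formatter's defaults. A sketch of setting them as request-handler defaults in solrconfig.xml might look like this; the handler name "standard" and the field name "report_text" are hypothetical.

```xml
<!-- Sketch only: the handler name and field name are hypothetical.
     Assumption: highlighting parameters such as hl.snippets and
     hl.fragmenter are read from the request, so they are set here
     in the request handler's defaults. -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- turn highlighting on -->
    <str name="hl">true</str>
    <!-- hypothetical field name: use the field being highlighted -->
    <str name="hl.fl">report_text</str>
    <!-- select the fragmenter registered as name="regex" above -->
    <str name="hl.fragmenter">regex</str>
    <!-- return up to 4 highlighted fragments per field -->
    <int name="hl.snippets">4</int>
  </lst>
</requestHandler>
```

The same parameters could instead be passed per query on the request URL, e.g. hl=true&hl.fl=report_text&hl.fragmenter=regex&hl.snippets=4, which makes it easier to experiment without editing solrconfig.xml.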
