lucene-solr-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject PostingHighlighter complains about no offsets
Date Fri, 02 May 2014 13:34:58 GMT
I've been wanting to try out the PostingsHighlighter, so I added
storeOffsetsWithPositions to my field definition, enabled the
highlighter in solrconfig.xml, reindexed, and tried it out. When I
issue a query I get this error:

field 'text' was indexed without offsets, cannot highlight

java.lang.IllegalArgumentException: field 'text' was indexed without offsets, cannot highlight
	at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightDoc(PostingsHighlighter.java:545)
	at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightField(PostingsHighlighter.java:467)
	at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFieldsAsObjects(PostingsHighlighter.java:392)
	at org.apache.lucene.search.postingshighlight.PostingsHighlighter.highlightFields(PostingsHighlighter.java:293)

I've been trying to figure out why the field wouldn't have offsets 
indexed, but I just can't see it.  Is there something in the analysis 
chain that could be stripping out offsets?


This is the field definition:

     <field name="text" type="text_en" indexed="true" stored="true"
            multiValued="false" termVectors="true" termPositions="true"
            termOffsets="true" storeOffsetsWithPositions="true" />

(Yes, I know the PostingsHighlighter doesn't require term vectors; I'm
keeping them around for now while I experiment.)

     <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <!-- We are indexing mostly HTML so we need to ignore the tags -->
         <charFilter class="solr.HTMLStripCharFilterFactory"/>
         <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <!-- lower casing must happen before WordDelimiterFilter or protwords.txt will not work -->
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.WordDelimiterFilterFactory"
                 stemEnglishPossessive="1" protected="protwords.txt"/>
         <!-- This deals with contractions -->
         <filter class="solr.SynonymFilterFactory"
                 synonyms="synonyms.txt" expand="true" ignoreCase="true"/>
         <filter class="solr.HunspellStemFilterFactory"
                 dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <!-- lower casing must happen before WordDelimiterFilter or protwords.txt will not work -->
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.WordDelimiterFilterFactory"
                 protected="protwords.txt"/>
         <!-- setting tokenSeparator="" solves issues with compound words and improves phrase search -->
         <filter class="solr.HunspellStemFilterFactory"
                 dictionary="en_US.dic" affix="en_US.aff" ignoreCase="true"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
     </fieldType>
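
For reference, enabling the PostingsHighlighter in solrconfig.xml
typically looks like the following in Solr 4.x. This is a sketch of the
stock setup (the component name "highlight" and the
PostingsSolrHighlighter class are assumed from the standard Solr 4.x
configuration), not necessarily the exact configuration used here:

     <!-- Assumed stock Solr 4.x setup: swap the default highlighter
          for the postings-based one -->
     <searchComponent class="solr.HighlightComponent" name="highlight">
       <highlighting class="org.apache.solr.highlight.PostingsSolrHighlighter"/>
     </searchComponent>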
