lucene-java-user mailing list archives

From Vladimir Svetov <vsve...@gmail.com>
Subject getBestFragments with SimpleSpanFragmenter
Date Fri, 14 Oct 2016 01:28:07 GMT
Hi all,

I have the following two indexed values for the field title_t_en:

       "\"War and Peace\" by \"Leo Tolstoy\""
       "\"Three sisters\" by \"Anton Chekhov\""

I am searching with the query: +((title_t_en:war) (title_t_en:sister))

For each matching document's indexed value, the following code is called:

   SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
   QueryScorer queryScorer = new QueryScorer(luceneQuery);
   Highlighter highlighter = new Highlighter(htmlFormatter, queryScorer);
   SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(queryScorer, 5);
   String bestFragments = highlighter.getBestFragments(tokenStream, value,
         3, FRAGMENT_DELIMITER);
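
For reference, here is a minimal, library-free sketch of how a size-based fragmenter might cut a title when the size is counted in characters and a break is only allowed at the start of a token. It is a toy model and not Lucene's implementation: the class FragmentSketch, the method split, and the whitespace tokenization are all hypothetical and exist only to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy model of size-based fragmenting. Hypothetical helper, not Lucene code. */
final class FragmentSketch {

    /**
     * Splits text into fragments of roughly fragmentSize characters.
     * A break is only made at the start of a whitespace-separated token,
     * so a single fragment can be longer than fragmentSize.
     */
    static List<String> split(String text, int fragmentSize) {
        List<String> fragments = new ArrayList<>();
        Matcher tokens = Pattern.compile("\\S+").matcher(text);
        int fragmentStart = 0; // char offset where the current fragment begins
        int fragmentCount = 1; // number of fragments opened so far
        while (tokens.find()) {
            // A new fragment is due once this token ends at or past the
            // character budget consumed by the fragments opened so far.
            boolean isNewFragment = tokens.end() >= fragmentSize * fragmentCount;
            if (isNewFragment && tokens.start() > fragmentStart) {
                fragments.add(text.substring(fragmentStart, tokens.start()).trim());
                fragmentStart = tokens.start();
                fragmentCount++;
            }
        }
        // Whatever remains after the last break is the final fragment.
        fragments.add(text.substring(fragmentStart).trim());
        return fragments;
    }
}
```

With this sketch, FragmentSketch.split("War and Peace", 8) yields the two fragments "War and" and "Peace", while a size of 100 returns the whole title as a single fragment. One thing worth checking in the snippet above: fragmenter is created but never handed to highlighter. Highlighter has a setTextFragmenter(...) method for that, and without such a call the highlighter keeps using its default fragmenter rather than the SimpleSpanFragmenter of size 5.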

The code produces the following bestFragments for the found values:

       "\"<B>War</B> and Peace\" by \"Leo Tolstoy\""
       "\"Three <B>sisters</B>\" by \"Anton Chekhov\""

Question:

Why does bestFragments contain more than 5 bytes? Shouldn't
getBestFragments() return 3 fragments joined by the delimiter, where each
fragment does not exceed 5 bytes?

Regards,
Vlad
