lucene-dev mailing list archives

From " (JIRA)" <>
Subject [jira] [Created] (LUCENE-5381) Lucene highlighter doesn't honor hl.fragsize; it appends all text for last fragment
Date Wed, 01 Jan 2014 15:22:50 GMT

created LUCENE-5381:

             Summary: Lucene highlighter doesn't honor hl.fragsize; it appends all text for
last fragment
                 Key: LUCENE-5381
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/highlighter
    Affects Versions: 4.6, 4.0
            Priority: Minor
             Fix For: 5.0, 4.7
         Attachments: LUCENE-5381.patch

Recently, we hit a problem with the highlighter: I set hl.fragsize = 300, but the highlight
section for one document outputs more than 2000 characters.

Looking into the code, in,
String, boolean, int), after the for loop, it appends the whole remaining text to the last fragment.
if (
		// if there is text beyond the last token considered..
		(lastEndOffset < text.length())
		&&
		// and that text is not too large...
		(text.length() <= maxDocCharsToAnalyze)
	)
{
	//append it to the last fragment
	newText.append(encoder.encodeText(text.substring(lastEndOffset)));
}
currentFrag.textEndPos = newText.length();

This code is problematic: in some cases the last fragment is the most relevant section
and is the one selected to be returned to the client, so the client receives a fragment
far longer than hl.fragsize.
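
To make the effect concrete, here is a minimal stand-alone sketch of the tail-append step.
It is a simplified model, not the Lucene source: the class and method names are made up for
illustration, encoding and highlight markup are ignored, and the offsets (last token ending
at 400, last fragment starting at 300, document of 2500 characters) are assumed values.

```java
import java.util.Arrays;

// Simplified model of the unbounded tail append; names and numbers are illustrative only.
class UnboundedTailDemo {

    // Builds a string of n copies of c (avoids String.repeat for older JDKs).
    static String filled(char c, int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns the length of the last fragment after the tail-append step.
    // fragStart is where the last fragment begins in the output text.
    static int lastFragmentLength(String text, int fragStart, int lastEndOffset,
                                  int maxDocCharsToAnalyze) {
        // Output text produced while walking the tokens (no markup in this model).
        StringBuilder newText = new StringBuilder(text.substring(0, lastEndOffset));
        // Same condition as the quoted code: text remains and the document is not too large.
        if (lastEndOffset < text.length() && text.length() <= maxDocCharsToAnalyze) {
            // The whole remainder is appended, regardless of the fragment size.
            newText.append(text.substring(lastEndOffset));
        }
        return newText.length() - fragStart;
    }

    public static void main(String[] args) {
        String text = filled('x', 2500);
        int length = lastFragmentLength(text, 300, 400, 51200);
        System.out.println("last fragment length = " + length); // prints 2200
    }
}
```

With a fragment size of 300, the last fragment in this model still ends up 2200 characters
long, because nothing in the append step looks at the fragment size.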

I changed the code as shown below. It seems to work for me :)
//Test what remains of the original text beyond the point where we stopped analyzing
if (lastEndOffset < text.length()) {
	if (textFragmenter instanceof SimpleFragmenter) {
		SimpleFragmenter simpleFragmenter = (SimpleFragmenter) textFragmenter;
		// characters still available before the current fragment reaches the fragment size
		int remain = simpleFragmenter.getFragmentSize() - (newText.length() - currentFrag.textStartPos);
		if (remain > 0) {
			int endIndex = lastEndOffset + remain;
			if (endIndex > text.length()) {
				endIndex = text.length();
			}
			// append only the part of the tail that still fits
			newText.append(encoder.encodeText(text.substring(lastEndOffset, endIndex)));
		}
	}
}
currentFrag.textEndPos = newText.length();
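
The capping arithmetic can be checked in isolation. The sketch below is a stand-alone
approximation of the idea, assuming plain strings with no encoder; CappedTailDemo and
cappedTail are hypothetical names, not part of Lucene or of the attached patch.

```java
// Stand-alone sketch of the capped tail append; names are hypothetical.
class CappedTailDemo {

    // Returns the part of the remaining text that still fits in the current fragment.
    // currentFragLength is the number of characters already in the last fragment.
    static String cappedTail(String text, int lastEndOffset, int fragmentSize,
                             int currentFragLength) {
        int remain = fragmentSize - currentFragLength;
        // Nothing to append if no text is left or the fragment is already full.
        if (lastEndOffset >= text.length() || remain <= 0) {
            return "";
        }
        // Never read past the end of the document.
        int endIndex = Math.min(lastEndOffset + remain, text.length());
        return text.substring(lastEndOffset, endIndex);
    }

    public static void main(String[] args) {
        // Fragment size 5, 2 characters already used: 3 more characters fit.
        System.out.println(cappedTail("abcdefghij", 4, 5, 2)); // prints efg
    }
}
```

Note that, like the change above, this only bounds the tail when a fragment size is known;
other fragmenter types would need their own rule.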
