lucene-dev mailing list archives

From "David Smiley (JIRA)" <>
Subject [jira] [Updated] (LUCENE-7620) UnifiedHighlighter: add target character width BreakIterator wrapper
Date Fri, 06 Jan 2017 05:28:58 GMT


David Smiley updated LUCENE-7620:
    Attachment: LUCENE_7620_UH_LengthGoalBreakIterator.patch

Here's a patch.  I'm calling it {{LengthGoalBreakIterator}}.  In time, we might add
some tweaks such as a "slop" setting akin to the LuceneRegexFragmenter (in Solr).

[~jim.ferenczi] I thought you might want to take a peek.  I figure this can get into 6.4;
I'll commit it this weekend.

> UnifiedHighlighter: add target character width BreakIterator wrapper
> --------------------------------------------------------------------
>                 Key: LUCENE-7620
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE_7620_UH_LengthGoalBreakIterator.patch
> The original Highlighter includes a {{SimpleFragmenter}} that delineates fragments (aka
> Passages) by a character width.  The default is 100 characters.
> It would be great to support something similar for the UnifiedHighlighter.  It's useful
> in its own right and of course it helps users transition to the UH.  I'd like to do it as
> a wrapper to another BreakIterator -- perhaps a sentence one.  In this way you get back Passages
> that are a number of sentences so they will look nice instead of breaking mid-way through
> a sentence.  And you get some control by specifying a target number of characters.  This BreakIterator
> wouldn't be a general purpose java.text.BreakIterator since it would assume it's called in
> a manner exactly as the UnifiedHighlighter uses it.  It would probably be compatible with
> the PostingsHighlighter too.
> I don't propose doing this by default; besides, it's easy enough to pick your BreakIterator
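
For readers who want a feel for the idea in the description above, here is a minimal sketch of a "target length on top of sentence boundaries" strategy. It is not the attached patch and makes no claim about how {{LengthGoalBreakIterator}} is implemented. The class and method names (LengthGoalSketch, followingWithGoal, passages) are hypothetical, and it uses only the JDK's java.text.BreakIterator rather than any Lucene API. It also assumes the simplest policy: always extend to the first sentence boundary at or after the target offset, with no "slop".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the class from the LUCENE-7620 patch.
final class LengthGoalSketch {

    /**
     * Returns the first boundary of the delegate at or after (start + targetLength),
     * or textLength if the goal lies at or beyond the end of the text.
     */
    static int followingWithGoal(BreakIterator delegate, int start,
                                 int targetLength, int textLength) {
        int goal = start + targetLength;
        if (goal >= textLength) {
            return textLength;
        }
        // following(offset) returns the first boundary strictly after offset,
        // so asking for (goal - 1) yields the first boundary >= goal.
        int end = delegate.following(goal - 1);
        return end == BreakIterator.DONE ? textLength : end;
    }

    /** Splits text into passages of whole sentences, each at least targetLength chars (except possibly the last). */
    static List<String> passages(String text, int targetLength) {
        if (targetLength < 1) {
            throw new IllegalArgumentException("targetLength must be >= 1");
        }
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.ROOT);
        sentences.setText(text);
        List<String> result = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = followingWithGoal(sentences, start, targetLength, text.length());
            result.add(text.substring(start, end));
            start = end; // the next passage begins where this one ended
        }
        return result;
    }
}
```

Calling LengthGoalSketch.passages(text, 100) would yield passages that each end on a sentence boundary and are at least 100 characters long, apart from a possibly shorter final one. A real wrapper for the UnifiedHighlighter would instead need to extend java.text.BreakIterator and honor the specific call pattern the highlighter uses, as the description notes.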

This message was sent by Atlassian JIRA
