lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Closed: (LUCENE-644) Contrib: another highlighter approach
Date Thu, 27 Jan 2011 10:51:44 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler closed LUCENE-644.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 2.9

Closing since FastVectorHighlighter was added in Lucene 2.9.

> Contrib: another highlighter approach
> -------------------------------------
>
>                 Key: LUCENE-644
>                 URL: https://issues.apache.org/jira/browse/LUCENE-644
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/highlighter
>            Reporter: Ronnie Kolehmainen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: FulltextHighlighter.java, FulltextHighlighter.java, FulltextHighlighterTest.java,
FulltextHighlighterTest.java, svn-diff.patch, svn-diff.patch, TokenSources.java, TokenSources.java.diff
>
>
> Mark Harwood's highlighter package is a great contribution to Lucene, I've used it a lot!
However, when you have *large* documents (fields), highlighting can be quite time-consuming
if you increase the number of bytes to analyze with setMaxDocBytesToAnalyze(int). The default
value of 50k is often too low for indexed PDFs etc., which results in empty highlight
strings.
> This is an alternative approach using term position vectors only to build fragment info
objects. Then a StringReader can read the relevant fragments and skip() between them. This
is a lot faster. Also, this method uses the *entire* field for finding the best fragments
so you're always guaranteed to get a highlight snippet.
> Because this method only works with fields that have term positions stored, you can check
whether it works for a particular field using the following code (taken from TokenSources.java):
>         // getTermFreqVector already returns a TermFreqVector, and instanceof is false for null
>         TermFreqVector tfv = reader.getTermFreqVector(docId, field);
>         if (tfv instanceof TermPositionVector)
>         {
>           // use FulltextHighlighter
>         }
>         else
>         {
>           // use standard Highlighter
>         }
> Someone else might find this useful so I'm posting the code here.
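
The "read the relevant fragments and skip() between them" step described above can be
illustrated with a minimal sketch. This is not the attached FulltextHighlighter code: the
class FragmentReaderSketch, its Fragment type, and readFragments are hypothetical names, and
the character offsets are assumed to be already known (in the real approach they would be
derived from the term position vector). It uses only java.io, no Lucene classes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene or of the attached patch.
final class FragmentReaderSketch {

    /** Stand-in for a fragment info object: character offsets [start, end). */
    static final class Fragment {
        final int start;
        final int end;

        Fragment(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Reads only the requested fragments from the field text, skipping the
     * text between them. Assumes fragments are sorted by start offset, do not
     * overlap, and satisfy start <= end.
     */
    static List<String> readFragments(String text, List<Fragment> fragments) {
        List<String> result = new ArrayList<>();
        try (Reader reader = new StringReader(text)) {
            long position = 0; // current character offset in the reader
            for (Fragment f : fragments) {
                // Skip the gap between the previous fragment and this one.
                long toSkip = f.start - position;
                while (toSkip > 0) {
                    long skipped = reader.skip(toSkip);
                    if (skipped <= 0) {
                        return result; // ran past the end of the text
                    }
                    toSkip -= skipped;
                    position += skipped;
                }
                // Read exactly the characters belonging to this fragment.
                char[] buf = new char[f.end - f.start];
                int filled = 0;
                while (filled < buf.length) {
                    int n = reader.read(buf, filled, buf.length - filled);
                    if (n < 0) {
                        break; // text ended inside the fragment
                    }
                    filled += n;
                }
                position += filled;
                result.add(new String(buf, 0, filled));
            }
        } catch (IOException e) {
            // StringReader only throws if it is closed; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

For example, calling readFragments on "aaa lucene bbb highlighter ccc" with the fragments
(4, 10) and (15, 26) returns the two strings "lucene" and "highlighter" without reading the
text in between into a buffer.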

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

