lucene-java-user mailing list archives

From David Spencer <>
Subject Re: Highlighting PDF file after the search
Date Mon, 20 Sep 2004 22:02:55 GMT

Vijay Balasubramanian wrote:

> Hello,
> I can successfully index and search the PDF documents; however, I am not
> able to highlight the searched text in my original PDF file (i.e. the way
> dtSearch highlights on the original file).
> I took a look at the highlighter in the sandbox, compiled it, and have it
> ready. I am wondering whether this highlighter is only for highlighting
> indexed documents, or whether it can be used for PDF files as is. Please
> enlighten!

I did this a few weeks ago.

There are two ways, and both revolve around the same thing: you need 
the tokenized PDF text available.

[a] Store the tokenized PDF text in the index, or in some other file on 
disk, i.e. a "cache" (though "cache" is a misleading term, as you can't 
afford a cache miss unless you can also do [b]).

[b] Tokenize it on the fly when you call getBestFragments(): the first 
arg, the TokenStream, should be one that takes a PDF file as input and 
tokenizes it.
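
To show why both options need the tokenized text, here is a minimal, standalone sketch of the underlying idea. It is not the sandbox Highlighter and uses no Lucene classes. It assumes the PDF text has already been extracted into a String by some extractor of your choice, and the class and method names (PdfTextHighlightSketch, tokenize, highlight) are made up for illustration. The point is that each token carries its character offsets into the extracted text, and those offsets are what let you wrap the matching terms in markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: not part of Lucene or the sandbox highlighter.
final class PdfTextHighlightSketch {

    /** A token plus its character offsets in the extracted text. */
    static final class Token {
        final String term;
        final int start; // inclusive offset into the text
        final int end;   // exclusive offset into the text

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Splits text into lower-cased letter/digit runs, remembering offsets. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetterOrDigit(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            String term = text.substring(start, i).toLowerCase(Locale.ROOT);
            tokens.add(new Token(term, start, i));
        }
        return tokens;
    }

    /** Wraps every token whose term is in queryTerms (lower case) in <b>...</b>. */
    static String highlight(String text, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int last = 0; // end offset of the text copied so far
        for (Token t : tokenize(text)) {
            if (queryTerms.contains(t.term)) {
                out.append(text, last, t.start);          // untouched text before the hit
                out.append("<b>").append(text, t.start, t.end).append("</b>");
                last = t.end;
            }
        }
        out.append(text, last, text.length());            // remainder after the last hit
        return out.toString();
    }
}
```

Calling highlight(extractedText, terms) with the lower-cased query terms returns the extracted text with each hit wrapped in <b> tags. Note that this marks up the extracted text, not the original PDF file itself.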
> Thanks,
> Vijay Balasubramanian
> DPRA Inc.,