lucene-java-user mailing list archives

Subject Re: Highlighting PDF file after the search
Date Wed, 22 Sep 2004 22:08:00 GMT

I tried your technique. I am streaming the PDF file directly into the
Lucene highlighter as shown below, and I get an NPE in
highlighter.getBestFragments(tokenStream, docAsString, 3, "...");

The API doc is not very clear here. I also fed the contents of the query
string (instead of docAsString) to this method, and I still get the NPE.

Can you shed some light on this, please? Please post your code snippet
if you can!

My code snippet:

      File f = new File(sourceDocLocation);
            if (!f.exists()) {
                log.debug("File does not exist " + f.getAbsolutePath() + " " + f.getName());
                return null;
            }

            org.apache.lucene.document.Document doc =

            Highlighter highlighter = new Highlighter(new
            TokenStream tokenStream = new
                    new FileReader(f));

            doc.add(Field.Text("contents", new FileReader(f)));

            // Get 3 best fragments and separate with a "..."
           =========>>>>>>>> result =
highlighter.getBestFragments(tokenStream, queryString, 3, "...");

Vijay Balasubramanian
DPRA Inc.,
214 665 7503

David Spencer wrote to Lucene Users List on 09/20/2004 05:02:

> Hello,
> I can successfully index and search the PDF documents; however, I am
> not able to highlight the searched text in my original PDF file (i.e.,
> the way dtSearch highlights in the original file).
> I took a look at the highlighter in the sandbox, compiled it, and have
> it ready. I am wondering whether this highlighter is only for
> highlighting indexed documents or whether it can be used for PDF files
> as is. Please enlighten!

I did this a few weeks ago.

There are two ways, and they both revolve around the same thing: you need
the tokenized PDF text available.

[a] Store the tokenized PDF text in the index, or in some other file on
disk, i.e. a "cache" (but cache is a misleading term, as you can't have
a cache miss unless you can do [b]).

[b] Tokenize it on the fly when you call getBestFragments(). The 1st
arg, the TokenStream, should be one that takes a PDF file as input and
tokenizes it.
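
To make the idea behind both options concrete, here is a rough sketch in
plain Java. It is NOT the sandbox Highlighter and uses no Lucene classes;
FragmentSketch, bestFragments() and the fragmentSize parameter are made-up
names for illustration only. It assumes the PDF has already been converted
to plain text by some extractor (a FileReader over the raw PDF file yields
PDF bytes, not words). The point it tries to show: the tokens and the text
argument must both come from the same extracted document text, and the
query only decides which tokens get marked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy stand-in for a highlighter (hypothetical; not the Lucene API). */
class FragmentSketch {

    /** One fragment of marked-up text plus its number of query-term hits. */
    static final class Fragment {
        final String text;
        final int hits;
        Fragment(String text, int hits) { this.text = text; this.hits = hits; }
    }

    /**
     * Cuts docText into fragments of roughly fragmentSize characters, wraps
     * tokens found in queryTerms (expected in lower case) in <B>..</B>, and
     * joins the fragments with the most hits using the separator.
     */
    static String bestFragments(String docText, Set<String> queryTerms,
                                int maxFragments, int fragmentSize,
                                String separator) {
        if (docText == null) {
            throw new IllegalArgumentException(
                    "docText must be the extracted document text");
        }
        List<Fragment> fragments = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(docText);
        StringBuilder current = new StringBuilder();
        int hits = 0;
        int last = 0;           // end offset of the previous token
        int fragmentStart = 0;  // offset where the current fragment began
        while (m.find()) {
            // Close the current fragment once it has grown large enough.
            if (m.start() - fragmentStart >= fragmentSize && current.length() > 0) {
                fragments.add(new Fragment(current.toString().trim(), hits));
                current.setLength(0);
                hits = 0;
                fragmentStart = m.start();
                last = m.start();
            }
            // Copy the untokenized text (spaces, punctuation) between tokens.
            current.append(docText, last, m.start());
            String token = m.group();
            if (queryTerms.contains(token.toLowerCase(Locale.ROOT))) {
                current.append("<B>").append(token).append("</B>");
                hits++;
            } else {
                current.append(token);
            }
            last = m.end();
        }
        current.append(docText, last, docText.length());
        if (current.toString().trim().length() > 0) {
            fragments.add(new Fragment(current.toString().trim(), hits));
        }

        // Keep only fragments with hits; the stable sort puts the best first.
        List<Fragment> matching = new ArrayList<>();
        for (Fragment f : fragments) {
            if (f.hits > 0) {
                matching.add(f);
            }
        }
        matching.sort((a, b) -> Integer.compare(b.hits, a.hits));

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < Math.min(maxFragments, matching.size()); i++) {
            if (i > 0) {
                out.append(separator);
            }
            out.append(matching.get(i).text);
        }
        return out.toString();
    }
}
```

With the real Highlighter the shape is the same: the second argument to
getBestFragments() is the document text, not the query string, and the
TokenStream has to be built over that same text, whether it is read back
from storage as in [a] or extracted on the fly as in [b].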

> Thanks,
> Vijay Balasubramanian
> DPRA Inc.,
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
