nutch-dev mailing list archives

From "Stephan Lagraulet" <>
Subject Re: servlet
Date Wed, 23 Mar 2005 10:53:21 GMT
We could do this for certain types of documents.
But for PDF files, I think we should use a new feature provided by PDFBox.
It actually relies on an Acrobat feature described here:

When the user selects the link "View cache" or "View highlight", we could
generate the XML highlight file and use it to highlight the hits directly
inside the PDF.
That's even better than Google cache...
Otherwise, we could use Yahoo's solution (launch the search engine inside
Acrobat Reader -
/ search parameters).
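
To make the highlight-file idea a bit more concrete, here is a rough sketch in
Java of what generating that file could look like once the hit positions are
known. Everything in it is an assumption on my part: the class
HighlightFileSketch, the Hit type and buildHighlightXml are hypothetical names
(not part of Nutch or PDFBox), and the element and attribute names follow the
Acrobat highlight file format as I understand it, so they should be checked
against the Adobe documentation before relying on them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Nutch or PDFBox. */
class HighlightFileSketch {

    /** One hit: zero-based page index, character offset on that page, and length. */
    static final class Hit {
        final int page;
        final int offset;
        final int length;

        Hit(int page, int offset, int length) {
            this.page = page;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Builds the text of a highlight file for the given hits.
     * ASSUMPTION: the tag and attribute names below reflect the Acrobat
     * highlight file format as I understand it; verify before use.
     */
    static String buildHighlightXml(List<Hit> hits) {
        StringBuilder sb = new StringBuilder();
        sb.append("<XML>\n");
        sb.append("<Body units=characters version=2>\n");
        sb.append("<Highlight>\n");
        for (Hit h : hits) {
            // One <loc> entry per hit: page, start position, and length.
            sb.append("<loc pg=").append(h.page)
              .append(" pos=").append(h.offset)
              .append(" len=").append(h.length)
              .append(">\n");
        }
        sb.append("</Highlight>\n");
        sb.append("</Body>\n");
        sb.append("</XML>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit(0, 10, 5));   // a 5-character hit on the first page
        hits.add(new Hit(2, 134, 7));  // a 7-character hit on the third page
        System.out.print(buildHighlightXml(hits));
    }
}
```

The "View highlight" link would then point at the cached PDF together with a
reference to this generated file. Finding the hit offsets in the extracted
text is the part PDFBox would have to help with, and it is not shown here.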

I know these are only solutions for PDFs, but that's the format I'm working
on right now, and I think its use is widespread, so it might be useful to
implement these features.


On Wed, March 23, 2005 11:19, Andrzej Bialecki said:
> John X wrote:
>> Hi, All,
>> Attached please find servlet that serves raw Content
>> of any mime type. Current cached.jsp handles mime type text/* only.
>> If no objection, it is going to be committed in a few days.
> I think this would be quite useful.
> However, what I think is ultimately needed to match the features of
> other search engines is not the ability to return the cached non-html
> content (there might even be copyright issues with this function...),
> but an html rendering of non-html content, a la Google's "View as HTML"
> function.
> --
> Best regards,
> Andrzej Bialecki
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>  Contact: info at sigram dot com
