nutch-dev mailing list archives

From John X <j...@neasys.com>
Subject Re: servlet Cached.java
Date Wed, 23 Mar 2005 17:15:26 GMT
On Wed, Mar 23, 2005 at 11:53:21AM +0100, Stephan Lagraulet wrote:
> Hi!
> We could do this for certain types of documents.
> But for PDF files, I think we should use a new feature provided by PDFBox,
> PdfHighlighter.
> This actually uses an Acrobat feature described here:
> http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf
> 
> When the user selects the link "View cache" or "View highlight", we could
> generate the XML highlight file and use it to highlight the hits directly
> inside the PDF.
> That's even better than Google cache...
> Otherwise we could use Yahoo's solution (launch the search engine inside
> Acrobat Reader, using the search parameters described in
> http://partners.adobe.com/public/developer/en/acrobat/PDFOpenParameters.pdf).
> 
> I know these are only solutions for PDFs but that's the format I'm working
> on right now and I think its use is widespread so it might be useful to
> implement these features.
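
[Editorial sketch, not part of the original message.] The second approach
quoted above, opening the PDF with search parameters, could look roughly like
the Java below. It is a minimal sketch under stated assumptions: the class name
PdfSearchLink and the method build are hypothetical and are not part of Nutch
or PDFBox. The `search="word1 word2"` form follows the PDF Open Parameters
document linked above; percent-encoding the quotes and spaces is an assumption
about what the viewer accepts and should be checked against the reader in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

// Hypothetical helper: builds a link that asks the PDF viewer to search
// for the query terms once the document has been opened.
final class PdfSearchLink {

    static String build(String pdfUrl, List<String> terms) {
        // The open parameter takes a quoted, space-separated word list.
        String wordList = "\"" + String.join(" ", terms) + "\"";
        String encoded;
        try {
            // URLEncoder emits '+' for spaces; use %20 inside the fragment.
            // Assumption: the viewer decodes percent-escapes in the fragment.
            encoded = URLEncoder.encode(wordList, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
        // Open parameters are chained with '&' if a fragment already exists.
        String separator = pdfUrl.indexOf('#') >= 0 ? "&" : "#";
        return pdfUrl + separator + "search=" + encoded;
    }
}
```

For example, PdfSearchLink.build("http://example.com/report.pdf",
java.util.Arrays.asList("nutch", "cache")) returns
http://example.com/report.pdf#search=%22nutch%20cache%22, which a results page
could use as the target of a "View highlight" link.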

Could you provide a code snippet or, better, a patch?
Thanks,

John

> 
> Stephan
> 
> 
> On Wed, March 23, 2005 11:19, Andrzej Bialecki said:
> > John X wrote:
> >> Hi, All,
> >>
> >> Attached please find servlet Cached.java that serves raw Content
> >> of any mime type. Current cached.jsp handles mime type text/* only.
> >> If no objection, it is going to be committed in a few days.
> >
> > I think this would be quite useful.
> >
> > However, what I think is ultimately needed to match the features of
> > other search engines is not the ability to return the cached non-html
> > content (there might even be copyright issues with this function...),
> > but an html rendering of non-html content, a la Google's "View as HTML"
> > function.
> >
> > --
> > Best regards,
> > Andrzej Bialecki
> >   ___. ___ ___ ___ _ _   __________________________________
> > [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> > ___|||__||  \|  ||  |  Embedded Unix, System Integration
> > http://www.sigram.com  Contact: info at sigram dot com
> >
> >
> 
> 
> 
__________________________________________
http://www.neasys.com - A Good Place to Be
Come to visit us today!
