tika-dev mailing list archives

From naddeo giuseppe <g.nad...@gmail.com>
Subject PDF2XHTML.getLineSeparator
Date Mon, 02 Feb 2009 11:24:12 GMT
Hi everybody,

It seems to me that the getLineSeparator method of PDF2XHTML
(package org.apache.tika.parser.pdf) could be improved.

I changed it
from:
    public String getLineSeparator()
    {
        try
        {
            handler.characters("\n");
        } catch(SAXException e) {

        }
        return super.getLineSeparator();
    }


to:
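    public String getLineSeparator()
    {
        try
        {
            // emit an empty <br/> element instead of a newline character
            handler.element("br", "");
        } catch(SAXException e) {
            // exception is ignored, as in the original method
        }
        return super.getLineSeparator();
    }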

The resulting HTML looks nicer.
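
To show the difference outside of Tika, here is a minimal sketch that uses only
the JDK's SAX serializer. It is not the Tika code: the class LineBreakDemo and
the methods render and emit are made-up names, and the TransformerHandler
stands in for Tika's XHTML handler. It writes two lines of text into a <p>,
separated either by a newline character or by an empty br element. A browser
collapses the bare newline into ordinary whitespace, while the br element
shows up as a real line break.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical demo class, not part of Tika.
public class LineBreakDemo {

    // Serializes <p>first line ... second line</p>, separating the two
    // lines with either an empty <br> element or a newline character.
    static String render(boolean useBr) throws Exception {
        SAXTransformerFactory factory =
            (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer()
               .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));
        AttributesImpl noAttrs = new AttributesImpl();

        handler.startDocument();
        handler.startElement("", "p", "p", noAttrs);
        emit(handler, "first line");
        if (useBr) {
            // what the modified getLineSeparator does, in plain SAX calls
            handler.startElement("", "br", "br", noAttrs);
            handler.endElement("", "br", "br");
        } else {
            // what the original getLineSeparator does
            emit(handler, "\n");
        }
        emit(handler, "second line");
        handler.endElement("", "p", "p");
        handler.endDocument();
        return out.toString();
    }

    // Sends a string to the handler as SAX character data.
    static void emit(TransformerHandler handler, String text)
            throws SAXException {
        char[] chars = text.toCharArray();
        handler.characters(chars, 0, chars.length);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("with newline: " + render(false));
        System.out.println("with br:      " + render(true));
    }
}
```

Unlike the methods above, this sketch lets SAXException propagate instead of
swallowing it in an empty catch block, so a failing handler does not go
unnoticed.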

I hope this post helps someone.

see you,
Giunad.

-- 
If we have learned one thing from the history of invention and discovery,
it is that in the long run - and often in the short one - the most
daring prophecies seem laughably conservative.
Arthur C. Clarke, The Exploration of Space
