tika-dev mailing list archives

From "David Pilato (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2208) Catch missing libraries
Date Wed, 14 Dec 2016 08:29:58 GMT
David Pilato created TIKA-2208:
----------------------------------

             Summary: Catch missing libraries
                 Key: TIKA-2208
                 URL: https://issues.apache.org/jira/browse/TIKA-2208
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: David Pilato


Hi there


We have decided to remove support for some formats when using Tika to extract text and metadata.

We defined our list of Parsers:

{code:java}
    private static final Parser[] PARSERS = new Parser[] {
        // documents
        new org.apache.tika.parser.html.HtmlParser(),
        new org.apache.tika.parser.rtf.RTFParser(),
        new org.apache.tika.parser.pdf.PDFParser(),
        new org.apache.tika.parser.txt.TXTParser(),
        new org.apache.tika.parser.microsoft.OfficeParser(),
        new org.apache.tika.parser.microsoft.OldExcelParser(),
        new org.apache.tika.parser.microsoft.ooxml.OOXMLParser(),
        new org.apache.tika.parser.odf.OpenDocumentParser(),
        new org.apache.tika.parser.iwork.IWorkPackageParser(),
        new org.apache.tika.parser.xml.DcXMLParser(),
        new org.apache.tika.parser.epub.EpubParser(),
    };

    private static final AutoDetectParser PARSER_INSTANCE = new AutoDetectParser(PARSERS);

    private static final Tika TIKA_INSTANCE = new Tika(PARSER_INSTANCE.getDetector(), PARSER_INSTANCE);
{code}

However, when a Microsoft Word document embeds a document in an unsupported format (such as a
Visio diagram), a {{NoClassDefFoundError}} is raised.

Would it be possible to catch this case and throw a {{TikaException}} instead, so that callers
can handle it as an {{Exception}} rather than having to catch a {{Throwable}}?
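
To illustrate the behaviour being requested, here is a minimal sketch of the conversion. It is not Tika code: {{MissingLibraryGuard}}, {{guard}} and {{ParseFailedException}} are hypothetical names, and {{ParseFailedException}} is only a stand-in for {{TikaException}} so that the example compiles without Tika on the classpath. The missing class name is made up as well.

```java
import java.util.concurrent.Callable;

public class MissingLibraryGuard {

    /** Hypothetical stand-in for TikaException, so the sketch has no Tika dependency. */
    public static class ParseFailedException extends Exception {
        public ParseFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Runs a parse step and converts a missing-class Error into a checked exception. */
    public static <T> T guard(Callable<T> parseStep) throws ParseFailedException {
        try {
            return parseStep.call();
        } catch (NoClassDefFoundError e) {
            // An optional library is absent: report it as a regular parse failure.
            throw new ParseFailedException("Missing library: " + e.getMessage(), e);
        } catch (ParseFailedException e) {
            throw e;
        } catch (Exception e) {
            throw new ParseFailedException("Parse step failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulates an embedded document whose parser dependency is not available.
            guard(() -> { throw new NoClassDefFoundError("com/example/VisioSupport"); });
        } catch (ParseFailedException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```

With something along these lines inside Tika, callers could keep a single {{catch (TikaException e)}} block instead of also guarding against {{Error}} subclasses.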




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
