tika-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: MP4Parser Triggers no ContentHandler.startDocument() and ContentHandler.endDocument() in one case
Date Mon, 24 Jun 2013 14:33:22 GMT
On Wed, 29 May 2013, Nick Burch wrote:
> I'm not sure if we do have a properly documented policy on what a parser 
> should do if it receives a file it can't handle. For ones that are 
> invalid (eg corrupt), I believe an exception is the expected result. For the
> case where the file seems valid but can't be handled by the parser, I'm not
> sure.
>
> Does anyone know if we have a policy on this, and/or where we should document 
> it?

I've made a start on documenting this on the wiki:
    https://wiki.apache.org/tika/ErrorsAndExceptions

However, there are a few bits we still need to sort out, such as this case
(the parser thinks the file is valid, but it's in a format it can't cope
with), or the case of an empty file (what we should/shouldn't output, eg a
body tag). Hopefully someone can come up with a good suggestion...!

Nick
