tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Re: Detecting container formats
Date Wed, 16 Jun 2010 11:04:25 GMT
On Tue, 15 Jun 2010, Ken Krugler wrote:
> I think this is a reasonable approach, as long as (per Alex's suggestion) 
> it's configurable in various ways.
> E.g. if you know you don't want to parse OLE2-based files, so you've 
> removed jars for those parsers, then it would be great to have an easy 
> way of disabling the (more expensive) mime-type detection, and 
> potentially avoid the dependency on these same jars.

Avoiding the expensive detection shouldn't be too hard, as long as we can 
figure out what to return for the mime type when we don't do the detailed 
detection.
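
As a rough sketch of the cheap path only: check the magic bytes and return a 
generic container type unless a detailed detector has been plugged in. The 
class name, the Function-based hook, and the generic OLE2 type string are all 
made up for illustration; what we should actually return is still open.

```java
import java.util.Arrays;
import java.util.function.Function;

final class CheapContainerDetect {
    // ZIP local file header magic, "PK\003\004".
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 compound document header magic.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // Hypothetical placeholder; the real generic OLE2 type is undecided.
    static final String GENERIC_OLE2 = "application/x-ole2-generic";

    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
            && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    /** Pass null for detailed to skip the expensive container inspection. */
    static String detect(byte[] header, Function<byte[], String> detailed) {
        String generic;
        if (startsWith(header, ZIP_MAGIC)) {
            generic = "application/zip";
        } else if (startsWith(header, OLE2_MAGIC)) {
            generic = GENERIC_OLE2;
        } else {
            return "application/octet-stream";
        }
        // Only containers reach the (optional) detailed detector.
        return detailed == null ? generic : detailed.apply(header);
    }
}
```

With the hook left as null, a zip-based file would simply come back as 
application/zip, which the plain zip parser could still handle.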

Avoiding the jars might be a bit more tricky, but with a little bit of 
wrapping and some catching of ClassNotFoundException we should probably be 
able to manage it.
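
Something along these lines might do for the wrapping, as a minimal sketch. 
The helper class and method names are hypothetical; the only real name is the 
POI class being probed. It also catches LinkageError, on the assumption that a 
half-present jar can fail with NoClassDefFoundError rather than 
ClassNotFoundException.

```java
final class OptionalParserCheck {
    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only resolve the class, do not run static init.
            Class.forName(className, false,
                          OptionalParserCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar removed (or incomplete): treat the detector as disabled.
            return false;
        }
    }

    /** True only if the POI jar needed for OLE2 inspection is present. */
    static boolean canInspectOle2() {
        return isAvailable("org.apache.poi.poifs.filesystem.POIFSFileSystem");
    }
}
```

The detector would then only reference POI classes from code that is reached 
after canInspectOle2() returns true.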

Does anyone know how we could best pass the open zip / poifs objects back 
from the detector so the parsers can re-use them?
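
One possible shape, purely as a sketch: have the detector return a small 
result object carrying both the mime type and whatever it opened, and let each 
parser ask for the kind of object it understands. DetectionResult and 
containerAs are invented names, and this assumes the open object is Closeable 
(java.util.zip.ZipFile is; a poifs object might need a thin wrapper).

```java
import java.io.Closeable;
import java.io.IOException;

final class DetectionResult implements Closeable {
    final String mimeType;
    private final Closeable openContainer; // null if nothing was opened

    DetectionResult(String mimeType, Closeable openContainer) {
        this.mimeType = mimeType;
        this.openContainer = openContainer;
    }

    /** Returns the open container if it is of the requested kind, else null. */
    <T> T containerAs(Class<T> kind) {
        return kind.isInstance(openContainer) ? kind.cast(openContainer) : null;
    }

    /** Closes the container once, after the parser has finished with it. */
    @Override
    public void close() throws IOException {
        if (openContainer != null) {
            openContainer.close();
        }
    }
}
```

A zip-based parser would call containerAs(ZipFile.class) and fall back to 
re-opening the stream itself when that returns null.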

