tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Re: Detecting container formats
Date Tue, 29 Jun 2010 16:31:33 GMT
On Thu, 17 Jun 2010, Max Valjanski wrote:
> I tried to do that, but I found that this does not fit into Tika 
> architecture. It is required to read whole file to parse OLE-container.

Yup, I've found much the same thing. My idea was to have a new detector 
that you can layer in between the others, which will parse the containers 
and keep them around if needed. If you don't want it, skip it from the 

I'm not sure if what I've done makes sense, but I've attached a patch that
demos the idea to TIKA-447. Do people think the idea is worth pursuing
further, or should we try something different?
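
For anyone following along, here is a rough sketch of the layering idea in
plain Java. It is not the patch attached to TIKA-447 and does not use Tika's
real Detector API: SimpleDetector, DetectionContext, ContainerDetector,
MagicDetector and LayeredDetector are all hypothetical names made up for
illustration, the detectors take a whole byte[] rather than a stream, and
"parsing" the container is reduced to keeping its bytes around. The only
hard fact in it is the 8-byte OLE2 header signature; the returned media type
strings are placeholders.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical detector interface (NOT Tika's real API). */
interface SimpleDetector {
    /** Returns a media type, or null if this detector cannot decide. */
    String detect(byte[] data, DetectionContext ctx);
}

/** Hypothetical holder so later stages can reuse an already-parsed container. */
class DetectionContext {
    byte[] parsedContainer; // stand-in for a real parsed OLE2 structure
}

/** Optional layer: recognises OLE2 containers and keeps them around. */
class ContainerDetector implements SimpleDetector {
    // OLE2 compound document header signature.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    @Override
    public String detect(byte[] data, DetectionContext ctx) {
        if (data.length < OLE2_MAGIC.length) {
            return null;
        }
        byte[] head = Arrays.copyOf(data, OLE2_MAGIC.length);
        if (!Arrays.equals(head, OLE2_MAGIC)) {
            return null;
        }
        // Assumption: a real version would parse the whole container here.
        ctx.parsedContainer = data.clone();
        return "application/x-ole-storage"; // placeholder type name
    }
}

/** Cheap prefix-only detector, standing in for the existing magic checks. */
class MagicDetector implements SimpleDetector {
    @Override
    public String detect(byte[] data, DetectionContext ctx) {
        byte[] pdf = {'%', 'P', 'D', 'F'};
        if (data.length >= pdf.length
                && Arrays.equals(Arrays.copyOf(data, pdf.length), pdf)) {
            return "application/pdf";
        }
        return null;
    }
}

/** Tries each layer in order; the first non-null answer wins. */
class LayeredDetector implements SimpleDetector {
    private final List<SimpleDetector> layers;

    LayeredDetector(List<SimpleDetector> layers) {
        this.layers = layers;
    }

    @Override
    public String detect(byte[] data, DetectionContext ctx) {
        for (SimpleDetector layer : layers) {
            String type = layer.detect(data, ctx);
            if (type != null) {
                return type;
            }
        }
        return "application/octet-stream"; // nothing matched
    }
}
```

If you don't want the container parsing, you build the LayeredDetector
without the ContainerDetector in its list; with it, a later parser could
pick up ctx.parsedContainer instead of reading the file a second time.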

