tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Re: Container Extractor?
Date Wed, 01 Sep 2010 10:25:58 GMT
On Wed, 1 Sep 2010, Andrzej Bialecki wrote:
> This would be very useful. We contemplated implementing something like 
> this in Nutch, to handle archives (jar/tar/zip/...), but having it in 
> Tika would be much better.

I'd forgotten about tar, that's another one to handle... :)

> Does recursive here mean that it would look into embedded zip files too? 
> Or that it would process all paths (since there is really no hierarchy 
> in zip files)?

I was thinking recursive could mean different things depending on the
container. For zip files, tar files etc, it would probably just mean root
directory only vs descending into all directories. For OLE2, it would mean
checking embedded documents of embedded documents (normally, but not
always, by descending into child directories). Maybe there's a clearer
name for this sort of thing?
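
To make the two meanings concrete for the zip case, here is a rough sketch using only java.util.zip. It is not a proposed Tika API: the class name ZipWalker, the method list and the two flags (descendDirs, openNested) are made up for illustration. descendDirs covers "root directory vs all directories", and openNested covers the "look into embedded zip files too" reading from the question above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, for illustration only (not part of Tika).
public class ZipWalker {

    /**
     * Lists file entries in a zip stream.
     * descendDirs: if false, only entries in the archive root are reported.
     * openNested:  if true, entries ending in ".zip" are opened and walked too.
     */
    public static List<String> list(InputStream in, boolean descendDirs,
                                    boolean openNested) throws IOException {
        List<String> names = new ArrayList<>();
        walk(new ZipInputStream(in), "", descendDirs, openNested, names);
        return names;
    }

    private static void walk(ZipInputStream zip, String prefix, boolean descendDirs,
                             boolean openNested, List<String> out) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue; // directories are just name prefixes in a zip
            }
            String name = entry.getName();
            boolean inRoot = name.indexOf('/') < 0;
            if (!descendDirs && !inRoot) {
                continue; // "non-recursive" in the root-only sense
            }
            out.add(prefix + name);
            if (openNested && name.toLowerCase().endsWith(".zip")) {
                // Wrap the current entry's data in a new ZipInputStream.
                // The inner stream is deliberately not closed, since closing
                // it would also close the outer stream.
                walk(new ZipInputStream(zip), prefix + name + "!/",
                     descendDirs, openNested, out);
            }
        }
    }
}
```

Calling list(stream, false, false) would report only the root entries, list(stream, true, false) every path, and list(stream, true, true) would also report the contents of nested zips with a "name.zip!/" prefix. Detecting nested archives by file extension is a shortcut for the sketch; real code would presumably use type detection instead.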

Nick
