tika-dev mailing list archives

From Nick Burch <nick.bu...@alfresco.com>
Subject Re: Command Line Parser for Metadata Extraction
Date Wed, 06 Apr 2011 17:48:02 GMT
On Tue, 5 Apr 2011, Mattmann, Chris A (388J) wrote:
> Check out our CmdLineMetExtractor class [2], and this guide [3] on some 
> of our baked in MetExtractors. I think it would be awesome if we could 
> support a similar interface in Tika (I'd love to push those details 
> upstream of OODT).

I think you ought to be able to do most of that with Tika now. I don't 
know if you'll be able to change your XML files to follow the new Tika 
syntax and have Tika do everything (I think your config might have more in 
it than just what to run and how to get the metadata back?), but the new 
ExternalParser stuff ought to be more flexible for building the parsers 
dynamically yourself.

You might want to hold off until Jukka's done his usual magic of making my 
code much more elegant though :)

(I'll hopefully get a chance to do a bit more on this within a week, such
as unit tests, and a dedicated ffmpeg external parser which uses "ffmpeg
-formats" to build the list of supported mimetypes at runtime)
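For illustration only, here is a minimal sketch of the runtime-discovery idea for the planned ffmpeg parser: read the text printed by "ffmpeg -formats", keep the formats ffmpeg says it can demux (read), and map their names to MIME types. This is not Tika code. The class and method names (FfmpegFormats, demuxableFormats, supportedMimeTypes), the format-name to MIME-type table, and the sample output layout (two flag columns, a "--" separator, then the format name) are all assumptions. ffmpeg's output layout differs between versions, and a real parser would capture the output of the running binary rather than use a fixed sample.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class FfmpegFormats {

    // Assumed sample of `ffmpeg -formats` output (two-flag-column layout).
    // The exact layout varies between ffmpeg versions.
    static final String SAMPLE =
        "File formats:\n" +
        " D. = Demuxing supported\n" +
        " .E = Muxing supported\n" +
        " --\n" +
        " D  3dostr          3DO STR\n" +
        "  E 3g2             3GP2 (3GP2 file format)\n" +
        " DE avi             AVI (Audio Video Interleaved)\n" +
        " DE matroska,webm   Matroska / WebM\n";

    // Hypothetical format-name -> MIME-type table; a real parser would
    // need a fuller, curated mapping.
    static final Map<String, String> MIME_BY_FORMAT = new LinkedHashMap<>();
    static {
        MIME_BY_FORMAT.put("avi", "video/x-msvideo");
        MIME_BY_FORMAT.put("matroska", "video/x-matroska");
        MIME_BY_FORMAT.put("webm", "video/webm");
        MIME_BY_FORMAT.put("3g2", "video/3gpp2");
    }

    /** Returns the format names that the output marks as demuxable. */
    static Set<String> demuxableFormats(String formatsOutput) {
        Set<String> names = new LinkedHashSet<>();
        boolean inTable = false;
        for (String line : formatsOutput.split("\n")) {
            if (line.trim().matches("-+")) {
                inTable = true; // format rows start after the dashed separator
                continue;
            }
            if (!inTable || line.length() < 4) {
                continue; // skip the legend lines and anything too short
            }
            // Assumed columns: space, D flag, E flag, space, then the name(s).
            boolean canDemux = line.charAt(1) == 'D';
            String[] rest = line.substring(3).trim().split("\\s+", 2);
            if (canDemux && !rest[0].isEmpty()) {
                // Some rows list several comma-separated names.
                for (String name : rest[0].split(",")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** Maps demuxable format names to MIME types, skipping unknown names. */
    static Set<String> supportedMimeTypes(String formatsOutput) {
        Set<String> types = new LinkedHashSet<>();
        for (String name : demuxableFormats(formatsOutput)) {
            String mime = MIME_BY_FORMAT.get(name);
            if (mime != null) {
                types.add(mime);
            }
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(supportedMimeTypes(SAMPLE));
    }
}
```

Run against the sample, this keeps avi, matroska and webm (plus 3dostr, which has no mapping) and drops 3g2, which the sample marks as mux-only. Only readable formats matter here, because a parser consumes files rather than writing them.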

