tika-dev mailing list archives

From Stephane Bastian <stephane_bast...@hotmail.com>
Subject RFE: adding a ParserFactory class
Date Thu, 23 Oct 2008 15:32:45 GMT
Hi All,

I have a use case that needs most of the functionality currently 
available in AutoDetectParser, i.e. being able to instantiate the 
appropriate parser based on the stream and metadata. When the parser 
returned is the HTML one, our application logic needs to set up a 
specific ContentHandler so that we can process the SAX content 
ourselves. Unfortunately, this prevents us from reusing 
AutoDetectParser as currently defined.
However, a ParserFactory class (which doesn't exist yet) would really 
help us here: it could provide public method(s) to do what is currently 
done internally by AutoDetectParser.

One option is to provide something like this:
Parser parser = ParserFactory.getInstance().getParser(stream, metadata);

Another option is the following:
MimeType mt = MimeType.getMimeType(stream, metadata);
Parser parser = ParserFactory.getInstance().getParser(mt);

All the code needed is already there; it's just a matter of moving 
things around and creating/initializing a ParserFactory class.
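
To make the idea a bit more concrete, here is a rough, self-contained 
sketch of the second option. Everything in it is an assumption for 
illustration only: the Parser interface and the HtmlParser/TextParser 
classes are simplified stand-ins rather than the real Tika types, the 
MIME type is passed as a plain String, type detection is left out, and 
chooseHandler is a hypothetical example of the caller-side logic.

```java
import java.util.HashMap;
import java.util.Map;

public class ParserFactorySketch {

    /** Simplified stand-in for a parser type; NOT the real Tika interface. */
    interface Parser {
        String name();
    }

    /** Stand-in for the HTML parser. */
    static final class HtmlParser implements Parser {
        public String name() { return "html"; }
    }

    /** Stand-in for a plain-text parser, also used as the fallback. */
    static final class TextParser implements Parser {
        public String name() { return "text"; }
    }

    /** Hypothetical factory exposing the type-to-parser lookup publicly. */
    static final class ParserFactory {
        private static final ParserFactory INSTANCE = new ParserFactory();

        private final Map<String, Parser> parsers = new HashMap<>();
        private final Parser fallback = new TextParser();

        private ParserFactory() {
            // Assumed registrations; a real factory would read its configuration.
            parsers.put("text/html", new HtmlParser());
            parsers.put("text/plain", new TextParser());
        }

        static ParserFactory getInstance() {
            return INSTANCE;
        }

        /** Returns the parser registered for the type, or the fallback. */
        Parser getParser(String mimeType) {
            return parsers.getOrDefault(mimeType, fallback);
        }
    }

    /** Caller-side logic: pick a handler once the parser type is known. */
    static String chooseHandler(String mimeType) {
        Parser parser = ParserFactory.getInstance().getParser(mimeType);
        return (parser instanceof HtmlParser)
                ? "custom-html-handler"
                : "default-handler";
    }

    public static void main(String[] args) {
        System.out.println(chooseHandler("text/html"));
        System.out.println(chooseHandler("application/pdf"));
    }
}
```

The point is only that the caller gets hold of the parser before any 
parsing starts, so it can decide which ContentHandler to use.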

If all this makes sense to you, please let me know so I can go 
ahead, open a ticket, implement this and send a patch.

All the best,

