tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: more modular parser bundles
Date Wed, 02 Dec 2015 13:16:30 GMT

So much to do...

Thank you!

-----Original Message-----
From: Bob Paulin [mailto:bob@bobpaulin.com] 
Sent: Monday, November 30, 2015 10:49 PM
To: dev@tika.apache.org
Subject: Re: more modular parser bundles

Created 2.x Branch.


On 11/30/2015 3:12 PM, Bob Paulin wrote:
> This makes sense.  I think providing an "all" jar with all the parsers 
> will be convenient for new developers.  The modular parsers would give 
> more developers a means to insulate themselves from changes and 
> upgrades to other parsers.  This is currently not available when all 
> of the parsers are combined.  So my expectation would be that the jar 
> with all the parsers would be good for general applications or a POC,
> while the modules would target production deployments where developers 
> know what they want and would like to limit risk.  Also agree that new 
> documentation will be required!
> - Bob
> On Mon, Nov 30, 2015 at 2:50 PM, Nick Burch <apache@gagravarr.org 
> <mailto:apache@gagravarr.org>> wrote:
>     On Mon, 30 Nov 2015, Allison, Timothy B. wrote:
>         Perhaps we could start with a tika-advanced-bundle to gather
>         all of the nlp/advanced parsers?  Or would this have to wait
>         for Tika 2.0?
>     I've noticed that there have been a lot fewer queries (on our
>     list, on stackoverflow, at events etc) caused by people missing
>     jars of late. Not sure if the message has got out there better,
>     the right posts are getting to the top of google, the
>     troubleshooting page has done its magic, or something else
>     entirely! But I'm now less worried about the impact of modular
>     parsers on newbies than I have been before.
>     To try to avoid all the existing guidance (most of it external)
>     from going stale, I'd lean towards either keeping "tika-parsers"
>     as the full version, or making "tika-parsers" an alias for
>     "tika-parsers-all", so that current behaviour remains.
>     I'd also probably suggest we change the default load error handler
>     to warn/log, so that people by default will find out more quickly
>     that they've missed jars, and probably also have an extra load
>     error log/check which triggers in the event of 0 parser
>     definitions being found. People can turn that off if they want, as
>     now, but maybe the new default should be so that newbies tend to
>     get told quickly what they've done wrong!
>     Oh, and we'll need to update the troubleshooting page too for the
>     new bundles world :)
>     Nick
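
A rough sketch of the idea in Nick's message above: warn by default when a parser fails to load, plus an extra check that fires when zero parser definitions are found. This is not Tika's actual API. The DemoParser interface, the OnLoadProblem enum and the ParserLoadCheck class are hypothetical names made up for illustration, and only the JDK's own java.util.ServiceLoader is assumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ServiceConfigurationError;
import java.util.ServiceLoader;
import java.util.logging.Logger;

public class ParserLoadCheck {

    /** Hypothetical stand-in for a parser service interface (not Tika's API). */
    public interface DemoParser {
        String name();
    }

    /** Hypothetical policy: what to do when a parser is missing or fails to load. */
    public enum OnLoadProblem { IGNORE, WARN, THROW }

    private static final Logger LOG = Logger.getLogger(ParserLoadCheck.class.getName());

    /** Applies the chosen policy to a single problem message. */
    static void report(OnLoadProblem policy, String message) {
        switch (policy) {
            case WARN:
                LOG.warning(message);
                break;
            case THROW:
                throw new IllegalStateException(message);
            default:
                break; // IGNORE: stay silent
        }
    }

    /** Loads every DemoParser registered via META-INF/services on the classpath. */
    public static List<DemoParser> loadParsers(OnLoadProblem policy) {
        List<DemoParser> found = new ArrayList<>();
        Iterator<DemoParser> it = ServiceLoader.load(DemoParser.class).iterator();
        while (true) {
            try {
                if (!it.hasNext()) {
                    break;
                }
                found.add(it.next());
            } catch (ServiceConfigurationError e) {
                // One provider failed to load (for example, a missing dependency jar).
                report(policy, "Could not load a parser: " + e.getMessage());
            }
        }
        // The extra check: nothing registered at all usually means a missing parser jar.
        if (found.isEmpty()) {
            report(policy, "No parsers found; is a parser jar missing from the classpath?");
        }
        return found;
    }

    public static void main(String[] args) {
        // WARN as the default, so a newcomer is told quickly what went wrong.
        List<DemoParser> parsers = loadParsers(OnLoadProblem.WARN);
        System.out.println("Parsers found: " + parsers.size());
    }
}
```

Passing OnLoadProblem.IGNORE gives the silent behaviour for anyone who wants to turn the check off, and OnLoadProblem.THROW fails fast instead of logging.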
