tika-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: Tika 2.0 - Replace POI IOUtils with commons-io IOUtils
Date Sun, 27 Mar 2016 23:52:36 GMT
On Sun, 27 Mar 2016, Bob Paulin wrote:
> Currently the Apache POI dependency is in several modules and it's sort 
> of a beast (> 2 MB in size).

You should've seen it before Jukka and Yegor spent a crazy ApacheCon 
hacking up the ooxml-lite support... ;-)

> It appears many of the modules are only using the IOUtils library.

I suspect a strong overlap with the parser classes I've helped write...

> Any concerns with replacing this POI stuff with commons-io? Does POI 
> offer anything above the commons-io functionality in IOUtils? If not I 
> think it would be great to isolate the poi dependency to the office 
> module only.

A lot of the use is for endian-specific reading of numbers and strings. 
Might be a bit of stream stuff, but mostly that can be passed off to the 
Tika IO utils classes.
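
As a rough illustration of the kind of endian-specific reading meant here, 
a minimal sketch using only the JDK's java.nio.ByteBuffer follows. The 
class and method names (EndianSketch, readUInt16LE, readUInt32LE, 
readUtf16LE) are hypothetical, made up for this example; they are not the 
POI, commons-io or Tika APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not an existing library class).
public class EndianSketch {

    // Read an unsigned 16-bit little-endian value starting at offset.
    static int readUInt16LE(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort() & 0xFFFF;
    }

    // Read an unsigned 32-bit little-endian value starting at offset.
    static long readUInt32LE(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt() & 0xFFFFFFFFL;
    }

    // Decode charCount UTF-16LE code units (2 bytes each) starting at offset.
    static String readUtf16LE(byte[] data, int offset, int charCount) {
        return new String(data, offset, charCount * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12, 0x78, 0x56, 0x34, 0x12, 'H', 0, 'i', 0};
        System.out.println(Integer.toHexString(readUInt16LE(data, 0))); // 1234
        System.out.println(Long.toHexString(readUInt32LE(data, 2)));    // 12345678
        System.out.println(readUtf16LE(data, 6, 2));                    // Hi
    }
}
```

Helpers along these lines have no dependency beyond the JDK, which is 
roughly what a shared utils grouping would need to offer the non-office 
modules.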

From a quick check, I can't see any endian number stuff in commons IO, but 
I might have missed it, or it might be in a different commons module. If 
not, there might be something to be said for popping that POI logic along 
with some of the Ogg-Vorbis utils stuff (another one with my grubby mitts 
all over it) into a more helpful general utils grouping.

Nick
