tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: [VOTE] Release Apache POI 3.11 Beta 1
Date Fri, 01 Aug 2014 19:20:37 GMT
Rat checked out, and the build was successful on Linux.

+1... with one reservation

I just ran a fresh update of Tika trunk with the RC for POI 3.11 Beta 1 against a random
selection of ~10k files from govdocs1, covering many formats.  There aren't many office-x
files in the govdocs1 corpus, but I made sure to include every one of them within
the ~10k files.

When comparing with Tika 1.5:
1) There are no new exceptions
2) There are 15 fewer exceptions (some pdf, but mostly POI)

The regression I reported on the Tika dev list (http://digitalcorpora.org/corp/nps/files/govdocs1/268/268620.pptx)
is in fact fixed by POI 3.11 Beta 1.

When I manually compared files with < 90% token overlap between the two versions' output,
I found that POI's handling of rounding has improved and that the newer version of POI is
no longer incorrectly adding a "_" to some numbers in an xls file.
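
For illustration only, a token-overlap check of this kind could be sketched in Java as
below. The message does not say how overlap was computed, so everything here is an
assumption: the class name TokenOverlap, the lowercase whitespace tokenization, and the
definition of overlap as shared token counts divided by the larger of the two token totals.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenOverlap {

    // Count lowercase, whitespace-delimited tokens (an assumed tokenization).
    public static Map<String, Integer> counts(String text) {
        Map<String, Integer> m = new HashMap<>();
        for (String tok : text.toLowerCase().trim().split("\\s+")) {
            if (!tok.isEmpty()) {
                m.merge(tok, 1, Integer::sum);
            }
        }
        return m;
    }

    // Assumed metric: shared tokens (multiset intersection) over the larger token total.
    public static double overlap(String a, String b) {
        Map<String, Integer> ca = counts(a);
        Map<String, Integer> cb = counts(b);
        int totalA = 0, totalB = 0, shared = 0;
        for (int n : ca.values()) totalA += n;
        for (int n : cb.values()) totalB += n;
        for (Map.Entry<String, Integer> e : ca.entrySet()) {
            shared += Math.min(e.getValue(), cb.getOrDefault(e.getKey(), 0));
        }
        int denom = Math.max(totalA, totalB);
        // Two empty extractions are treated as identical.
        return denom == 0 ? 1.0 : (double) shared / denom;
    }

    public static void main(String[] args) {
        // Hypothetical extracted text from the old and new versions of the parser.
        String oldText = "total 1_234 units shipped";
        String newText = "total 1234 units shipped";
        double score = overlap(oldText, newText);
        // Flag the file for manual comparison when overlap falls below 90%.
        System.out.println(score < 0.90 ? "review manually" : "ok");
    }
}
```

With a metric like this, files scoring below 0.90 would be the ones pulled for manual
comparison; a stricter or looser tokenizer would change which files get flagged.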

I found one regression in the handling of an xlsx file:

Tika 1.6 w/ POI 3.11 Beta 1 is not extracting the comments in this file, whereas Tika 1.5
(and Tika 1.6 w/ POI 3.10-Final) did extract the comments.  This suggests that the issue is
with POI, but I haven't had a chance to dig in, and unfortunately, I don't think I will have
a chance until Monday.



-----Original Message-----
From: Nick Burch [mailto:nick@apache.org] 
Sent: Friday, August 01, 2014 5:33 AM
To: dev@poi.apache.org
Subject: [VOTE] Release Apache POI 3.11 Beta 1

Hi All

It has been almost half a year since our last release, so as previously 
discussed it seems time for another beta.

The release candidate for this release is available from:

And the tag in SVN from which it was built is:

As with all Apache release votes, please check that not only does the
code work, and no major breakages have occurred since the last
release, but also that packaging is correct, license headers and
notices exist etc.

The vote will be open for 72 hours, until the end of Sunday 3rd August. 
(It's a slightly shorter vote than normal, as Apache Tika is waiting on a 
bug fix in the release before they roll Tika 1.6!)

The vote options are:
  +1  - I support this release
   0  - I don't object to this release, but I haven't checked it
  -1  - There's a problem with the release, and that is ....

Votes are welcomed (and encouraged) from everyone, committer or not!


