lucene-general mailing list archives

From "Uwe Schindler" <>
Subject RE: [VOTE] Apache Tika 0.7 Release Candidate #1
Date Fri, 02 Apr 2010 16:30:44 GMT

I checked:

* signature of the src zip - OK.

* md5 sum is OK, but on Windows I had a problem because the md5 checksum file has no "*"
before the file name, which means that the checksum is marked as text mode (non-binary):
"The sums are computed as described in RFC 1321.  When checking, the input
should be a former output of this program.  The default mode is to print
a line with checksum, a character indicating type (`*' for binary, ` ' for
text), and name for each FILE."
So I had to force md5sum to binary mode with --binary.

* mvn install was unsuccessful: one test class failed (Java 1.5.0_22, 64bit, Win7):

Running org.apache.tika.TestParsers
Tests run: 12, Failures: 0, Errors: 12, Skipped: 0, Time elapsed: 0.265 sec <<< FAILURE!

Tests in error:

Tests run: 127, Failures: 0, Errors: 12, Skipped: 0

All errors look like this (the German message means "The system cannot find the path specified"):
<testcase classname="org.apache.tika.TestParsers" time="0.015" name="testOutlookExtraction">
  <error type="" message="C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg
(Das System kann den angegebenen Pfad nicht finden)"> C:\Users\Uwe%20Schindler\Desktop\tika-0.7\tika-parsers\target\test-classes\test-documents\test-outlook.msg
(Das System kann den angegebenen Pfad nicht finden) at
Method) at<init>( at org.apache.tika.utils.ParseUtils.getStringContent(
at org.apache.tika.utils.ParseUtils.getStringContent( at org.apache.tika.TestParsers.testOutlookExtraction(</error>


If this is caused by the whitespace in my Windows user directory, it should maybe be fixed
like in Lucene's tests (we had a similar problem there, too). If you search for test files
on the test classpath, locate them using the Class.getResource() method and convert the
URL to a path, you should not simply use the getPath() method of the URL, as this creates
exactly those wrong filenames (the space stays encoded as %20). The fix is in
LuceneTestCase(J4).java in Lucene's classes (method getDataFile()). You should convert the
URL to a URI and create the File instance using "new File(url.toURI())". This is the
"correct" way to convert a URL to a file system path.
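
To illustrate the difference, here is a minimal sketch. The class name ResourceFiles and the
helper toFile() are made up for this example and are not taken from Tika or Lucene; only
URL.getPath(), URL.toURI() and the File(URI) constructor are real JDK calls. The directory
name is just a stand-in for a user directory that contains a space.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

public class ResourceFiles {

    /** Converts a file: URL to a File; escapes such as %20 are decoded by the URI. */
    public static File toFile(URL url) {
        try {
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical directory with a space in its name, similar to the failing case.
        File dir = new File(System.getProperty("java.io.tmpdir"), "dir with space");
        URL url = new File(dir, "test-outlook.msg").toURI().toURL();

        // getPath() returns the raw URL path, so the space is still written as %20.
        System.out.println("url.getPath(): " + url.getPath());

        // Going through the URI yields a path the file system can actually resolve.
        System.out.println("toFile(url):   " + toFile(url).getPath());
    }
}
```

In a test, the URL would typically come from something like
getClass().getResource("/test-documents/test-outlook.msg") instead of being built by hand.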

Should I open a bug report for the tests?

This is not release critical, so I think you can release with this bug, as it only affects

* I downloaded the repository folder and checked all signatures using 'find . -name "*.asc"
| xargs -L1 gpg --verify' - OK.

I am +1 as a new Lucene PMC member (although there is the test bug), but in my opinion, you
should fix the md5 signatures and possibly add sha1 signatures before release. Just check
that the sums inside the files are identical.
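
For checking the sums independently of md5sum's text/binary handling, something like the
following sketch could be used. It assumes the checksum line follows the md5sum format quoted
above (32 hex digits, a space, the type character, then the file name); the class Md5Check
and its methods are hypothetical names for this example. The digest is always computed over
the raw bytes, so the "*" marker only gets reported, it does not change the result.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Check {

    /** Computes the lowercase hex MD5 digest over all raw bytes of the stream. */
    public static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Returns the 32 hex digits at the start of an md5sum-style line. */
    public static String expectedHex(String line) {
        return line.substring(0, 32).toLowerCase();
    }

    /** True if the type character before the file name is '*' (binary mode). */
    public static boolean isBinaryLine(String line) {
        return line.length() > 33 && line.charAt(33) == '*';
    }

    public static void main(String[] args) throws IOException {
        // Sample line for illustration only: the MD5 of the three bytes "abc".
        String line = "900150983cd24fb0d6963f7d28e17f72 *abc.txt";
        String actual = md5Hex(new ByteArrayInputStream("abc".getBytes("US-ASCII")));
        System.out.println("binary marker: " + isBinaryLine(line));
        System.out.println("sums match:    " + actual.equals(expectedHex(line)));
    }
}
```

For a real release artifact, the ByteArrayInputStream would be replaced by a FileInputStream
on the downloaded file, and the line would be read from the accompanying .md5 file.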


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: Mattmann, Chris A (388J) []
> Sent: Wednesday, March 31, 2010 10:02 PM
> To: Lucene mailing list
> Cc:
> Subject: [VOTE] Apache Tika 0.7 Release Candidate #1
> Hi Folks,
> I have posted a candidate for the Apache Tika 0.7 release. The source
> code
> is at:
> See the included CHANGES.txt file for details on release contents and
> latest
> changes. The release was made using the Maven2 release plugin,
> according to
> Jukka Zitting's notes:
> This plugin creates a Tika 0.7 tag at:
> And a staged M2 repository at, here:
> Please vote on releasing these packages as Apache Tika 0.7. The vote is
> open
> for the next 72 hours. Only votes from Lucene PMC are binding, but
> everyone
> is welcome to check the release candidate and voice their approval or
> disapproval. The vote passes if at least three binding +1 votes are
> cast.
> [ ] +1 Release the packages as Apache Tika 0.7.
> [ ] -1 Do not release the packages because...
> Thanks!
> Cheers,
> Chris
> P.S. Note, this will likely be the *last* Tika release under the Lucene
> umbrella since we've VOTE'd to turn Tika into a TLP. Thanks for
> participation over the years from the Lucene PMC and others in the
> Lucene
> community!
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email:
> WWW:
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
