lucene-solr-user mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject Apache Tika's public regression corpus
Date Wed, 05 Oct 2016 17:56:40 GMT
All,

I recently blogged about some of the work we're doing with a large-scale regression corpus
to make Tika, POI and PDFBox more robust and to identify regressions before release.  If you'd
like to chip in with recommendations, requests or Hadoop/Spark clusters (why not shoot for
the stars), please do!

  http://openpreservation.org/blog/2016/10/04/apache-tikas-regression-corpus-tika-1302/

Many thanks, again, to Rackspace for our VM and to Common Crawl and govdocs1 for most of our
files!

        Cheers,

                 Tim