lucene-solr-user mailing list archives

From "Allison, Timothy B."
Subject Apache Tika's public regression corpus
Date Wed, 05 Oct 2016 17:56:40 GMT

I recently blogged about some of the work we're doing with a large-scale regression corpus
to make Tika, POI and PDFBox more robust and to identify regressions before release.  If you'd
like to chip in with recommendations, requests or Hadoop/Spark clusters (why not shoot for
the stars), please do!

Many thanks, again, to Rackspace for our vm and to Common Crawl and govdocs1 for most of our

