tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: last commits before pre-1.13 regression tests?
Date Wed, 20 Apr 2016 15:31:18 GMT
  Any over-recall/poor precision on your new MIME types?

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Wednesday, April 20, 2016 11:20 AM
To: dev@tika.apache.org
Subject: RE: last commits before pre-1.13 regression tests?

Results are available here: 

I've only looked briefly.  Overall, I think things look ok.

This isn't quite trunk:
* I applied Nick C's first dbf regex
* I added a temporary fix for the pooled time series parser

There are quite a few changes in MIME detection, and clearly some rare problems with PDFs
(and other formats?) now being identified as multipart/apple-double.  I think there are also some
rare problems with "text/html; charset=UTF-8" now being detected as "text/plain; charset=UTF-8".

What do others see?  Are we good to go for 1.13 after I commit the two starred items above?

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Monday, April 18, 2016 12:26 PM
To: dev@tika.apache.org
Subject: RE: last commits before pre-1.13 regression tests?

Sounds good to me.  Given the number of changes since the last pre-pre-run, I suspect I'll
need to redo the tests anyway. ;)

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
Sent: Monday, April 18, 2016 12:16 PM
To: dev@tika.apache.org
Subject: Re: last commits before pre-1.13 regression tests?

Tim, I would like to close out all the scientific MIME updates for TREC-DD-Polar
and get at least those in.

In 1.14, my team from USC and I will deliver an automatic, deep-learning-based approach to MIME detection
built on these updates and on the ContentMIMEDetection mechanism described on the wiki.
We are also working on a paper describing it.

For 1.13, though, I’ve created a JIRA ticket, will link the relevant JIRAs and PRs to it, and I’d
like to plow through those. Can we run one more tika-batch pass after I do that to check for regressions?



Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory
Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/

On 4/18/16, 9:11 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:

>Hi All,
>  I'm about to kick off our regression tests to see if there are major issues before we
>release 1.13.  Any blockers/last commits outstanding?  Still need to upgrade POI to 3.15-beta1...
>What else?
>       Cheers,
>                  Tim
>Timothy B. Allison, Ph.D.
>Principal Artificial Intelligence Engineer
>Group Lead K83E/Human Language Technology
>The MITRE Corporation
>7515 Colshire Drive, McLean, VA  22102
>703-983-2473 (phone); 703-983-1379 (fax)