tika-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: last commits before pre-1.13 regression tests?
Date Wed, 20 Apr 2016 16:18:05 GMT
Just finished a meeting; will inspect today.

Sent from my iPhone

> On Apr 20, 2016, at 10:31 AM, Allison, Timothy B. <tallison@mitre.org> wrote:
> 
> Chris,
>  Any over-recall/bad precision on your new mimes?
> 
> -----Original Message-----
> From: Allison, Timothy B. [mailto:tallison@mitre.org] 
> Sent: Wednesday, April 20, 2016 11:20 AM
> To: dev@tika.apache.org
> Subject: RE: last commits before pre-1.13 regression tests?
> 
> Results are available here:
> http://162.242.228.174/reports/tika_1_12_v_tika_1_13-SNAPSHOTv2.tar.bz2 
> 
> I've only looked briefly.  Overall, I think things look ok.
> 
> This isn't quite trunk:
> * I applied Nick C's first dbf regex
> * I added a temporary fix for the pooled time series parser
> 
> There are quite a few changes in mime-detection, and clearly some rare problems with
> pdfs (and other formats?) now being identified as multipart/apple-double.  I think there are
> some rare problems with "text/html; charset=UTF-8 -> text/plain; charset=UTF-8".
> 
> What do others see?  Are we good to go for 1.13 after I commit the 2 * above?
> 
> -----Original Message-----
> From: Allison, Timothy B. [mailto:tallison@mitre.org]
> Sent: Monday, April 18, 2016 12:26 PM
> To: dev@tika.apache.org
> Subject: RE: last commits before pre-1.13 regression tests?
> 
> Sounds good to me.  Given the number of changes since the last pre-pre-run, I suspect
> I'll need to redo the tests anyway. ;)
> 
> -----Original Message-----
> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
> Sent: Monday, April 18, 2016 12:16 PM
> To: dev@tika.apache.org
> Subject: Re: last commits before pre-1.13 regression tests?
> 
> Tim, I would like to close out all the scientific MIME updates for TREC-DD-Polar
> and get at least those in.
> 
> In 1.14, my team from USC and I will deliver an automatic Deep Learning approach to MIME
> detection based on these updates and on the ContentMIMEDetection mechanism described on
> the wiki. We are also working on a paper describing it.
> 
> But for 1.13 I’ve created a JIRA ticket and will link the relevant JIRAs and PRs, and
> I’d like to plow through those. Can we run one more tika-batch after I do that to check for
> any regressions?
> 
> https://issues.apache.org/jira/browse/TIKA-1955
> 
> 
> Cheers,
> Chris
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Director, Information Retrieval and Data Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
>> On 4/18/16, 9:11 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
>> 
>> Hi All,
>> I'm about to kick off our regression tests to see if there are major issues before
>> we release 1.13.  Any blockers/last commits outstanding?  Still need to upgrade POI to 3.15-beta1...
>> What else?
>> 
>>      Cheers,
>> 
>>                 Tim
>> 
>> Timothy B. Allison, Ph.D.
>> Principal Artificial Intelligence Engineer Group Lead K83E/Human 
>> Language Technology The MITRE Corporation
>> 7515 Colshire Drive, McLean, VA  22102
>> 703-983-2473 (phone); 703-983-1379 (fax)
>> 
