+1 Thank you!
-----Original Message-----
From: Chris Mattmann [mailto:mattmann@apache.org]
Sent: Thursday, May 18, 2017 10:15 AM
To: dev@tika.apache.org
Subject: Re: Tika 1.15
Hey Tim,
I am, Luis is, you are; that’s probably a good enough start. I’ll roll the RC this afternoon or
early AM Pacific tomorrow!
Cheers,
Chris
On 5/18/17, 3:56 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
Yes, yes we are...if you and fellow devs are ok with the log message in TIKA-2359.
Happy to change that message if there are any concerns/recommendations.
Onward! Thank you!
Cheers,
Tim
-----Original Message-----
From: Chris Mattmann [mailto:mattmann@apache.org]
Sent: Wednesday, May 17, 2017 10:01 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.15
Tim, are we good for 1.15? Should I roll the RC?
Thanks!
On 5/17/17, 3:50 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
Full report on attachment # diffs: http://162.242.228.174/reports/attachment_diffs_complete_20170516.xlsx
Still need to look through contents diffs.
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Tuesday, May 16, 2017 3:11 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.15
I reran the eval with some updates, including rc1 of PDFBox 2.0.6, which is now integrated.
http://162.242.228.174/reports/reports_tika_20170515.tar.gz
I need to do some more digging on attachments -- hit max limit. The decrease in attachments
from the few docs I reviewed is explained by change in default behavior of macro extraction
-- in 1.14 we were extracting macros by default, but we aren't doing this in 1.15. However,
I want to look at more than the first x diffs because there may be other file formats further
down the results that weren't included in the report.
I also want to look at the contents...haven't had a chance.
> On May 1, 2017 3:59 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
>
> > Sounds good. W00t!
> >
> > -----Original Message-----
> > From: Chris Mattmann [mailto:mattmann@apache.org]
> > Sent: Monday, May 1, 2017 4:57 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15
> >
> > Thanks Tim. I am going to try and get tika-dl added (if possible), and
> > also try the Sentiment Parser next. If I can get one or both of those
> > (in the next day or so), then I will give you the heads up to begin testing.
> > Video recognition is in!
> >
> > On 5/1/17, 12:42 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
> >
> > I finally had a chance to look through the results of the first
> > regression run.
> >
> > I made a few trivial changes to our parsers and to tika-eval.
> >
> > We appear to have many more exceptions in files parsed by our
> > CompressorParser, but this is because of reporting...not because of
> > reality -- the exception is now coming in the container file, not an
> > attachment...and tika-eval wasn't matching A and B correctly.
> >
> > There is a regression that's been fixed in PDFBox trunk
> > (PDFBOX-3717), but I don't see that as a blocker.
> >
> > We have new exceptions in the new parsers, EMF, WMF, .xlsb,
> > wordperfect, but that's because we're actually parsing those now. :)
> >
> > All else looks to be in decent shape.
> >
> > Chris and Team and All,
> > Let me know when you're ready for me to kick off the next
> > regression run.
> >
> > Cheers,
> >
> > Tim
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Wednesday, April 26, 2017 12:48 PM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15
> >
> > Thank you!
> >
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Chris Mattmann, Ph.D.
> > Principal Data Scientist, Engineering Administrative Office (3010)
> > Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
> > NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> > Office: 180-503E, Mailstop: 180-503
> > Email: chris.a.mattmann@nasa.gov
> > WWW: http://sunset.usc.edu/~mattmann/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> > Director, Information Retrieval and Data Science Group (IRDS)
> > Adjunct Associate Professor, Computer Science Department
> > University of Southern California, Los Angeles, CA 90089 USA
> > WWW: http://irds.usc.edu/
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >
> >
> > On 4/26/17, 9:35 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
> >
> > Oh. Ok. Will wait, then?
> >
> > -----Original Message-----
> > From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
> > Sent: Wednesday, April 26, 2017 11:38 AM
> > To: dev@tika.apache.org
> > Subject: Re: Tika 1.15
> >
> > I want to see if I can get in the VideoRecognition parser, and
> > also the Sentiment one.
> >
> > I hope to get it done in the next day or so. Thanks.
> >
> >
> >
> > On 4/26/17, 7:54 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
> >
> > With the added TSD parser, I think I should rerun the
> > regression testing. Given that, I also fixed 2099, and we'll benefit
> > from a rerun.
> >
> > Anything else before I rerun the regression testing?
> >
> > Any problems observed in first run?