tika-dev mailing list archives

From Tyler Bui-Palsulich <tpalsul...@apache.org>
Subject RE: Tika 1.15
Date Tue, 02 May 2017 03:39:08 GMT
How exactly did you "evaluate" the results? I opened the zip and looked at
a few of the sheets, but it's a bit daunting.

Any way we could dump JSON? That's a bit easier to build visualizations for.

Tyler

On May 1, 2017 3:59 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:

> Sounds good.  W00t!
>
> -----Original Message-----
> From: Chris Mattmann [mailto:mattmann@apache.org]
> Sent: Monday, May 1, 2017 4:57 PM
> To: dev@tika.apache.org
> Subject: Re: Tika 1.15
>
> Thanks Tim. I am going to try and get tika-dl added (if possible), and
> also try the Sentiment Parser next. If I can get one or both of those (in
> the next day or so), then I will give you the heads up to begin testing.
> Video recognition is in!
>
> On 5/1/17, 12:42 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
>
>     I finally had a chance to look through the results of the first regression run.
>
>     I made a few trivial changes to our parsers and to tika-eval.
>
>     We appear to have many more exceptions in files parsed by our CompressorParser, but this is because of reporting...not because of reality -- the exception is now coming in the container file, not an attachment...and tika-eval wasn't matching A and B correctly.
>
>     There is a regression that's been fixed in PDFBox trunk (PDFBOX-3717), but I don't see that as a blocker.
>
>     We have new exceptions in the new parsers, EMF, WMF, .xlsb, wordperfect, but that's because we're actually parsing those now. :)
>
>     All else looks to be in decent shape.
>
>     Chris and Team and All,
>       Let me know when you're ready for me to kick off the next regression run.
>
>               Cheers,
>
>                       Tim
>
>     -----Original Message-----
>     From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
>     Sent: Wednesday, April 26, 2017 12:48 PM
>     To: dev@tika.apache.org
>     Subject: Re: Tika 1.15
>
>     Thank you!
>
>     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     Chris Mattmann, Ph.D.
>     Principal Data Scientist, Engineering Administrative Office (3010)
>     Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
>     NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>     Office: 180-503E, Mailstop: 180-503
>     Email: chris.a.mattmann@nasa.gov
>     WWW:  http://sunset.usc.edu/~mattmann/
>     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>     Director, Information Retrieval and Data Science Group (IRDS)
>     Adjunct Associate Professor, Computer Science Department
>     University of Southern California, Los Angeles, CA 90089 USA
>     WWW: http://irds.usc.edu/
>     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>     On 4/26/17, 9:35 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
>
>         Oh.  Ok.  Will wait, then?
>
>         -----Original Message-----
>         From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
>         Sent: Wednesday, April 26, 2017 11:38 AM
>         To: dev@tika.apache.org
>         Subject: Re: Tika 1.15
>
>         I want to see if I can get in the VideoRecognition parser, and also the Sentiment one.
>
>         I hope to get it done in the next day or so. Thanks.
>
>
>         On 4/26/17, 7:54 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
>
>             With the added TSD parser, I think I should rerun the regression testing.  Given that, I also fixed 2099, and we'll benefit from a rerun.
>
>             Anything else before I rerun the regression testing?
>
>             Any problems observed in first run?
>
>
