tika-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: Tika 1.15
Date Tue, 02 May 2017 03:41:58 GMT
JSON + D3 = win




On 5/1/17, 8:39 PM, "Tyler Bui-Palsulich" <tpalsulich@apache.org> wrote:

    How exactly did you "evaluate" the results? I opened the zip and looked at
    a few of the sheets, but it's a bit daunting.
    
    Any way we could dump JSON? That's a bit easier to build visualizations for.
    
    Tyler
    
    On May 1, 2017 3:59 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    
    > Sounds good.  W00t!
    >
    > -----Original Message-----
    > From: Chris Mattmann [mailto:mattmann@apache.org]
    > Sent: Monday, May 1, 2017 4:57 PM
    > To: dev@tika.apache.org
    > Subject: Re: Tika 1.15
    >
    > Thanks Tim. I am going to try and get tika-dl added (if possible), and
    > also try the Sentiment Parser next. If I can get one or both of those (in
    > the next day or so), then I will give you the heads up to begin testing.
    > Video recognition is in!
    >
    >
    >
    >
    >
    > On 5/1/17, 12:42 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    >
    >     I finally had a chance to look through the results of the first
    >     regression run.
    >
    >     I made a few trivial changes to our parsers and to tika-eval.
    >
    >     We appear to have many more exceptions in files parsed by our
    >     CompressorParser, but this is because of reporting...not because of
    >     reality -- the exception is now coming in the container file, not an
    >     attachment...and tika-eval wasn't matching A and B correctly.
    >
    >     There is a regression that's been fixed in PDFBox trunk (PDFBOX-3717),
    >     but I don't see that as a blocker.
    >
    >     We have new exceptions in the new parsers (EMF, WMF, .xlsb,
    >     WordPerfect), but that's because we're actually parsing those now. :)
    >
    >     All else looks to be in decent shape.
    >
    >     Chris and Team and All,
    >       Let me know when you're ready for me to kick off the next regression
    >       run.
    >
    >               Cheers,
    >
    >                       Tim
    >
    >
    >
    >
    >     -----Original Message-----
    >     From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
    >     Sent: Wednesday, April 26, 2017 12:48 PM
    >     To: dev@tika.apache.org
    >     Subject: Re: Tika 1.15
    >
    >     Thank you!
    >
    >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    >     Chris Mattmann, Ph.D.
    >     Principal Data Scientist, Engineering Administrative Office (3010)
    >     Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
    >     NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
    >     Office: 180-503E, Mailstop: 180-503
    >     Email: chris.a.mattmann@nasa.gov
    >     WWW:  http://sunset.usc.edu/~mattmann/
    >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    >     Director, Information Retrieval and Data Science Group (IRDS)
    >     Adjunct Associate Professor, Computer Science Department
    >     University of Southern California, Los Angeles, CA 90089 USA
    >     WWW: http://irds.usc.edu/
    >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    >
    >
    >     On 4/26/17, 9:35 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    >
    >         Oh.  Ok.  Will wait, then?
    >
    >         -----Original Message-----
    >         From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
    >         Sent: Wednesday, April 26, 2017 11:38 AM
    >         To: dev@tika.apache.org
    >         Subject: Re: Tika 1.15
    >
    >         I want to see if I can get the VideoRecognition parser in, and
    >         also the Sentiment one.
    >
    >         I hope to get it done in the next day or so. Thanks.
    >
    >
    >
    >         On 4/26/17, 7:54 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    >
    >             With the added TSD parser, I think I should rerun the
    >             regression testing.  Given that, I also fixed 2099, and we'll
    >             benefit from a rerun.
    >
    >             Anything else before I rerun the regression testing?
    >
    >             Any problems observed in first run?
    >
    


