tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Tika 1.15
Date Mon, 22 May 2017 11:03:41 GMT
I tried to send this offer over the weekend, but there was a failure somewhere between my mail
client and the tika list.

If fellow devs are willing to put up with hand holding, I'd be happy to have a go at release
manager for 1.15.

Last I remember, Tyler had some detailed notes...anyone remember where those are?

Thank you!

        Best,

                 Tim

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Thursday, May 18, 2017 12:26 PM
To: dev@tika.apache.org
Subject: RE: Tika 1.15

+1  Thank you!

-----Original Message-----
From: Chris Mattmann [mailto:mattmann@apache.org] 
Sent: Thursday, May 18, 2017 10:15 AM
To: dev@tika.apache.org
Subject: Re: Tika 1.15

Hey Tim,

I am, Luis is, you are, that’s probably a good enough start. I’ll roll the RC this afternoon,
or early AM Pacific tomorrow!

Cheers,
Chris




On 5/18/17, 3:56 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:

    Yes, yes we are...if you and fellow devs are ok with the log message in TIKA-2359.
    
    Happy to change that message if there are any concerns/recommendations.
    
    Onward!  Thank you!
    
    Cheers,
    
             Tim
    
    -----Original Message-----
    From: Chris Mattmann [mailto:mattmann@apache.org] 
    Sent: Wednesday, May 17, 2017 10:01 PM
    To: dev@tika.apache.org
    Subject: Re: Tika 1.15
    
    Tim, are we good for 1.15? Should I roll the RC?
    
    Thanks!
    
    
    On 5/17/17, 3:50 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    
        Full report on attachment # diffs: http://162.242.228.174/reports/attachment_diffs_complete_20170516.xlsx
        
        Still need to look through contents diffs.
        
        -----Original Message-----
        From: Allison, Timothy B. [mailto:tallison@mitre.org] 
        Sent: Tuesday, May 16, 2017 3:11 PM
        To: dev@tika.apache.org
        Subject: RE: Tika 1.15
        
        I reran the eval with some updates, including rc1 of PDFBox 2.0.6, which is now integrated.
        
        http://162.242.228.174/reports/reports_tika_20170515.tar.gz
        
        I need to do some more digging on attachments -- hit max limit.  The decrease in attachments
        from the few docs I reviewed is explained by change in default behavior of macro extraction
        -- in 1.14 we were extracting macros by default, but we aren't doing this in 1.15.  However,
        I want to look at more than the first x diffs because there may be other file formats further
        down the results that weren't included in the report.
        
        I also want to look at the contents...haven't had a chance.
        
        >     On May 1, 2017 3:59 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
        >
        >     > Sounds good.  W00t!
        >     >
        >     > -----Original Message-----
        >     > From: Chris Mattmann [mailto:mattmann@apache.org]
        >     > Sent: Monday, May 1, 2017 4:57 PM
        >     > To: dev@tika.apache.org
        >     > Subject: Re: Tika 1.15
        >     >
        >     > Thanks Tim. I am going to try and get tika-dl added (if possible), and
        >     > also try the Sentiment Parser next. If I can get one or both of those
        >     > (in the next day or so), then I will give you the heads up to begin testing.
        >     > Video recognition is in!
        >     >
        >     >
        >     >
        >     >
        >     >
        >     > On 5/1/17, 12:42 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
        >     >
        >     >     I finally had a chance to look through the results of the first
        >     >     regression run.
        >     >
        >     >     I made a few trivial changes to our parsers and to tika-eval.
        >     >
        >     >     We appear to have many more exceptions in files parsed by our
        >     >     CompressorParser, but this is because of reporting...not because of
        >     >     reality -- the exception is now coming in the container file, not an
        >     >     attachment...and tika-eval wasn't matching A and B correctly.
        >     >
        >     >     There is a regression that's been fixed in PDFBox trunk
        >     >     (PDFBOX-3717), but I don't see that as a blocker.
        >     >
        >     >     We have new exceptions in the new parsers, EMF, WMF, .xlsb,
        >     >     wordperfect, but that's because we're actually parsing those now. :)
        >     >
        >     >     All else looks to be in decent shape.
        >     >
        >     >     Chris and Team and All,
        >     >       Let me know when you're ready for me to kick off the next
        >     >       regression run.
        >     >
        >     >               Cheers,
        >     >
        >     >                       Tim
        >     >
        >     >
        >     >
        >     >
        >     >     -----Original Message-----
        >     >     From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
        >     >     Sent: Wednesday, April 26, 2017 12:48 PM
        >     >     To: dev@tika.apache.org
        >     >     Subject: Re: Tika 1.15
        >     >
        >     >     Thank you!
        >     >
        >     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        >     >     Chris Mattmann, Ph.D.
        >     >     Principal Data Scientist, Engineering Administrative Office (3010)
        >     >     Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
        >     >     NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
        >     >     Office: 180-503E, Mailstop: 180-503
        >     >     Email: chris.a.mattmann@nasa.gov
        >     >     WWW:  http://sunset.usc.edu/~mattmann/
        >     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        >     >     Director, Information Retrieval and Data Science Group (IRDS)
        >     >     Adjunct Associate Professor, Computer Science Department
        >     >     University of Southern California, Los Angeles, CA 90089 USA
        >     >     WWW: http://irds.usc.edu/
        >     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        >     >
        >     >
        >     >     On 4/26/17, 9:35 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
        >     >
        >     >         Oh.  Ok.  Will wait, then?
        >     >
        >     >         -----Original Message-----
        >     >         From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
        >     >         Sent: Wednesday, April 26, 2017 11:38 AM
        >     >         To: dev@tika.apache.org
        >     >         Subject: Re: Tika 1.15
        >     >
        >     >         I want to see if I can get in the VideoRecognition parser, and
        >     >         also the Sentiment one.
        >     >
        >     >         I hope to get it done in the next day or so. Thanks.
        >     >
        >     >
        >     >
        >     >         On 4/26/17, 7:54 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
        >     >
        >     >             With the added TSD parser, I think I should rerun the
        >     >             regression testing.  Given that, I also fixed 2099, and we'll
        >     >             benefit from a rerun.
        >     >
        >     >             Anything else before I rerun the regression testing?
        >     >
        >     >             Any problems observed in first run?
        >     >
        
    
    
    

