tika-dev mailing list archives

From Chris Mattmann <mattm...@apache.org>
Subject Re: Tika 1.15
Date Mon, 22 May 2017 14:15:00 GMT
Thanks for doing this Tim!

JIRA is cleaned up for ya so at least you don’t have that step ☺

Cheers,
Chris




On 5/22/17, 4:03 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:

    I tried to send this offer over the weekend, but there was a failure somewhere btwn my mail client and the tika list.
    
    If fellow devs are willing to put up with hand holding, I'd be happy to have a go at release manager for 1.15.
    
    Last I remember, Tyler had some detailed notes...anyone remember where those are?
    
    Thank you!
    
            Best,
    
                     Tim
    
    -----Original Message-----
    From: Allison, Timothy B. [mailto:tallison@mitre.org] 
    Sent: Thursday, May 18, 2017 12:26 PM
    To: dev@tika.apache.org
    Subject: RE: Tika 1.15
    
    +1  Thank you!
    
    -----Original Message-----
    From: Chris Mattmann [mailto:mattmann@apache.org] 
    Sent: Thursday, May 18, 2017 10:15 AM
    To: dev@tika.apache.org
    Subject: Re: Tika 1.15
    
    Hey Tim,
    
    I am, Luis is, you are, that’s probably a good enough start. I’ll roll the RC this afternoon, early AM pacific tomorrow!
    
    Cheers,
    Chris
    
    
    
    
    On 5/18/17, 3:56 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    
        Yes, yes we are...if you and fellow devs are ok with the log message in TIKA-2359.
        
        Happy to change that message if there are any concerns/recommendations.
        
        Onward!  Thank you!
        
        Cheers,
        
                 Tim
        
        -----Original Message-----
        From: Chris Mattmann [mailto:mattmann@apache.org] 
        Sent: Wednesday, May 17, 2017 10:01 PM
        To: dev@tika.apache.org
        Subject: Re: Tika 1.15
        
        Tim, are we good for 1.15? Should I roll the RC?
        
        Thanks!
        
        
        On 5/17/17, 3:50 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
        
            Full report on attachment # diffs: http://162.242.228.174/reports/attachment_diffs_complete_20170516.xlsx
            
            Still need to look through contents diffs.
            
            -----Original Message-----
            From: Allison, Timothy B. [mailto:tallison@mitre.org] 
            Sent: Tuesday, May 16, 2017 3:11 PM
            To: dev@tika.apache.org
            Subject: RE: Tika 1.15
            
            I reran the eval with some updates, including rc1 of PDFBox 2.0.6, which is now integrated.
            
            http://162.242.228.174/reports/reports_tika_20170515.tar.gz
            
            I need to do some more digging on attachments -- hit max limit.  The decrease in attachments from the few docs I reviewed is explained by change in default behavior of macro extraction -- in 1.14 we were extracting macros by default, but we aren't doing this in 1.15.  However, I want to look at more than the first x diffs because there may be other file formats further down the results that weren't included in the report.
            
            I also want to look at the contents...haven't had a chance.
            
            >     On May 1, 2017 3:59 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
            >
            >     > Sounds good.  W00t!
            >     >
            >     > -----Original Message-----
            >     > From: Chris Mattmann [mailto:mattmann@apache.org]
            >     > Sent: Monday, May 1, 2017 4:57 PM
            >     > To: dev@tika.apache.org
            >     > Subject: Re: Tika 1.15
            >     >
            >     > Thanks Tim. I am going to try and get tika-dl added (if possible), and
            >     > also try the Sentiment Parser next. If I can get one or both of those
            >     > (in the next day or so), then I will give you the heads up to begin testing.
            >     > Video recognition is in!
            >     >
            >     >
            >     >
            >     >
            >     >
            >     > On 5/1/17, 12:42 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:
            >     >
            >     >     I finally had a chance to look through the results of the first regression run.
            >     >
            >     >     I made a few trivial changes to our parsers and to tika-eval.
            >     >
            >     >     We appear to have many more exceptions in files parsed by our
            >     >     CompressorParser, but this is because of reporting...not because of reality
            >     >     -- the exception is now coming in the container file, not an
            >     >     attachment...and tika-eval wasn't matching A and B correctly.
            >     >
            >     >     There is a regression that's been fixed in PDFBox trunk (PDFBOX-3717),
            >     >     but I don't see that as a blocker.
            >     >
            >     >     We have new exceptions in the new parsers, EMF, WMF, .xlsb,
            >     >     wordperfect, but that's because we're actually parsing those now. :)
            >     >
            >     >     All else looks to be in decent shape.
            >     >
            >     >     Chris and Team and All,
            >     >       Let me know when you're ready for me to kick off the next regression run.
            >     >
            >     >               Cheers,
            >     >
            >     >                       Tim
            >     >
            >     >
            >     >
            >     >
            >     >     -----Original Message-----
            >     >     From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
            >     >     Sent: Wednesday, April 26, 2017 12:48 PM
            >     >     To: dev@tika.apache.org
            >     >     Subject: Re: Tika 1.15
            >     >
            >     >     Thank you!
            >     >
            >     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
            >     >     Chris Mattmann, Ph.D.
            >     >     Principal Data Scientist, Engineering Administrative Office (3010)
            >     >     Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
            >     >     NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
            >     >     Office: 180-503E, Mailstop: 180-503
            >     >     Email: chris.a.mattmann@nasa.gov
            >     >     WWW:  http://sunset.usc.edu/~mattmann/
            >     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
            >     >     Director, Information Retrieval and Data Science Group (IRDS)
            >     >     Adjunct Associate Professor, Computer Science Department
            >     >     University of Southern California, Los Angeles, CA 90089 USA
            >     >     WWW: http://irds.usc.edu/
            >     >     ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
            >     >
            >     >
            >     >     On 4/26/17, 9:35 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
            >     >
            >     >         Oh.  Ok.  Will wait, then?
            >     >
            >     >         -----Original Message-----
            >     >         From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov]
            >     >         Sent: Wednesday, April 26, 2017 11:38 AM
            >     >         To: dev@tika.apache.org
            >     >         Subject: Re: Tika 1.15
            >     >
            >     >         I want to see if I can get in the VideoRecognition parser, and
            >     >         also the Sentiment one.
            >     >
            >     >         I hope to get it done in the next day or so. Thanks.
            >     >
            >     >
            >     >
            >     >         On 4/26/17, 7:54 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
            >     >
            >     >             With the added TSD parser, I think I should rerun the
            >     >             regression testing.  Given that, I also fixed 2099, and we'll benefit
            >     >             from a rerun.
            >     >
            >     >             Anything else before I rerun the regression testing?
            >     >
            >     >             Any problems observed in first run?
            >     >