tika-dev mailing list archives

From "Allison, Timothy B." <talli...@mitre.org>
Subject RE: Tika 1.15
Date Mon, 01 May 2017 22:59:38 GMT
Sounds good.  W00t!

-----Original Message-----
From: Chris Mattmann [mailto:mattmann@apache.org] 
Sent: Monday, May 1, 2017 4:57 PM
To: dev@tika.apache.org
Subject: Re: Tika 1.15

Thanks Tim. I am going to try and get tika-dl added (if possible), and also try the Sentiment
Parser next. If I can get one or both of those (in the next day or so), then I will give you
the heads up to begin testing. Video recognition is in!





On 5/1/17, 12:42 PM, "Allison, Timothy B." <tallison@mitre.org> wrote:

    I finally had a chance to look through the results of the first regression run.
    
    I made a few trivial changes to our parsers and to tika-eval.
    
    We appear to have many more exceptions in files parsed by our CompressorParser, but this is because of reporting...not because of reality -- the exception is now coming in the container file, not an attachment...and tika-eval wasn't matching A and B correctly.
    
    There is a regression that's been fixed in PDFBox trunk (PDFBOX-3717), but I don't see that as a blocker.
    
    We have new exceptions in the new parsers, EMF, WMF, .xlsb, wordperfect, but that's because we're actually parsing those now. :)
    
    All else looks to be in decent shape.
    
    Chris and Team and All,
      Let me know when you're ready for me to kick off the next regression run.
    
              Cheers,
    
                      Tim
    
    
    
    
    -----Original Message-----
    From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov] 
    Sent: Wednesday, April 26, 2017 12:48 PM
    To: dev@tika.apache.org
    Subject: Re: Tika 1.15
    
    Thank you!
    
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Chris Mattmann, Ph.D.
    Principal Data Scientist, Engineering Administrative Office (3010)
    Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
    NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
    Office: 180-503E, Mailstop: 180-503
    Email: chris.a.mattmann@nasa.gov
    WWW:  http://sunset.usc.edu/~mattmann/
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Director, Information Retrieval and Data Science Group (IRDS)
    Adjunct Associate Professor, Computer Science Department
    University of Southern California, Los Angeles, CA 90089 USA
    WWW: http://irds.usc.edu/
    ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     
    
    On 4/26/17, 9:35 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
    
        Oh.  Ok.  Will wait, then?
        
        -----Original Message-----
        From: Mattmann, Chris A (3010) [mailto:chris.a.mattmann@jpl.nasa.gov] 
        Sent: Wednesday, April 26, 2017 11:38 AM
        To: dev@tika.apache.org
        Subject: Re: Tika 1.15
        
        I want to see if I can get in the VideoRecognition parser, and also the Sentiment one.
        
        I hope to get it done in the next day or so. Thanks.
        
         
        
        On 4/26/17, 7:54 AM, "Allison, Timothy B." <tallison@mitre.org> wrote:
        
            With the added TSD parser, I think I should rerun the regression testing.  Given that, I also fixed 2099, and we'll benefit from a rerun.
            
            Anything else before I rerun the regression testing?
            
            Any problems observed in first run?
            
            
        
        
    
    

