tika-dev mailing list archives

From "Mattmann, Chris A (3010)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: tika-python
Date Fri, 04 Nov 2016 03:34:38 GMT
Dear Jorg,

Thank you very much for sending this. I have been meaning to reply to your earlier
emails on the same subject. Yes, it will work for other file types. Could you open
a GitHub issue and attach an example of a file it's not working for?
I can take a look.
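As a rough sketch, assuming `pip install tika` and a working Java install so that tika-python can start (or reach) a Tika server: the same `parser.from_file` call used for ODT and PDF can be pointed at audio, video or image files, because Tika detects the type and picks a matching parser. The helper names `summarize` and `_default_parse` and the example file names are illustrative only, not part of tika-python. For files with little or no body text, most of the useful output is in the metadata.

```python
"""Sketch: inspect any Tika-supported file (audio, video, images, ...) via tika-python.

Assumes the `tika` package is installed and Java is available for the Tika server.
`summarize` and `_default_parse` are hypothetical helpers, not tika-python APIs.
"""
from typing import Any, Callable, Dict, Optional

ParseFn = Callable[[str], Dict[str, Any]]


def _default_parse(path: str) -> Dict[str, Any]:
    # Imported lazily so this module loads even where tika is not installed.
    from tika import parser
    # parser.from_file returns a dict with "status", "metadata" and "content" keys.
    return parser.from_file(path)


def summarize(path: str, parse: Optional[ParseFn] = None) -> Dict[str, Any]:
    """Parse `path` and return its content type, metadata and (possibly empty) text."""
    parsed = (parse or _default_parse)(path)
    metadata = parsed.get("metadata") or {}
    # Some metadata values can come back as lists when Tika reports several.
    content_type = metadata.get("Content-Type", "unknown")
    if isinstance(content_type, list):
        content_type = content_type[0]
    # Files with little or no text (e.g. many audio/video formats) may yield
    # None or an empty string for content, so normalize to "".
    text = (parsed.get("content") or "").strip()
    return {"content_type": content_type, "metadata": metadata, "text": text}


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python summarize.py song.mp3 clip.mp4
    for name in sys.argv[1:]:
        info = summarize(name)
        print(name, info["content_type"], len(info["metadata"]), "metadata fields")
        print(info["text"][:200] or "(no text content)")
```

Passing a custom `parse` function is only there so the helper can be exercised without a running Tika server; in normal use you would call `summarize("song.mp3")` and look at `info["metadata"]` for things like duration or codec fields.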


Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/

On 11/3/16, 5:15 PM, "Jörg Bilert" <bilert@gmail.com> wrote:

    Hello Mr. Mattmann,
    I have just been looking into your Python wrapper for Tika and I like
    it a lot.
    But there is one thing I just don't see. According to the Apache Tika
    website, Tika supports a lot of file formats (even audio and video), but I
    don't know how to parse them in Python. ODT and PDF work fine, as in
    the sample code on your GitHub page.
    Could you give me a clue where to start with handling other file types?
    Yours, Jörg Bilert
