tika-dev mailing list archives

From "Mattmann, Chris A (3980)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [DISCUSS] Give examples of Parser, Detector, and Translator usage
Date Thu, 07 Aug 2014 21:54:05 GMT
Hey Nick! :)

I'd have no problem pinching the code from Tika in Action. I wonder if
the Manning folks would mind.

I'll reach out to them.


Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA

-----Original Message-----
From: Nick Burch <apache@gagravarr.org>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Thursday, August 7, 2014 2:42 PM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: Re: [DISCUSS] Give examples of Parser, Detector, and Translator

>On Thu, 7 Aug 2014, Tyler Palsulich wrote:
>> Sounds like the new module is a good idea. So, let's jump on it! I will
>> create a new 'example' JIRA tag and create issues for creating the
>> module and adding Parse, Detect, and Translate examples. Others should
>> add issues/desired examples as they see fit. How's that sound?
>I wonder if it's worth approaching those crazy fools who wrote a book on
>Tika, to see if we could pinch one or two of their examples? If only we
>knew who they were... ;-)
>Recursion is one that causes confusion; we've got some example programs
>on the wiki that we can include:
>Ray Gauss is probably our best bet for advanced metadata stuff to send in
>some examples on that!
>Another one that has generated mailing list traffic lately is embedded
>images, including re-writing links to them. There's some (LGPL) code in
>Alfresco which I wrote a few years ago to do that, Ray might be able to
>get the nod to contribute that (or a cut-down version) as an example of
>that style of parsing HTML + embedded resources in parallel.
