tika-dev mailing list archives

From Jonathan Koren <jonat...@soe.ucsc.edu>
Subject Re: ContentHandler's OutputStream
Date Thu, 05 Feb 2009 10:02:18 GMT

On Feb 5, 2009, at 1:22 AM, Jukka Zitting wrote:

> Hi,
>
> On Thu, Feb 5, 2009 at 3:02 AM, Jonathan Koren  
> <jonathan@soe.ucsc.edu> wrote:
>> What I really want is someone to tell me how to get back a usable
>> stream of plaintext, whether this involves a radical change to Tika's
>> ContentHandler class or some trick with Java, I really don't care, as
>> long as it's single thread safe.
>
> Have you looked at the ParsingReader class? It seems like a perfect
> match to your needs. The ParsingReader class fires a background thread
> to do the parsing and pipes the output so you can control when and how
> you want to read the extracted text.

I had no idea that class existed.  Thanks.
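
For reference, here is a rough sketch of the pattern Jukka describes: a
background thread does the parsing and pipes its output to a Reader that
the caller drains at its own pace. This is not Tika's code. It uses only
java.io, and the class name PipedExtractionSketch, the methods
extractText, openTextReader and readAll, and the toy "drop anything
between angle brackets" parser are all made up for illustration. With
Tika itself the equivalent would presumably be wrapping the document's
InputStream in a ParsingReader and reading from that.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;

public class PipedExtractionSketch {

    // Hypothetical stand-in for a real parser: copies characters from
    // source to sink, dropping everything between '<' and '>'.
    static void extractText(Reader source, Writer sink) throws IOException {
        boolean inTag = false;
        int c;
        while ((c = source.read()) != -1) {
            if (c == '<') {
                inTag = true;
            } else if (c == '>') {
                inTag = false;
            } else if (!inTag) {
                sink.write(c);
            }
        }
    }

    // Starts the extraction on a background thread and returns the
    // reading end of the pipe. The caller decides when to read.
    static Reader openTextReader(Reader source) throws IOException {
        PipedReader readEnd = new PipedReader();
        PipedWriter writeEnd = new PipedWriter(readEnd);
        Thread worker = new Thread(() -> {
            // Closing the writer is what signals end-of-stream to the reader.
            try (PipedWriter out = writeEnd; Reader in = source) {
                extractText(in, out);
            } catch (IOException e) {
                // Either the source failed or the consumer closed the pipe
                // early; the writer is closed on the way out in both cases.
            }
        }, "extraction-worker");
        worker.setDaemon(true);
        worker.start();
        return readEnd;
    }

    // Drains a Reader into a String and closes it.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader text = openTextReader(new StringReader("<p>Hello, <b>Tika</b>!</p>"));
        System.out.println(readAll(text)); // prints "Hello, Tika!"
    }
}
```

Note that the pipe needs the producer and consumer on different threads,
so this approach is safe to call from a single thread but is not itself
single-threaded.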

--
Jonathan Koren
jonathan@soe.ucsc.edu
http://www.soe.ucsc.edu/~jonathan/


