On Thu, 17 Jun 2010, Max Valjanski wrote:
> I tried to do that, but I found that this does not fit into Tika
> architecture. It is required to read whole file to parse OLE-container.
Yup, I've found much the same thing. My idea was to have a new detector
that you can layer in between the others, which will parse the containers
and keep the parsed result around in case a later step needs it. If you
don't want it, just leave it out of the chain.
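As a rough sketch of that layering (this is not the patch on the issue, and
not Tika's real Detector API; SimpleDetector, the context map, PARSED_KEY and
the generic type string are all made-up names for illustration, and the
container parser is passed in as a plain function so the example stays
self-contained):

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class DetectorChainSketch {
    // Hypothetical context key under which the parsed container is kept.
    static final String PARSED_KEY = "parsedContainer";
    // Placeholder type for "some OLE2 file, exact kind unknown".
    static final String GENERIC_OLE2 = "application/x-ole2-generic";
    // Leading signature bytes of an OLE2 compound document.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Hypothetical stand-in for a detector; returns null when it cannot tell. */
    interface SimpleDetector {
        String detect(byte[] data, Map<String, Object> context);
    }

    static boolean isOle2(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (data[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Cheap detector: looks only at the leading bytes. */
    static final class MagicDetector implements SimpleDetector {
        @Override
        public String detect(byte[] data, Map<String, Object> context) {
            return isOle2(data) ? GENERIC_OLE2 : null;
        }
    }

    /** Optional layer: parses the whole container and keeps the result. */
    static final class ContainerAwareDetector implements SimpleDetector {
        private final Function<byte[], List<String>> containerParser;

        ContainerAwareDetector(Function<byte[], List<String>> containerParser) {
            this.containerParser = containerParser;
        }

        @Override
        public String detect(byte[] data, Map<String, Object> context) {
            if (!isOle2(data)) {
                return null; // not a container we handle; let others decide
            }
            // Expensive step: needs the whole file, so do it once and cache it.
            List<String> entries = containerParser.apply(data);
            context.put(PARSED_KEY, entries);
            if (entries.contains("WordDocument")) {
                return "application/msword";
            }
            if (entries.contains("Workbook")) {
                return "application/vnd.ms-excel";
            }
            return null; // fall through to the cheaper detectors
        }
    }

    /** Runs the chain in order; the first non-null answer wins. */
    static String detect(List<SimpleDetector> chain, byte[] data,
                         Map<String, Object> context) {
        for (SimpleDetector detector : chain) {
            String type = detector.detect(data, context);
            if (type != null) {
                return type;
            }
        }
        return "application/octet-stream";
    }
}
```

With the container layer at the front of the chain, an OLE2 file gets a
specific type and the parsed entries stay in the context for reuse. Drop that
layer from the list and the same file just comes back as the generic type,
with no full-file parse.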
I'm not sure if what I've done makes sense, but I've attached a patch to
TIKA-447 that demos the idea. Do people think the idea is worth pursuing
further, or should we try something different?
Nick