tika-dev mailing list archives

From "Babak Farhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-153) Allow passing of files or memory buffers to parsers
Date Tue, 13 Jan 2009 23:06:59 GMT

    [ https://issues.apache.org/jira/browse/TIKA-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663527#action_12663527 ]

Babak Farhang commented on TIKA-153:

I suggest java.nio.channels.FileChannel be used as the random access abstraction. This would allow
implementations such as Skwish [ http://skwish.sourceforge.net/ ] to be used as the source of
a document.

Setting aside some of its niche capabilities (such as its map method), FileChannel, it turns
out, lets one slice and dice a source and construct filters (facades) in the same way Java uses
FilterInputStream and FilterOutputStream. This idea is fleshed out a bit in Skwish
[ see http://skwish.sourceforge.net/doc/com/faunos/util/io/package-summary.html ], so I thought I'd share.
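
To make the slicing idea concrete, here is a minimal sketch of a read-only facade over a sub-range of a FileChannel. The class name SliceChannel and its constructor are hypothetical and are not taken from Skwish or Tika. For brevity the sketch implements SeekableByteChannel (Java 7+) rather than subclassing FileChannel in full, which would need many more methods.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.FileChannel;
import java.nio.channels.NonWritableChannelException;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical read-only view of bytes [offset, offset + length) of a FileChannel. */
final class SliceChannel implements SeekableByteChannel {
    private final FileChannel base;
    private final long offset;
    private final long length;
    private long position;      // relative to the start of the slice
    private boolean open = true;

    SliceChannel(FileChannel base, long offset, long length) throws IOException {
        if (offset < 0 || length < 0 || offset + length > base.size()) {
            throw new IllegalArgumentException("slice out of bounds");
        }
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        ensureOpen();
        long remaining = length - position;
        if (remaining <= 0) {
            return -1;          // end of the slice, not necessarily end of the file
        }
        if (!dst.hasRemaining()) {
            return 0;
        }
        // Temporarily shrink the buffer so the read cannot run past the slice.
        int oldLimit = dst.limit();
        if (dst.remaining() > remaining) {
            dst.limit(dst.position() + (int) remaining);
        }
        try {
            // Positional read: leaves the base channel's own position untouched.
            int n = base.read(dst, offset + position);
            if (n > 0) {
                position += n;
            }
            return n;
        } finally {
            dst.limit(oldLimit);
        }
    }

    @Override
    public int write(ByteBuffer src) {
        throw new NonWritableChannelException();
    }

    @Override
    public SeekableByteChannel truncate(long size) {
        throw new NonWritableChannelException();
    }

    @Override
    public long position() throws IOException {
        ensureOpen();
        return position;
    }

    @Override
    public SeekableByteChannel position(long newPosition) throws IOException {
        ensureOpen();
        if (newPosition < 0) {
            throw new IllegalArgumentException("negative position");
        }
        position = newPosition;
        return this;
    }

    @Override
    public long size() throws IOException {
        ensureOpen();
        return length;
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;           // closes only the view; the base channel stays open
    }

    private void ensureOpen() throws ClosedChannelException {
        if (!open) {
            throw new ClosedChannelException();
        }
    }
}
```

A parser handed such a view can seek back and forth within its slice without knowing whether the bytes come from a plain file or from some other FileChannel implementation. Because the view uses positional reads, several slices can share one base channel without disturbing each other's positions.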


> Allow passing of files or memory buffers to parsers
> ---------------------------------------------------
>                 Key: TIKA-153
>                 URL: https://issues.apache.org/jira/browse/TIKA-153
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Priority: Minor
>
> Some of our parsers need to be able to go back and forth within a source document, so
> need either a file or (for smaller documents) an in-memory buffer that contains the full document.
> Currently we use temporary files for such cases, which in some cases means doing an extra
> copy of a file before it gets parsed. We should come up with some way for clients to pass
> in a file or a memory buffer if one is available.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
