tika-dev mailing list archives

From "Niall Pemberton" <niall.pember...@gmail.com>
Subject Re: [jira] Commented: (TIKA-105) Excel parser implementation based on POI's Event API
Date Wed, 26 Dec 2007 19:38:34 GMT
On Dec 26, 2007 7:19 PM, Keith R. Bennett <kbennett@bbsinc.biz> wrote:
>
> Niall -
>
> When you say it includes the sheet name, you mean the name of each sheet
> (tab) in the Excel file, right?

Yes

> Does it come out as bare text, or is it
> encoded in a way that can be parsed (e.g. "{[Sheet: MySheet1]}")?  Or is
> this configurable?

Just plain text and not configurable ATM.

> We have a need to read Excel files with more structure than the usual
> unstructured text document.  At minimum, it would be great to be able to
> know where one sheet ends and the next begins.  Is this something
> that would be appropriate to support, or does that go beyond the generic
> unstructured text parsing mission of Tika?

I'll leave that for the Tika devs to comment on.

>  Also, based on your knowledge of
> Poi (I have none), how difficult is that to implement?  I may need to do it
> myself.

Very easy. Tika has two Excel parsers now: the original one
(ExcelParser) uses the easier/simpler POI API, and the one I wrote
(ExcelEventParser) has a smaller memory footprint but uses the
slightly more complex POI Event API. I believe either of them could be
easily adapted to your needs, though.
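
To illustrate the "parseable marker" idea from the question, here is a
minimal sketch in plain Java of how a consumer might split extracted text
into per-sheet chunks. It assumes a parser had been adapted to emit the
hypothetical "{[Sheet: MySheet1]}" marker before each sheet's text; neither
Tika parser does this today (the sheet name currently comes out as plain
text). The class name SheetSplitter and the marker format are illustrative
only, not part of Tika or POI.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SheetSplitter {
    // Hypothetical marker format, taken from the example in the question:
    // {[Sheet: MySheet1]}. This is an assumption, not current Tika output.
    private static final Pattern MARKER =
            Pattern.compile("\\{\\[Sheet: (.*?)\\]\\}");

    /**
     * Splits extracted text into sheet name -> sheet text, in document order.
     * Any text before the first marker is ignored.
     */
    static Map<String, String> split(String text) {
        Map<String, String> sheets = new LinkedHashMap<>();
        Matcher m = MARKER.matcher(text);
        String currentName = null;
        int contentStart = 0;
        while (m.find()) {
            if (currentName != null) {
                // The previous sheet ends where the next marker starts.
                sheets.put(currentName,
                        text.substring(contentStart, m.start()).trim());
            }
            currentName = m.group(1);
            contentStart = m.end();
        }
        if (currentName != null) {
            // The last sheet runs to the end of the text.
            sheets.put(currentName, text.substring(contentStart).trim());
        }
        return sheets;
    }
}
```

Calling SheetSplitter.split(extractedText) returns an ordered map, so
iterating over it visits the sheets in the order they appeared in the text.
A marker like this only works if cell text can never contain the same
sequence, which is one reason structured output might be a better fit than
an in-band delimiter.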

Niall

> Thanks much,
> Keith
>
>
> JIRA jira@apache.org wrote:
> >
> >
> >     [
> > https://issues.apache.org/jira/browse/TIKA-105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554021
> > ]
> >
> > Niall Pemberton commented on TIKA-105:
> > --------------------------------------
> >
>
> > The only functional difference between this implementation and ExcelParser
> > is that it also writes out the sheet name to the stream; this could easily
> > be added with a one-line change to ExcelParser, though.
> >
> >
>
> --
> View this message in context: http://www.nabble.com/-jira--Created%3A-%28TIKA-105%29-Excel-parser-implementation-based-on-POI%27s-Event-API-tp13942709p14505443.html
> Sent from the Apache Tika - Development mailing list archive at Nabble.com.
>
>
