tika-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Microsoft Outlook (msg) files get parsed 50 times in TikaGUI
Date Thu, 05 Feb 2009 08:40:03 GMT

On Thu, Feb 5, 2009 at 7:10 AM, Jana, Kumar Raja <kjana@ptc.com> wrote:
> I see 50 copies of the content in the extracted text output.

OK. This is probably an issue either with the Outlook parser from POI
or with the way we use it in Tika.

> I have attached a sample Outlook (msg) file to this mail (which happens
> to be a mail from you to the dev group). Hope it helps.

Unfortunately the mailing list filters seem to have stripped the
attachment. Can you file a bug report about this in Jira and attach
the example mail there?


Jukka Zitting
