tika-dev mailing list archives

From: "Jana, Kumar Raja" <kj...@ptc.com>
Subject: RE: Microsoft Outlook (msg) files get parsed 50 times in TikaGUI
Date: Thu, 05 Feb 2009 06:10:46 GMT

Hi Jukka,

Thanks for the quick reply.
I see 50 copies of the content in the extracted text output. I have
attached a sample Outlook (msg) file to this mail (which happens to be a
mail from you to the dev group). Hope it helps.

Thanks again,
Kumar

-----Original Message-----
From: Jukka Zitting [mailto:jukka.zitting@gmail.com] 
Sent: Thursday, February 05, 2009 5:22 AM
To: tika-dev@lucene.apache.org
Subject: Re: Microsoft Outlook (msg) files get parsed 50 times in
TikaGUI

Hi,

On Wed, Feb 4, 2009 at 12:00 PM, Jana, Kumar Raja <kjana@ptc.com> wrote:
> I was feeding various document formats to the TikaGUI tool and found
> that Microsoft Outlook (msg) files get parsed around 50 times!!!

Hmm, that's quite a lot... How does this "50 times" appear? Do you get
50 copies of the message content in the extracted text output? Do you
have an example file that you could share with us?

BR,

Jukka Zitting
