tika-dev mailing list archives

From Simon Tyler <sty...@mimecast.net>
Subject Re: Detector results for Excel formats
Date Thu, 18 Mar 2010 18:12:16 GMT

Hi,

I haven't seen any responses to this. Does anyone know why I should be
seeing such unpredictable behaviour?

Simon

On 15/03/2010 09:27, "Simon Tyler" <styler@mimecast.net> wrote:

> 
> Hi,
> 
> I am doing some testing of Tika 0.6 and noticed some odd results for the
> testEXCEL.xls file included in the test suite.
> 
> 100 calls to the following code:
> 
>             is = new BufferedInputStream(new FileInputStream(filename));
> 
>             Metadata metadata = new Metadata();
>             metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
> 
>             String type = tika.detect(is, metadata);
> 
> return either application/msword or
> application/vnd.ms-excel, seemingly at random.
> 
> Is this expected? Is there a way to mitigate it?
> 
> Simon
> 



