tika-dev mailing list archives

From Maxim Valyanskiy <max...@jet.msk.su>
Subject Re: svn commit: r1165230 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ test/resources/test-documents/
Date Mon, 05 Sep 2011 16:14:20 GMT
Hello!

On 05.09.2011, at 16:23, Jukka Zitting wrote:

> That was me in revision 1164578 for TIKA-704. :-(
> 
>> -            if (root.hasEntry("CONTENTS")) {
>> -                stream = TikaInputStream.get(
>> -                        fs.createDocumentInputStream("CONTENTS"));
> 
> This was my attempt at properly handling the embedded PDF in
> TestWithPdf.docx. It was included in an OLE object with the PDF
> document as its "CONTENTS" entry. I restored this functionality with
> some more specific checks in revision 1165259, and the resulting code
> should now work correctly with all the test documents we have.

Hm, that is strange: the current version of OfficeParser.POIFSDocumentType.detectType() treats
a "CONTENTS" entry as identifying the POI filesystem as an MS Works document. Maybe this is not right.
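
To illustrate the ambiguity, here is a minimal sketch. It is not Tika's actual code: the class ContentsEntrySketch, the Kind enum and the classify() method are hypothetical, and a plain Map of entry names to bytes stands in for the POI filesystem. It assumes only that a PDF stream begins with the "%PDF-" header, so a "CONTENTS" entry carrying that header is reported as an embedded PDF rather than as MS Works.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Tika or POI.
class ContentsEntrySketch {

    enum Kind { EMBEDDED_PDF, POSSIBLY_WORKS, UNKNOWN }

    // Every PDF file starts with this ASCII header.
    private static final byte[] PDF_MAGIC =
            "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given prefix bytes.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // entries: entry name -> raw bytes, standing in for an OLE2 directory.
    static Kind classify(Map<String, byte[]> entries) {
        byte[] contents = entries.get("CONTENTS");
        if (contents == null) {
            return Kind.UNKNOWN;          // no "CONTENTS" entry at all
        }
        if (startsWith(contents, PDF_MAGIC)) {
            return Kind.EMBEDDED_PDF;     // the name alone was not enough
        }
        return Kind.POSSIBLY_WORKS;       // fall back to the name-only guess
    }

    public static void main(String[] args) {
        Map<String, byte[]> ole = new HashMap<>();
        ole.put("CONTENTS", "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(classify(ole));
    }
}
```

A check on the entry name alone cannot tell these two cases apart, which is why looking at the stream content (or at other entries in the same directory) seems necessary.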

Please add a unit test that uses TestWithPdf.docx.

best wishes, Max

