tika-dev mailing list archives

From "Andriy Budzinskyy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1761) Error Parsing PPT (97-2003) files with password protection against modification which were created using Office 2013
Date Thu, 08 Oct 2015 06:55:26 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948169#comment-14948169 ]

Andriy Budzinskyy commented on TIKA-1761:
-----------------------------------------

The password is 123 for both files.

> Error Parsing PPT (97-2003) files with password protection against modification which were created using Office 2013
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-1761
>                 URL: https://issues.apache.org/jira/browse/TIKA-1761
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.7, 1.10
>            Reporter: Andriy Budzinskyy
>            Assignee: Tim Allison
>         Attachments: test-2007.ppt, test-2013.ppt
>
>
> PPT documents created (or saved) in PowerPoint 97-2003 format and protected with a password
> against modification using Office 2013 fail during text extraction.
> Extraction works fine for PowerPoint 97-2003 format files created with Office 2007.
> {noformat}
> java -jar tika-app-1.10.jar --text test_2003.ppt
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@22b0f5af
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
>         at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:185)
>         at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:489)
>         at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:139)
> Caused by: org.apache.poi.hslf.exceptions.EncryptedPowerPointFileException: PowerPoint file is encrypted. The correct password needs to be set via Biff8EncryptionKey.setCurrentUserPassword()
>         at org.apache.poi.hslf.EncryptedSlideShow.<init>(EncryptedSlideShow.java:102)
>         at org.apache.poi.hslf.HSLFSlideShow.read(HSLFSlideShow.java:259)
>         at org.apache.poi.hslf.HSLFSlideShow.buildRecords(HSLFSlideShow.java:250)
>         at org.apache.poi.hslf.HSLFSlideShow.<init>(HSLFSlideShow.java:165)
>         at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
>         at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
>         at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>         ... 5 more
> {noformat}
> I've debugged the Tika library and found that it fails on the UserEditAtom.encryptSessionPersistIdRef
> property. This property is empty in files created with Office 2007 and non-empty in files
> created with Office 2013.
> I've defragmented PPT files as described in https://social.msdn.microsoft.com/Forums/en-US/e33189a5-0b00-44b7-b084-f2757e9b7536/powerpoint-binary-file-format-decryption?forum=os_binaryfile
> Is this a bug in Tika or in the POI library?
> Should this be supported per Apache POI [encryption support|https://poi.apache.org/encryption.html]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
