tika-dev mailing list archives

From "Tilman Hausherr (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-1442) Upgrade to PDFBox 1.8.8
Date Fri, 24 Oct 2014 11:02:33 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14173983#comment-14173983 ]

Tilman Hausherr edited comment on TIKA-1442 at 10/24/14 11:02 AM:
------------------------------------------------------------------

After some more research, I was able to decode 5 more files (the cause was not the LZW filter,
see PDFBOX-2296, but I fixed this only in 2.0). However, 7 other files really are corrupt;
portions of them are blank when shown in AR:

115/115269.pdf
211/211876.pdf
268/268346.pdf
389/389474.pdf
443/443752.pdf
698/698813.pdf
846/846759.pdf


was (Author: tilman):
After some more research, I was able to decode 5 more files (the cause was not the LZW filter,
see ). However 7 other files are really corrupt, portions of the files are blank when shown
in AR:

115/115269.pdf
211/211876.pdf
268/268346.pdf
389/389474.pdf
443/443752.pdf
698/698813.pdf
846/846759.pdf

> Upgrade to PDFBox 1.8.8
> -----------------------
>
>                 Key: TIKA-1442
>                 URL: https://issues.apache.org/jira/browse/TIKA-1442
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 1.7
>
>         Attachments: pdfbox_1_8_6V1_8_8-SNAPSHOT.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTb.xlsx,
>                      pdfbox_1_8_6V1_8_8-SNAPSHOTc.xlsx, pdfbox_1_8_6V1_8_8-SNAPSHOTc.zip
>
>
> Given the regressions we identified in PDFBox 1.8.7, we should upgrade to 1.8.8 as soon
> as it is ready.  I'm tempted to call this a blocker on Tika 1.7.  Let's use this issue to
> carry on the discussion of regression testing (if any further discussion is necessary) or
> any other prep that needs to happen before 1.8.8's release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
