tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1436) improvement to PDFParser
Date Thu, 03 Sep 2015 05:30:45 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728527#comment-14728527 ]

Chris A. Mattmann commented on TIKA-1436:
-----------------------------------------

I tried applying this patch, but it doesn't apply cleanly to 1.11 trunk. I think it needs to be
brought up to date:

{noformat}
[chipotle:~/tmp/tika1.11] mattmann% patch -p0 < ste-20140927.patch 
patching file tika-core/src/main/java/org/apache/tika/sax/WriteOutContentHandler.java
patching file tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParser.java
Hunk #1 succeeded at 56 (offset 4 lines).
Hunk #2 succeeded at 146 with fuzz 2 (offset -12 lines).
patching file tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDFParserConfig.java
Hunk #1 succeeded at 156 with fuzz 1 (offset 12 lines).
can't find file to patch at input line 145
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|Index: tika-parsers/src/test/java/org/apache/tika/TikaTest.java
|===================================================================
|--- tika-parsers/src/test/java/org/apache/tika/TikaTest.java	(revision 1627940)
|+++ tika-parsers/src/test/java/org/apache/tika/TikaTest.java	(working copy)
--------------------------
File to patch: 
Skip this patch? [y] y
Skipping patch.
3 out of 3 hunks ignored
patching file tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
Hunk #1 FAILED at 16.
Hunk #2 succeeded at 1000 with fuzz 1 (offset 89 lines).
1 out of 2 hunks FAILED -- saving rejects to file tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java.rej
patching file tika-parsers/src/test/java/org/apache/tika/parser/rtf/RTFParserTest.java
Hunk #1 succeeded at 81 (offset -10 lines).
Hunk #2 succeeded at 88 (offset -10 lines).
Hunk #3 succeeded at 99 (offset -10 lines).
Hunk #4 succeeded at 121 with fuzz 2 (offset -10 lines).
Hunk #5 succeeded at 165 (offset -10 lines).
Hunk #6 succeeded at 221 (offset -10 lines).
Hunk #7 FAILED at 587.
1 out of 7 hunks FAILED -- saving rejects to file tika-parsers/src/test/java/org/apache/tika/parser/rtf/RTFParserTest.java.rej
[chipotle:~/tmp/tika1.11] mattmann% 

{noformat}

Also, reading the comments, I'm not sure whether consensus was reached here. Jukka and Tyler
brought up some points, and it seems like you responded to them, Stefano, but I didn't see their
replies. Which conversation on the list are you referencing?

Thanks.


> improvement to PDFParser
> ------------------------
>
>                 Key: TIKA-1436
>                 URL: https://issues.apache.org/jira/browse/TIKA-1436
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.6
>            Reporter: Stefano Fornari
>            Assignee: Chris A. Mattmann
>              Labels: parser, pdf
>             Fix For: 1.11
>
>         Attachments: ste-20140927.patch
>
>
> With regard to the thread "[PDFParser] - read limited number of characters" on Mar 29,
> I would like to propose the attached patch. I noticed that in Tika 1.6 there has been some
> work on better handling of the WriteLimitReachedException condition, but I believe it
> could be improved further.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
