tika-dev mailing list archives

From "Wing-Hong Andrew Ko (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (TIKA-2044) MboxParser wrongly concatenates multiple text lines into single header line
Date Fri, 07 Apr 2017 15:15:41 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960943#comment-15960943
] 

Wing-Hong Andrew Ko edited comment on TIKA-2044 at 4/7/17 3:14 PM:
-------------------------------------------------------------------

Hello Luis!

Thanks for the explanation!  Is there an easy way for one to attach a different EmbeddedDocumentExtractor
for mbox files vs pst files, or am I supposed to register a single EmbeddedDocumentExtractor
instance and do branching logic internally in the parseEmbedded method based on e.g. metadata.get(Metadata.CONTENT_TYPE)?

Submitted the [pull request|https://github.com/apache/tika/pull/166] with a refactor and unit
tests.

Cheers,
Andrew


was (Author: wko27):
Hello Luis!

Thanks for the explanation!  Is there an easy way for one to attach a different EmbeddedDocumentExtractor
for mbox files vs pst files, or am I supposed to register a single EmbeddedDocumentExtractor
instance and do branching logic internally in the parseEmbedded method based on e.g. metadata.get(Metadata.CONTENT_TYPE)?

Submitted the pull request with a refactor and unit tests: https://github.com/apache/tika/pull/166

Cheers,
Andrew

> MboxParser wrongly concatenates multiple text lines into single header line
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2044
>                 URL: https://issues.apache.org/jira/browse/TIKA-2044
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: Tika 1.13, and 1.14 nightly build at the time of this writing
>            Reporter: Vjeran Marcinko
>
> MboxParser combines multiple text lines into a single header value, supposedly by using
> a LIFO structure (a stack, i.e. a Java Deque). Instead it uses a FIFO structure (a Queue),
> so the call that should fetch the last inserted line and extend it with the current line
> returns the oldest line:
> Current code:
> Queue<String> multiline = new LinkedList<String>();
> ... few lines below...
> String latestLine = multiline.poll();
> Whereas it should be:
> Deque<String> multiline = new LinkedList<String>();
> ... few lines below...
> String latestLine = multiline.pollLast();
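
The difference reported above can be reproduced with the JDK collections alone. The following is a minimal sketch, not the actual MboxParser code: the class name HeaderFolding, the helper methods, and the unfold loop (which treats a line starting with a space or tab as a continuation of the previous header) are assumptions made for illustration. Only Queue.poll() and Deque.pollLast() come from the report.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

public class HeaderFolding {

    // Reported behaviour: Queue.poll() removes the head, i.e. the OLDEST line.
    static String pollFromQueue(List<String> lines) {
        Queue<String> multiline = new LinkedList<>(lines);
        return multiline.poll();
    }

    // Proposed fix: Deque.pollLast() removes the tail, i.e. the NEWEST line.
    static String pollLastFromDeque(List<String> lines) {
        Deque<String> multiline = new LinkedList<>(lines);
        return multiline.pollLast();
    }

    // Hypothetical unfolding loop (an assumption, not Tika's code): a line that
    // starts with whitespace is appended to the most recently stored header.
    static List<String> unfold(List<String> rawLines) {
        Deque<String> multiline = new ArrayDeque<>();
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && !multiline.isEmpty()) {
                String latestLine = multiline.pollLast();
                multiline.addLast(latestLine + " " + line.trim());
            } else {
                multiline.addLast(line);
            }
        }
        return new ArrayList<>(multiline);
    }

    public static void main(String[] args) {
        List<String> stored = Arrays.asList("From: a@example.org", "Subject: first part");
        System.out.println(pollFromQueue(stored));      // From: a@example.org
        System.out.println(pollLastFromDeque(stored));  // Subject: first part
        System.out.println(unfold(Arrays.asList(
                "From: a@example.org", "Subject: first part", "  second part")));
    }
}
```

With two stored lines, poll() hands back the From line while pollLast() hands back the Subject line, which is the one a continuation line should extend.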



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
