tika-dev mailing list archives

From "Luis Filipe Nassif (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2044) MboxParser wrongly concatenates multiple text lines into single header line
Date Tue, 04 Apr 2017 01:32:42 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15954461#comment-15954461
] 

Luis Filipe Nassif commented on TIKA-2044:
------------------------------------------

Hi Andrew,

I think the third behavior is right, because an mbox is composed of eml files while an
Outlook pst is not, so different embedded content gets different handling.

Not sure if those parsers should output similar results...

A PR for the other issues will be very welcome!

Thanks,
Luis

> MboxParser wrongly concatenates multiple text lines into single header line
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2044
>                 URL: https://issues.apache.org/jira/browse/TIKA-2044
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: Tika 1.13, and 1.14 nightly build at the time of this writing
>            Reporter: Vjeran Marcinko
>
> MboxParser combines multiple text lines into a single header value, supposedly by using
> a LIFO structure (a stack, i.e. a Java Deque). Instead it uses a FIFO structure (a Queue)
> to fetch the last inserted line, so it extends the wrong line with the current line:
> Current code:
> Queue<String> multiline = new LinkedList<String>();
> ... few lines below...
> String latestLine = multiline.poll();
> Whereas it should be:
> Deque<String> multiline = new LinkedList<String>();
> ... few lines below...
> String latestLine = multiline.pollLast();
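
To illustrate the difference described in the report, here is a minimal, self-contained sketch. It is not the MboxParser source: the class name, the method names, and the rule used to detect a continuation line (leading space or tab, as in folded mail headers) are assumptions made for the example. Only the Queue.poll() versus Deque.pollLast() contrast comes from the issue text.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch, not Tika code: joins folded header lines two ways.
class HeaderUnfoldSketch {

    // Assumption: a continuation line starts with a space or a tab.
    static boolean isContinuation(String line) {
        return !line.isEmpty() && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
    }

    // FIFO variant: poll() removes the OLDEST line, so the wrong header is extended.
    static List<String> unfoldWithQueue(List<String> lines) {
        Queue<String> multiline = new LinkedList<String>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.poll();
                multiline.add(latestLine + " " + line.trim());
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<String>(multiline);
    }

    // LIFO variant: pollLast() removes the most recently added line.
    static List<String> unfoldWithDeque(List<String> lines) {
        Deque<String> multiline = new LinkedList<String>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.pollLast();
                multiline.addLast(latestLine + " " + line.trim());
            } else {
                multiline.addLast(line);
            }
        }
        return new ArrayList<String>(multiline);
    }

    public static void main(String[] args) {
        List<String> lines = java.util.Arrays.asList("From: a", "Subject: hello", " world");
        System.out.println(unfoldWithQueue(lines)); // [Subject: hello, From: a world]
        System.out.println(unfoldWithDeque(lines)); // [From: a, Subject: hello world]
    }
}
```

With a single header the two variants behave the same, which may explain why the problem only shows up once a folded line follows two or more buffered headers.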



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
