tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2044) MboxParser wrongly concatenates multiple text lines into single header line
Date Mon, 10 Apr 2017 14:47:41 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962978#comment-15962978 ]

ASF GitHub Bot commented on TIKA-2044:
--------------------------------------

wko27 commented on a change in pull request #166: fix for TIKA-2044 contributed by wko27
URL: https://github.com/apache/tika/pull/166#discussion_r110675586
 
 

 ##########
 File path: tika-parsers/src/main/java/org/apache/tika/parser/mbox/MboxParser.java
 ##########
 @@ -188,27 +251,33 @@ private void saveHeaderInMetadata(Metadata metadata, String curLine) {
                 property = Metadata.MESSAGE_BCC;
             }
             metadata.add(property, headerContent);
-        } else if (headerTag.equalsIgnoreCase("Subject")) {
+            break;
+        case "subject":
             metadata.add(Metadata.SUBJECT, headerContent);
-        } else if (headerTag.equalsIgnoreCase("Date")) {
+            break;
+        case "date":
             try {
                 Date date = parseDate(headerContent);
                 metadata.set(TikaCoreProperties.CREATED, date);
             } catch (ParseException e) {
                 // ignoring date because format was not understood
             }
-        } else if (headerTag.equalsIgnoreCase("Message-Id")) {
+            break;
+        case "message-id":
             metadata.set(TikaCoreProperties.IDENTIFIER, headerContent);
-        } else if (headerTag.equalsIgnoreCase("In-Reply-To")) {
+            break;
+        case "in-reply-to":
             metadata.set(TikaCoreProperties.RELATION, headerContent);
-        } else if (headerTag.equalsIgnoreCase("Content-Type")) {
+            break;
+        case "content-type":
             // TODO - key off content-type in headers to
             // set mapping to use for content and convert if necessary.
 
             metadata.add(Metadata.CONTENT_TYPE, headerContent);
             metadata.set(TikaCoreProperties.FORMAT, headerContent);
-        } else {
+            break;
+        default:
             metadata.add(EMAIL_HEADER_METADATA_PREFIX + headerTag, headerContent);
         }
     }
-}
+}
 
 Review comment:
   Oops, re-added!
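The hunk above replaces the `equalsIgnoreCase` chain with a `switch` on string labels. A Java `switch` on `String` matches case-sensitively, so the header tag presumably has to be lower-cased before the switch. That line falls outside the hunk, so the following is only a minimal sketch under that assumption. `SimpleMetadata`, `HeaderDispatch`, the plain string keys standing in for `Metadata.SUBJECT` and friends, and the `EMAIL_HEADER_PREFIX` value are all hypothetical and are not Tika's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified stand-in for Tika's Metadata: a multi-valued map
// where add() appends a value and set() replaces all values for a key.
class SimpleMetadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    void set(String key, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(key, list);
    }

    List<String> get(String key) {
        return values.getOrDefault(key, new ArrayList<>());
    }
}

class HeaderDispatch {
    // Hypothetical prefix value, for illustration only.
    static final String EMAIL_HEADER_PREFIX = "email-header-";

    static void saveHeader(SimpleMetadata metadata, String headerTag, String headerContent) {
        // Lower-case once with Locale.ROOT so the string switch matches regardless
        // of the header's capitalisation and of the JVM's default locale.
        switch (headerTag.toLowerCase(Locale.ROOT)) {
            case "subject":
                metadata.add("subject", headerContent);
                break;
            case "message-id":
                metadata.set("identifier", headerContent);
                break;
            case "in-reply-to":
                metadata.set("relation", headerContent);
                break;
            default:
                // Unrecognised headers keep their original tag behind a prefix.
                metadata.add(EMAIL_HEADER_PREFIX + headerTag, headerContent);
        }
    }

    public static void main(String[] args) {
        SimpleMetadata md = new SimpleMetadata();
        saveHeader(md, "SUBJECT", "Hello");
        saveHeader(md, "Message-ID", "<abc@example.org>");
        saveHeader(md, "X-Mailer", "test");
        System.out.println(md.get("subject"));                // [Hello]
        System.out.println(md.get("identifier"));             // [<abc@example.org>]
        System.out.println(md.get("email-header-X-Mailer"));  // [test]
    }
}
```

Lower-casing with `Locale.ROOT` rather than the default locale matters for a switch like this: under a Turkish default locale, `"Message-ID".toLowerCase()` yields `"message-ıd"` (dotless ı), which would silently fall through to `default`. Whether the actual patch passes a locale is not visible in this hunk.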
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> MboxParser wrongly concatenates multiple text lines into single header line
> ---------------------------------------------------------------------------
>
>                 Key: TIKA-2044
>                 URL: https://issues.apache.org/jira/browse/TIKA-2044
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.13
>         Environment: Tika 1.13, and 1.14 nightly build at the time of this writing
>            Reporter: Vjeran Marcinko
>
> MboxParser combines multiple text lines into a single header value by (supposedly) using
> a LIFO structure (a stack, i.e. a Java Deque). Instead it uses a FIFO structure (a queue),
> so poll() fetches the earliest inserted line rather than the last one, and the current
> line is appended to the wrong header:
> Current code:
> Queue<String> multiline = new LinkedList<String>();
> // ... a few lines below ...
> String latestLine = multiline.poll();
> It should be:
> Deque<String> multiline = new LinkedList<String>();
> // ... a few lines below ...
> String latestLine = multiline.pollLast();
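The report's point can be reproduced in isolation. This is a simplified reconstruction, not the actual MboxParser loop: the `HeaderUnfolding` class, the `isContinuation` rule (a line starting with a space or tab continues the previous header, as in RFC 5322 folding) and joining with a single space are assumptions made for illustration. It runs the same input through a `Queue` drained with `poll()` and a `Deque` drained with `pollLast()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class HeaderUnfolding {
    // Buggy variant, as described in the report: Queue.poll() removes the
    // head of the queue, i.e. the earliest header, not the latest one.
    static List<String> unfoldWithQueue(List<String> lines) {
        Queue<String> multiline = new LinkedList<>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.poll();
                multiline.add(latestLine + " " + line.trim());
            } else {
                multiline.add(line);
            }
        }
        return new ArrayList<>(multiline);
    }

    // Fixed variant: Deque.pollLast() removes the most recently added header,
    // which is the one a continuation line belongs to.
    static List<String> unfoldWithDeque(List<String> lines) {
        Deque<String> multiline = new LinkedList<>();
        for (String line : lines) {
            if (isContinuation(line) && !multiline.isEmpty()) {
                String latestLine = multiline.pollLast();
                multiline.addLast(latestLine + " " + line.trim());
            } else {
                multiline.addLast(line);
            }
        }
        return new ArrayList<>(multiline);
    }

    // Assumed folding rule: a continuation line starts with a space or a tab.
    static boolean isContinuation(String line) {
        return line.startsWith(" ") || line.startsWith("\t");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "From: a@example.org",
                "Subject: first part",
                "  second part");
        // [Subject: first part, From: a@example.org second part]
        System.out.println(unfoldWithQueue(lines));
        // [From: a@example.org, Subject: first part second part]
        System.out.println(unfoldWithDeque(lines));
    }
}
```

With only one header buffered, `poll()` and `pollLast()` return the same element, so the bug only shows once several headers have been collected before a folded line arrives. Strictly, RFC 5322 unfolding removes only the CRLF and keeps the leading whitespace; trimming and inserting a single space is a simplification here.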



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
