tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1879) Extract recipient information in MSG files with more granularity
Date Tue, 07 Mar 2017 20:24:38 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900088#comment-15900088 ]

Tim Allison commented on TIKA-1879:
-----------------------------------

For "from", I assumed a single sender (which isn't always the case with "on behalf of" and/or
"via"), and I created separate fields for Exchange-formatted email addresses, e.g.
"/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)/cn=Recipients/cn=polyspot1.onmicrosoft.com-50609-Some-One"

was mapped to: 
message_from_o=ExchangeLabs,
message_from_ou=Exchange Administrative Group (FY...),
message_from_cn=polyspot1....
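
A minimal sketch of the mapping above, in plain Java with no Tika dependency. The class name, the parse() helper and the prefix parameter are hypothetical and are not part of Tika. It assumes that components are separated by "/" and that, when a key such as cn repeats, the last value wins, which is how the example above reads.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ExchangeAddressSketch {

    // Hypothetical helper (not a Tika API): splits an Exchange-formatted
    // address into key/value components and prefixes each key.
    // Assumption: a repeated key (e.g. two cn= parts) keeps the last value.
    static Map<String, String> parse(String address, String prefix) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : address.split("/")) {
            int eq = part.indexOf('=');
            if (eq <= 0) {
                continue; // skips the empty leading segment and parts without a key
            }
            String key = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = part.substring(eq + 1).trim();
            fields.put(prefix + key, value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String from = "/o=ExchangeLabs/ou=Exchange Administrative Group (FYDIBOHF23SPDLT)"
                + "/cn=Recipients/cn=polyspot1.onmicrosoft.com-50609-Some-One";
        // Prints one line per field: message_from_o, message_from_ou, message_from_cn
        parse(from, "message_from_").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Splitting on "/" is a simplification; a value that itself contains "/" would need a stricter parser.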

However, this won't map neatly onto the "to" fields, which can hold several recipients.  One
unsatisfactory option is to keep three parallel arrays of names, smtpemails and exchangeemails,
with an empty cell in smtpemails when the address is Exchange-formatted and vice versa.  A cleaner
option would be a single pair of parallel arrays, name[] and email[], where email[] holds
the literal email value, whether it is smtp or exchange; users would then have to parse
an Exchange email address themselves if they wanted to differentiate _o, _ou and _cn.
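
To make the second option concrete, here is a rough sketch of a single pair of parallel lists in plain Java. The class, its fields and the looksLikeExchange() check are hypothetical illustrations, not Tika code; treating a leading "/o=" as the sign of an Exchange-formatted address is an assumption based on the example earlier in this comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RecipientArraysSketch {

    // Parallel lists: names.get(i) belongs with emails.get(i).
    final List<String> names = new ArrayList<>();
    final List<String> emails = new ArrayList<>();

    // Always adds to both lists so the indexes stay aligned;
    // a missing value is stored as an empty string.
    void add(String name, String email) {
        names.add(name == null ? "" : name);
        emails.add(email == null ? "" : email);
    }

    // Assumption for this sketch: Exchange-formatted addresses start with "/o=".
    static boolean looksLikeExchange(String email) {
        return email.toLowerCase(Locale.ROOT).startsWith("/o=");
    }

    public static void main(String[] args) {
        RecipientArraysSketch to = new RecipientArraysSketch();
        to.add("Some One", "/o=ExchangeLabs/cn=Recipients/cn=some-one");
        to.add("Someone Else", "someone.else@example.com");
        for (int i = 0; i < to.names.size(); i++) {
            String kind = looksLikeExchange(to.emails.get(i)) ? "exchange" : "smtp";
            System.out.println(to.names.get(i) + " <" + to.emails.get(i) + "> [" + kind + "]");
        }
    }
}
```

The literal value is stored untouched, so nothing is lost, and any splitting into _o, _ou and _cn is left to the consumer.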

[~mcaruanagalizia] and [~lfcnassif], any recommendations?

> Extract recipient information in MSG files with more granularity
> ----------------------------------------------------------------
>
>                 Key: TIKA-1879
>                 URL: https://issues.apache.org/jira/browse/TIKA-1879
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Tim Allison
>            Priority: Minor
>
> As proposed in the parent task, it might be nice to have a parallel array for recipient
> name/recipient email for TO, CC and BCC.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
