tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] Updated: (TIKA-361) Update OutlookExtractor to match new POI API
Date Mon, 14 Jun 2010 14:09:13 GMT

     [ https://issues.apache.org/jira/browse/TIKA-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-361:
----------------------------

    Attachment: outlook.patch

New patch. This one is also against POI 3.7 beta 1 and the latest Tika.

It improves date matching and saves the recipient email addresses into the metadata.
It also updates the mbox parser to do the same. The metadata tag used for the recipient
addresses may still need tweaking, though.
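
As a rough illustration of what saving recipient addresses into the metadata could look like, here is a minimal, self-contained sketch. It is not taken from the patch: SimpleMetadata is a hypothetical stand-in for a multi-valued metadata store (not Tika's own Metadata class), and the key name "Message-Recipient-Address" is an assumption, since the real tag is still open to change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RecipientMetadataSketch {
    // Hypothetical key name; the tag actually used by the patch may differ.
    static final String RECIPIENT_KEY = "Message-Recipient-Address";

    /** Minimal stand-in for a multi-valued metadata store (an assumption, not Tika's class). */
    static class SimpleMetadata {
        private final Map<String, List<String>> values = new LinkedHashMap<>();

        // Append a value under the given name, keeping earlier values.
        void add(String name, String value) {
            values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        // Return all values stored under the name, or an empty list.
        List<String> getValues(String name) {
            return values.getOrDefault(name, Collections.<String>emptyList());
        }
    }

    /** Store each non-blank recipient address as a separate value under one key. */
    static SimpleMetadata collect(List<String> recipientAddresses) {
        SimpleMetadata metadata = new SimpleMetadata();
        for (String address : recipientAddresses) {
            if (address != null && !address.trim().isEmpty()) {
                metadata.add(RECIPIENT_KEY, address.trim());
            }
        }
        return metadata;
    }

    public static void main(String[] args) {
        SimpleMetadata metadata =
                collect(Arrays.asList("alice@example.org", " bob@example.org ", ""));
        System.out.println(metadata.getValues(RECIPIENT_KEY));
    }
}
```

Keeping one value per recipient under a single key means a message parser and an mbox parser can expose the addresses in the same way, whatever tag name is finally chosen.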

> Update OutlookExtractor to match new POI API
> --------------------------------------------
>
>                 Key: TIKA-361
>                 URL: https://issues.apache.org/jira/browse/TIKA-361
>             Project: Tika
>          Issue Type: New Feature
>    Affects Versions: 0.6
>            Reporter: Nick Burch
>         Attachments: outlook.patch
>
>
> OutlookExtractor currently uses POIChunkParser, which is a somewhat internal class, and
> has recently undergone a large number of changes.
> The attached patch changes OutlookExtractor to use the more stable MAPIMessage for text
> extraction, which allows it to continue extracting with the latest POI code in svn.
> The changes in POI's svn also allow for easy access to a few more bits of the message.
> The patch adds date support, but possibly a few others will be wanted in future as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

