tika-dev mailing list archives

From "Ghenadie (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TIKA-2723) Issue with parsing .mht container
Date Tue, 02 Oct 2018 11:33:00 GMT

     [ https://issues.apache.org/jira/browse/TIKA-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ghenadie updated TIKA-2723:
---------------------------
    Description: 
Hello,

I have a file with the .mht extension. Tika processes this file as an email (Is Email? - true)
and uses RFC822Parser to parse it.

This is a problem for me, and it appears to be a bug in Tika. Because this is a web archive container,
it should not be parsed by RFC822Parser (which is an email parser).

Please investigate this issue as soon as possible. 

Please let me know in case of any questions.

 

Thank you,

Ghenadie R.

  was:
Hello,

I have a file with the .mht extension. Tika processes this file as an email (Is Email? - true)
and uses RFC822Parser to parse it. As a result, the extracted content contains email fields such as
From, To, CC, BCC, and Subject.

This is a problem for me, and it appears to be a bug in Tika. Because this is a web archive container,
it should not be parsed by RFC822Parser (which is an email parser).

Please investigate this issue as soon as possible. 

Please let me know in case of any questions.

 

Thank you,

Ghenadie R.


> Issue with parsing .mht container
> ---------------------------------
>
>                 Key: TIKA-2723
>                 URL: https://issues.apache.org/jira/browse/TIKA-2723
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.17
>            Reporter: Ghenadie
>            Priority: Major
>              Labels: patch
>             Fix For: 1.17
>
>         Attachments: Sample-excel.mht, [TIKA-2723] Issue with parsing _mht container - ASF JIRA.mht
>
>
> Hello,
> I have a file with the .mht extension. Tika processes this file as an email (Is Email? - true) and uses RFC822Parser to parse it.
> This is a problem for me, and it appears to be a bug in Tika. Because this is a web archive container, it should not be parsed by RFC822Parser (which is an email parser).
> Please investigate this issue as soon as possible. 
> Please let me know in case of any questions.
>  
> Thank you,
> Ghenadie R.
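
For context on the report above: an .mht (MHTML) file is itself a MIME message. It starts with RFC 822 style headers and carries the page as a multipart/related body, which is one plausible reason a detector could classify it as email. The sketch below uses only Python's standard email module, not Tika. The sample content and the looks_like_mhtml helper are hypothetical and illustrate one possible heuristic; they are not Tika's detection logic.

```python
from email import message_from_string

# Hypothetical minimal MHTML sample; files saved by real browsers are much larger.
SAMPLE_MHT = """\
From: <Saved by a browser>
Subject: Sample page
MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----=_Part_0"

------=_Part_0
Content-Type: text/html; charset="utf-8"
Content-Location: http://example.com/index.html

<html><body>Hello</body></html>
------=_Part_0--
"""


def looks_like_mhtml(raw: str) -> bool:
    """Assumed heuristic (not Tika's): multipart/related with a text/html part."""
    msg = message_from_string(raw)
    if msg.get_content_type() != "multipart/related":
        return False
    # walk() yields the container itself and then every nested part.
    return any(part.get_content_type() == "text/html" for part in msg.walk())


msg = message_from_string(SAMPLE_MHT)
# An email parser reads the MHTML headers without complaint.
print(msg["Subject"])
print(msg.get_content_type())
print(looks_like_mhtml(SAMPLE_MHT))
```

Because the headers alone are valid email headers, telling a web archive apart from a message needs a look at the top-level content type or the parts, which is what the helper does.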



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
