tika-dev mailing list archives

From "Robin Schimpf (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2099) Tar files without magic bytes are sporadically detected as text
Date Mon, 12 Dec 2016 07:54:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741257#comment-15741257 ]

Robin Schimpf commented on TIKA-2099:
-------------------------------------

Any updates regarding this error?

> Tar files without magic bytes are sporadically detected as text
> ---------------------------------------------------------------
>
>                 Key: TIKA-2099
>                 URL: https://issues.apache.org/jira/browse/TIKA-2099
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.11
>            Reporter: Robin Schimpf
>
> When a tar is created with 7-Zip 9.20, the magic bytes "ustar" are not added. Everything
> seems to work fine if the tar contains Microsoft Office files. But when it contains only
> text files, Tika sporadically recognizes it as text/plain. It also seems to depend on the
> size of the first file in the tar: that file has to be several KB in size.
> The problem was found in version 1.11 and also exists in the latest 1.14-SNAPSHOT.
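
For illustration, here is a minimal sketch of how a tar header can be recognized when the "ustar" magic is missing. It is not Tika's actual detection logic; the class name TarHeaderCheck and its methods are hypothetical. It relies only on the standard tar header layout: a 512-byte header block, the optional "ustar" magic at offset 257, and an octal checksum at offset 148 that is computed with the checksum field itself counted as eight spaces.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Tika: checks the first tar header block.
public class TarHeaderCheck {
    static final int HEADER_SIZE = 512;     // size of one tar header block
    static final int MAGIC_OFFSET = 257;    // "ustar" lives here in POSIX tars
    static final int CHECKSUM_OFFSET = 148; // octal checksum field
    static final int CHECKSUM_LENGTH = 8;

    /** Returns true if the "ustar" magic is present at offset 257. */
    static boolean hasUstarMagic(byte[] header) {
        if (header.length < MAGIC_OFFSET + 5) {
            return false;
        }
        String magic = new String(header, MAGIC_OFFSET, 5, StandardCharsets.US_ASCII);
        return magic.equals("ustar");
    }

    /** Returns true if the stored octal checksum matches the header bytes. */
    static boolean hasValidChecksum(byte[] header) {
        if (header.length < HEADER_SIZE) {
            return false;
        }
        long stored = 0;
        boolean sawDigit = false;
        for (int i = CHECKSUM_OFFSET; i < CHECKSUM_OFFSET + CHECKSUM_LENGTH; i++) {
            byte b = header[i];
            if (b >= '0' && b <= '7') {
                stored = stored * 8 + (b - '0');
                sawDigit = true;
            } else if (sawDigit) {
                break;                      // NUL or space ends the number
            } else if (b != ' ' && b != 0) {
                return false;               // unexpected byte before the digits
            }
        }
        if (!sawDigit) {
            return false;
        }
        long sum = 0;
        for (int i = 0; i < HEADER_SIZE; i++) {
            boolean inChecksum = i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_LENGTH;
            // The checksum field is counted as spaces when summing.
            sum += inChecksum ? ' ' : (header[i] & 0xFF);
        }
        return sum == stored;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TarHeaderCheck <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            byte[] header = in.readNBytes(HEADER_SIZE);
            System.out.println("ustar magic:    " + hasUstarMagic(header));
            System.out.println("valid checksum: " + hasValidChecksum(header));
        }
    }
}
```

A file that fails the magic check but passes the checksum check is plausibly an old-style tar like the ones described above, although a checksum match alone is only a heuristic.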



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
