tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached
Date Fri, 12 May 2017 00:26:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007416#comment-16007416 ]

Tim Allison commented on TIKA-2359:
-----------------------------------

Sorry, it took me a while to dig into this.  I hadn't seen our documentation on EXIFTool, and
I know that I've seen EXIFTool quite a bit in 'top' when I run our regression tests.  You're
right: that's what the documentation says, and it's true for mp4.  The MP4Parser is sorted
before the external parser, so the MP4Parser gets called on an mp4 rather than EXIFTool.  However,
for some other file formats (e.g. x-msvideo) that are covered by our ExternalParser/EXIFTool
but do not have their own standalone parsers, EXIFTool is called.
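
As a rough illustration of the ordering effect described above (a simplified model, not
Tika's actual dispatch code), the sketch below picks the first parser in a sorted list that
claims a media type.  The class and method names (ParserOrderSketch, SketchParser, select)
are hypothetical, and the media type lists are assumptions chosen to mirror the mp4 and
x-msvideo cases.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ParserOrderSketch {

    /** Hypothetical stand-in for a parser: a name plus the media types it claims. */
    static final class SketchParser {
        final String name;
        final Set<String> supportedTypes;

        SketchParser(String name, String... supportedTypes) {
            this.name = name;
            this.supportedTypes = new HashSet<>(Arrays.asList(supportedTypes));
        }
    }

    /** Returns the name of the first parser, in list order, that claims the media type. */
    static String select(List<SketchParser> sortedParsers, String mediaType) {
        for (SketchParser p : sortedParsers) {
            if (p.supportedTypes.contains(mediaType)) {
                return p.name;
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        // Assumed ordering: the dedicated parser is sorted before the external one.
        List<SketchParser> parsers = Arrays.asList(
                new SketchParser("MP4Parser", "video/mp4"),
                new SketchParser("ExternalParser/EXIFTool", "video/mp4", "video/x-msvideo"));

        System.out.println(select(parsers, "video/mp4"));       // dedicated parser wins
        System.out.println(select(parsers, "video/x-msvideo")); // only the external parser claims it
    }
}
```

In this model, removing the external entry from the list leaves x-msvideo with no parser at
all, which is the trade-off to weigh when disabling external parsers.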

I was wrong about our StringsParser.  That does have to be turned on, AFAICT.

To disable external parsers, see TIKA-1867.

> Extreme slow parsing on the attachment attached
> -----------------------------------------------
>
>                 Key: TIKA-2359
>                 URL: https://issues.apache.org/jira/browse/TIKA-2359
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Eugen Mayer
>         Attachments: Sample-doc-file-2000kb.doc
>
>
> Parsing this document takes 93s with 1.14, in either server or CLI mode.
> Java:
> java version "1.8.0_121"
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> debian-jessie, 8GB ram in a docker container, current xeon 3GHz, so decent (2 cores limited)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
