tika-dev mailing list archives

From "Luis Filipe Nassif (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached
Date Sat, 13 May 2017 00:15:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008953#comment-16008953 ]

Luis Filipe Nassif commented on TIKA-2359:
------------------------------------------

Also, in the long run, it may be desirable to disable non-pure-Java parsers by default, as is
already done for native libs, REST services, non-Java machine learning parsers (those are awesome!)
and external parsers. Is any non-pure-Java parser other than the Tesseract parser enabled
by default?
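
Until such a default exists, a user can exclude the parser themselves. A minimal sketch, assuming
the parser-exclude element of tika-config.xml and the class name
org.apache.tika.parser.ocr.TesseractOCRParser apply to the Tika version in use (check both against
your release):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: keep the default parser set, minus the Tesseract OCR parser. -->
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <!-- Assumed class name of the OCR parser; verify for your Tika version. -->
      <parser-exclude class="org.apache.tika.parser.ocr.TesseractOCRParser"/>
    </parser>
  </parsers>
</properties>
```

The file would then be passed to tika-app or tika-server through their config option. Whether this
changes the timing reported in this issue is not established here.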

> Extreme slow parsing on the attachment attached
> -----------------------------------------------
>
>                 Key: TIKA-2359
>                 URL: https://issues.apache.org/jira/browse/TIKA-2359
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Eugen Mayer
>         Attachments: Sample-doc-file-2000kb.doc
>
>
> Parsing this document takes 93 s with Tika 1.14, in either server or CLI mode.
> Java:
> java version "1.8.0_121"
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> Environment: Debian jessie in a Docker container with 8 GB RAM, on a current 3 GHz Xeon limited to 2 cores, so decent hardware.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
