tika-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Luis Filipe Nassif (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached
Date Thu, 11 May 2017 22:03:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16007276#comment-16007276
] 

Luis Filipe Nassif commented on TIKA-2359:
------------------------------------------

In the past I was against enabling tesseract by default automatically if it is simply installed
on the system (maybe it is there for other use cases), because of the very slow ocr process,
but I was minority...

> Extreme slow parsing on the attachment attached
> -----------------------------------------------
>
>                 Key: TIKA-2359
>                 URL: https://issues.apache.org/jira/browse/TIKA-2359
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Eugen Mayer
>         Attachments: Sample-doc-file-2000kb.doc
>
>
> i have 93s for parsing this document using 1.14 in server or in cli mode.
> Java:
> java version "1.8.0_121"
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> debian-jessie, 8GB ram in a docker container, current xeon 3GHz, so decent (2 cores limited)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message