tika-dev mailing list archives

From "Eugen Mayer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2359) Extreme slow parsing on the attachment attached
Date Fri, 12 May 2017 13:29:04 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008121#comment-16008121 ]

Eugen Mayer commented on TIKA-2359:
-----------------------------------

[~tallison@mitre.org] well, I am very good at shouting - but that's it! :)

No, seriously: if I did not value Tika for what it is, I would not even have created an issue.
I think getting the perspective other people have on your project sometimes removes
a "blindness" you get from the inside. We know that very well from our own products, and we
are also glad when somebody from the outside, e.g. a customer, shakes us up or just asks
questions nobody actually reckoned with.

I understand that disabling it by default is a breaking change - but no functionality is removed
- you now just have to enable it. Providing sane defaults is key for every project - and managing
those over time is a very important task to keep the project alive, since this lets you gain
traction with new users, who are probably not as convinced as I am, after 4 years, that
Tika is awesome.

Besides that, you should really claim something else for Tika. The speed benefit in server mode
is so immense that you should make this the default or at least promote it - but that's a
different topic. That was just something we learned recently, and it was significant.

Just go on :)

> Extreme slow parsing on the attachment attached
> -----------------------------------------------
>
>                 Key: TIKA-2359
>                 URL: https://issues.apache.org/jira/browse/TIKA-2359
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Eugen Mayer
>         Attachments: Sample-doc-file-2000kb.doc
>
>
> Parsing this document takes 93s using 1.14, in server or in CLI mode.
> Java:
> java version "1.8.0_121"
> Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
> debian-jessie, 8GB RAM in a Docker container, current Xeon 3GHz, so decent (limited to 2 cores)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
