tika-dev mailing list archives

From "Ashish Basran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2180) Multiple requests on Tika to extract text slows down
Date Tue, 22 Nov 2016 19:33:58 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15687660#comment-15687660 ]

Ashish Basran commented on TIKA-2180:
-------------------------------------

I tested with Word and Excel documents. I observed this in 1.13 too.

I passed 22 documents to the Tika server for processing: two 5 MB documents and the rest
smaller than 1 MB each. Below are the per-document processing times in seconds (totals at
the end) when the documents are processed one after another and when they are processed in
parallel. I am not sure whether this behavior is by design, but the difference in processing
time is huge.

Sequential (s)	Parallel (s)
77.4790976	22.6876726
0.9335904	17.9678267
0.8854624	26.0525849
5.0577852	15.5999804
0.8060567	26.6077107
0.7831427	17.7433509
0.8196296	26.7486071
0.7667276	26.7675274
0.7648827	26.8234494
0.7632169	22.8773994
0.8247712	16.9681799
0.9260035	26.9742814
79.6387803	21.0023846
0.7795755	14.0186599
0.7646085	27.0261048
0.8339278	26.0542291
0.8345049	15.0697296
0.8402716	24.0850932
0.7785933	20.1221993
0.9135003	13.1501129
0.9229104	170.2784636
0.8859913	178.3212539

178.0030304	782.9468017
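
The totals in the table can be checked with a short script. This is only a sketch over the
numbers listed above. It also prints the slowest parallel request, on the assumption (not
stated in the comment) that the parallel figures are per-request latencies that overlap in
time; if so, their sum counts the overlapping time more than once, and the slowest request
is closer to the wall-clock time of the parallel run.

```python
# Per-document processing times in seconds, copied from the table above.
sequential = [
    77.4790976, 0.9335904, 0.8854624, 5.0577852, 0.8060567, 0.7831427,
    0.8196296, 0.7667276, 0.7648827, 0.7632169, 0.8247712, 0.9260035,
    79.6387803, 0.7795755, 0.7646085, 0.8339278, 0.8345049, 0.8402716,
    0.7785933, 0.9135003, 0.9229104, 0.8859913,
]
parallel = [
    22.6876726, 17.9678267, 26.0525849, 15.5999804, 26.6077107, 17.7433509,
    26.7486071, 26.7675274, 26.8234494, 22.8773994, 16.9681799, 26.9742814,
    21.0023846, 14.0186599, 27.0261048, 26.0542291, 15.0697296, 24.0850932,
    20.1221993, 13.1501129, 170.2784636, 178.3212539,
]

seq_total = sum(sequential)   # sum of per-document times, sequential run
par_total = sum(parallel)     # sum of per-document times, parallel run
par_slowest = max(parallel)   # longest single request in the parallel run

print(f"documents:               {len(sequential)}")
print(f"sequential total:        {seq_total:.7f} s")
print(f"parallel total (summed): {par_total:.7f} s")
print(f"parallel slowest:        {par_slowest:.7f} s")
```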


> Multiple requests on Tika to extract text slows down
> ----------------------------------------------------
>
>                 Key: TIKA-2180
>                 URL: https://issues.apache.org/jira/browse/TIKA-2180
>             Project: Tika
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 1.13, 1.14
>         Environment: Windows OS, Open JDK, 4 core 32 GB RAM
>            Reporter: Ashish Basran
>
> I observed that if I send multiple requests to Tika (e.g. http://localhost:8080/tika)
> with files of around 5 MB, Tika is very slow in completing the action. I tried with ~20
> random files; it took 170 seconds to process all of them in sequence. When I passed all
> the files in parallel, it took around 780 seconds to process the same set of files.
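
A comparison like the one in the issue description could be reproduced with a small timing
harness. The following is a minimal sketch, not the reporter's actual test client: the
helper names (extract_text, timed, run_sequential, run_parallel) are hypothetical, the URL
is the one quoted in the issue, and it assumes the server accepts a PUT of the raw file
body on /tika and returns plain text. It records both the per-document times and the
wall-clock time of each run, so the two can be compared separately.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Endpoint quoted in the issue; adjust host and port for your own server.
TIKA_URL = "http://localhost:8080/tika"


def extract_text(path, url=TIKA_URL):
    """PUT one file to the Tika server and return the extracted text."""
    with open(path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        url, data=data, method="PUT", headers={"Accept": "text/plain"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def timed(fn, arg):
    """Return the elapsed seconds for a single call fn(arg)."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def run_sequential(paths, fn=extract_text):
    """Process the files one after another.

    Returns (per-document times, wall-clock time of the whole run).
    """
    start = time.perf_counter()
    per_doc = [timed(fn, p) for p in paths]
    return per_doc, time.perf_counter() - start


def run_parallel(paths, fn=extract_text, workers=None):
    """Submit all files at once, one thread per file unless workers is set.

    Returns (per-document times, wall-clock time of the whole run).
    """
    pool_size = workers or max(1, len(paths))
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        per_doc = list(pool.map(lambda p: timed(fn, p), paths))
    return per_doc, time.perf_counter() - start
```

Calling run_sequential(paths) and run_parallel(paths) on the same list of files gives two
(per_doc, wall_clock) pairs. Summing per_doc reproduces the kind of totals reported here,
while wall_clock shows how long each run actually took from start to finish.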



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
