jmeter-dev mailing list archives

From sebb <seb...@gmail.com>
Subject Re: Add Apache Tika in JMeter to extract text from various file type
Date Mon, 05 Nov 2012 11:26:59 GMT
On 3 November 2012 19:23, Milamber <milamber@apache.org> wrote:
> Hello,
>
> I am currently working on adding Apache Tika 1.2 [1] to JMeter to improve
> functional tests.
>
> With Tika, you can extract text from various documents, such as MS Office
> (Word, Excel, PowerPoint; 97-2003 and 2007-2010 (OpenXML)), OpenOffice
> (Writer, Calc, Impress), HTML, gz, jar/zip files (list of contents), and
> some "multimedia" files like mp3, mp4, flv, etc.
>
> In JMeter, Tika can be used by the View Results Tree to view the text data
> of these files, by the Regular Expression Extractor to capture text from
> them, and by the Response Assertion to assert on the data.
>
> The drawback is that Apache Tika requires either one big jar (25 MB) or a
> lot of jar files (see below). With all jars in the binary package, the new
> size (for tgz) is 45 MB (JMeter 2.8 tgz: 23 MB).
>
> The question: do you agree to adding Tika (and the new capability to
> "extract text from Document") to JMeter, given the new binary size?
>
> Secondary question: which is the better approach? 1/ Add only tika-app.jar
> (which includes all dependencies) [2], or 2/ Add several jar files
> (tika-core, tika-parser, etc. + dependencies) [3]

I'm concerned that using Tika would double the size of JMeter.
Although the extra features would be useful, I suspect that most test
cases won't need the extra functionality.

Would it be possible to make the Tika jars optional?
i.e. add the functionality, but if the jars are not present it is disabled.

If we accept that developers must download Tika, then it should be
easy enough to structure the add-on so that JMeter can fail gracefully
if the jars are missing.
But ideally developers would not need to download all the jars either.
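
As a rough illustration of the "optional jars" idea, here is a minimal sketch
(not actual JMeter code). It assumes the feature is detected by probing for
Tika's facade class, org.apache.tika.Tika, and that Tika is only ever called
through reflection, so nothing here needs a Tika jar to compile or run. The
class name OptionalTikaSupport and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper; the name and structure are illustrative only.
final class OptionalTikaSupport {

    // Tika's facade class (tika-core). Referenced by name only, so this
    // file has no compile-time dependency on Tika.
    static final String TIKA_FACADE = "org.apache.tika.Tika";

    /** Returns true if the named class can be loaded (without initializing it). */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, OptionalTikaSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Jar missing or incomplete: treat the feature as unavailable.
            return false;
        }
    }

    /**
     * Tries to extract text from a document using Tika via reflection.
     * Returns Optional.empty() if Tika is absent or extraction fails,
     * so callers can fall back to showing the raw response.
     */
    static Optional<String> extractText(byte[] document) {
        if (!isClassAvailable(TIKA_FACADE)) {
            return Optional.empty();
        }
        try (InputStream in = new ByteArrayInputStream(document)) {
            Class<?> tikaClass = Class.forName(TIKA_FACADE);
            Object tika = tikaClass.getDeclaredConstructor().newInstance();
            Method parse = tikaClass.getMethod("parseToString", InputStream.class);
            return Optional.ofNullable((String) parse.invoke(tika, in));
        } catch (Exception | LinkageError e) {
            // Any failure degrades to "no extracted text" instead of an error.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        boolean enabled = isClassAvailable(TIKA_FACADE);
        System.out.println("Document text extraction enabled: " + enabled);
    }
}
```

With something along these lines, the GUI and the extractor/assertion code
could check isClassAvailable once at startup and hide or disable the
"extract text from Document" option when the jars are not installed.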
