jmeter-dev mailing list archives

From Milamber <milam...@apache.org>
Subject Re: Add Apache Tika in JMeter to extract text from various file type
Date Sat, 03 Nov 2012 20:41:01 GMT


On 03/11/2012 20:10, Philippe Mouawad wrote:
> Hello Milamber,
> My answers below.
>
> Regards
> Philippe
>
> On Sat, Nov 3, 2012 at 8:23 PM, Milamber<milamber@apache.org>  wrote:
>
>> Hello,
>>
>> I am currently working on adding Apache Tika 1.2 [1] to JMeter to improve
>> functional tests.
>>
>
>> With Tika, you can extract the text from various documents, such as MS Office
>> (Word, Excel, PowerPoint 97-2003, 2007-2010 (OpenXML)), OpenOffice (Writer,
>> Calc, Impress), HTML, gz, jar/zip files (list of contents), and some
>> "multimedia" files like mp3, mp4, flv, etc.
>>
>> In JMeter, Tika can be used by the View Results Tree to view the text data
>> of these files, by the Regular Expression Extractor to capture text from
>> them, and by the Response Assertion to assert on the data.
>>
>> The drawback is that Apache Tika requires either one big jar (25 MB) or a
>> lot of jar files (see below). With all jars in the binary package, the new
>> size (for tgz) is 45 MB (JMeter 2.8 tgz: 23 MB).
>>
>> The question: do you agree to add Tika (and the new capability to "extract
>> text from Document") to JMeter, given the new binary size?
>>
> I agree, but we should check the impact on JMeter performance and, if it is
> significant, warn clearly about it and explain when to use it.

The performance impact will only occur when the "Body as a Document"
option is selected in the Response Assertion or RegExp element.
In the View Results Tree, it only occurs when the "Document" viewer is
selected (and the VRT isn't recommended for load tests).

For load tests (as opposed to functional tests), we can add a few
sentences to the docs/wiki recommending that users avoid the "Body as a
Document" option.
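
To illustrate the point above, here is a minimal Java sketch of the idea
that the cost is only paid when the option is selected. It is not the
actual JMeter code: the class name BodyAsDocumentSketch, the method
textToCheck and the documentToText function are hypothetical, and the
function merely stands in for the Tika-based conversion.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical sketch: run the costly document-to-text step only on request. */
final class BodyAsDocumentSketch {

    // Stand-in for the Tika-based conversion (assumption, not the real call).
    private final Function<byte[], String> documentToText;
    // Counts how often the expensive path was taken.
    private int conversions = 0;

    BodyAsDocumentSketch(Function<byte[], String> documentToText) {
        this.documentToText = documentToText;
    }

    /** Returns the text that an assertion or extractor would be applied to. */
    String textToCheck(byte[] responseBody, boolean bodyAsDocument) {
        if (!bodyAsDocument) {
            // Default path: plain decoding, no document parsing cost.
            return new String(responseBody, StandardCharsets.UTF_8);
        }
        conversions++;
        return documentToText.apply(responseBody);
    }

    int conversions() {
        return conversions;
    }

    public static void main(String[] args) {
        BodyAsDocumentSketch sketch = new BodyAsDocumentSketch(
                bytes -> "extracted text from " + bytes.length + " bytes");
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(sketch.textToCheck(body, false));
        System.out.println(sketch.textToCheck(body, true));
        System.out.println("conversions = " + sketch.conversions());
    }
}
```

With the flag off, the response is only decoded as text, so a load test
that never selects the option would not touch the conversion at all.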


>
>
>> Secondary question: which is the better way? 1/ Add only tika-app.jar (which
>> includes all dependencies) [2], or 2/ add several jar files (tika-core,
>> tika-parsers, etc. + dependencies) [3]
>>
> I would prefer:
>     - Tika (core + modules) if available
>     - Dependencies
>
> But not Tika + dependencies in one JAR.
> If the first option is not possible, reference all Tika modules + dependencies.

It's possible: tika-(core|parsers) plus the list of dependencies below.



>
>
>> Milamber
>>
>>
>> [1] http://tika.apache.org/
>>
>> [2] One jar:
>> +tika-app.version                = 1.2
>> +tika-app.jar                    = tika-app-${tika-app.version}.jar
>> +tika-app.loc                    = ${maven2.repo}/org/apache/tika/tika-app/${tika-app.version}
>> +tika-app.md5                    = e0ec70c80a6f3b113d8ac1c12a33338f
>>
>> [3] Several jars (I must check whether any jar is missing)
>>
>> +tika-core.version                = 1.2
>> +tika-core.jar                    = tika-core-${tika-core.version}.jar
>> +tika-core.loc                    = ${maven2.repo}/org/apache/tika/tika-core/${tika-core.version}
>> +tika-core.md5                    = 17cfec5a9b28b323375de0692ce5ecb1
>> +
>> +tika-parsers.version                = 1.2
>> +tika-parsers.jar                    = tika-parsers-${tika-parsers.version}.jar
>> +tika-parsers.loc                    = ${maven2.repo}/org/apache/tika/tika-parsers/${tika-parsers.version}
>> +tika-parsers.md5                    = a15b071726358fd195d5c4b0625cdfb5
>> +
>> +netcdf.version                = 4.2-min
>> +netcdf.jar                    = netcdf-${netcdf.version}.jar
>> +netcdf.loc                    = ${maven2.repo}/edu/ucar/netcdf/${netcdf.version}
>> +netcdf.md5                    = eb00b40b0511f0fc1dfcfc9cb89e3c53
>> +
>> +apache-mime4j-core.version                = 0.7.2
>> +apache-mime4j-core.jar                    = apache-mime4j-core-${apache-mime4j-core.version}.jar
>> +apache-mime4j-core.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-core/${apache-mime4j-core.version}
>> +apache-mime4j-core.md5                    = 88f799546eca803c53eee01a4ce5edcd
>> +
>> +apache-mime4j-dom.version                = 0.7.2
>> +apache-mime4j-dom.jar                    = apache-mime4j-dom-${apache-mime4j-dom.version}.jar
>> +apache-mime4j-dom.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-dom/${apache-mime4j-dom.version}
>> +apache-mime4j-dom.md5                    = dedc747b5c367fbd7f8a7235d1d7cbee
>> +
>> +commons-compress.version                = 1.4.1
>> +commons-compress.jar                    = commons-compress-${commons-compress.version}.jar
>> +commons-compress.loc                    = ${maven2.repo}/org/apache/commons/commons-compress/${commons-compress.version}
>> +commons-compress.md5                    = 7f7ff9255a831325f38a170992b70073
>> +
>> +pdfbox.version                = 1.7.0
>> +pdfbox.jar                    = pdfbox-${pdfbox.version}.jar
>> +pdfbox.loc                    = ${maven2.repo}/org/apache/pdfbox/pdfbox/${pdfbox.version}
>> +pdfbox.md5                    = da9ff2f1b43dc92b15fe3ba39a1cddcd
>> +
>> +fontbox.version                = 1.7.0
>> +fontbox.jar                    = fontbox-${fontbox.version}.jar
>> +fontbox.loc                    = ${maven2.repo}/org/apache/pdfbox/fontbox/${fontbox.version}
>> +fontbox.md5                    = 9e03f94d92af257facb148c138af22fa
>> +
>> +jempbox.version                = 1.7.0
>> +jempbox.jar                    = jempbox-${jempbox.version}.jar
>> +jempbox.loc                    = ${maven2.repo}/org/apache/pdfbox/jempbox/${jempbox.version}
>> +jempbox.md5                    = 69dfbd6872c29f89a4df1179dd54b44e
>> +
>> +poi.version                = 3.8
>> +poi.jar                    = poi-${poi.version}.jar
>> +poi.loc                    = ${maven2.repo}/org/apache/poi/poi/${poi.version}
>> +poi.md5                    = 5c915f48922046c71121fd7021aa23cb
>> +
>> +poi-scratchpad.version                = 3.8
>> +poi-scratchpad.jar                    = poi-scratchpad-${poi-scratchpad.version}.jar
>> +poi-scratchpad.loc                    = ${maven2.repo}/org/apache/poi/poi-scratchpad/${poi-scratchpad.version}
>> +poi-scratchpad.md5                    = 7427b6b9e53dcee57d382ba022efc3be
>> +
>> +poi-ooxml.version                = 3.8
>> +poi-ooxml.jar                    = poi-ooxml-${poi-ooxml.version}.jar
>> +poi-ooxml.loc                    = ${maven2.repo}/org/apache/poi/poi-ooxml/${poi-ooxml.version}
>> +poi-ooxml.md5                    = 8f147b248f078799c24c8714f185b1a8
>> +
>> +geronimo-stax-api_1.0_spec.version                = 1.0.1
>> +geronimo-stax-api_1.0_spec.jar                    = geronimo-stax-api_1.0_spec-${geronimo-stax-api_1.0_spec.version}.jar
>> +geronimo-stax-api_1.0_spec.loc                    = ${maven2.repo}/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/${geronimo-stax-api_1.0_spec.version}
>> +geronimo-stax-api_1.0_spec.md5                    = b7c2a715cd3d1c43dc4ccfae426e8e2e
>> +
>> +tagsoup.version                = 1.2.1
>> +tagsoup.jar                    = tagsoup-${tagsoup.version}.jar
>> +tagsoup.loc                    = ${maven2.repo}/org/ccil/cowan/tagsoup/tagsoup/${tagsoup.version}
>> +tagsoup.md5                    = ae73a52cdcbec10cd61d9ef22fab5936
>> +
>> +asm.version                = 3.1
>> +asm.jar                    = asm-${asm.version}.jar
>> +asm.loc                    = ${maven2.repo}/org/ow2/util/asm/asm/${asm.version}
>> +asm.md5                    = b1a36e247bf18fb4da46ce3a54627d1b
>> +
>> +isoparser.version                = 1.0-RC-1
>> +isoparser.jar                    = isoparser-${isoparser.version}.jar
>> +isoparser.loc                    = ${maven2.repo}/com/googlecode/mp4parser/isoparser/${isoparser.version}
>> +isoparser.md5                    = b0444fde2290319c9028564c3c3ff1ab
>> +
>> +metadata-extractor.version                = 2.4.0-beta-1
>> +metadata-extractor.jar                    = metadata-extractor-${metadata-extractor.version}.jar
>> +metadata-extractor.loc                    = ${maven2.repo}/com/drewnoakes/metadata-extractor/${metadata-extractor.version}
>> +metadata-extractor.md5                    = 6e0ad2f0fe78047cb34ec056b39633d3
>> +
>> +boilerpipe.version                = 1.1.0
>> +boilerpipe.jar                    = boilerpipe-${boilerpipe.version}.jar
>> +boilerpipe.loc                    = ${maven2.repo}/de/l3s/boilerpipe/boilerpipe/${boilerpipe.version}
>> +boilerpipe.md5                    = 0616568083786d0f49e2cb07a5d09fe4
>> +
>> +rome.version                = 0.9
>> +rome.jar                    = rome-${rome.version}.jar
>> +rome.loc                    = ${maven2.repo}/rome/rome/${rome.version}
>> +rome.md5                    = 19589699b01c59ccb4d5e61e4c78b311
>> +
>> +vorbis-java-core.version                = 0.1
>> +vorbis-java-core.jar                    = vorbis-java-core-${vorbis-java-core.version}.jar
>> +vorbis-java-core.loc                    = ${maven2.repo}/org/gagravarr/vorbis-java-core/${vorbis-java-core.version}
>> +vorbis-java-core.md5                    = b88115be2754cb6883e652ba68ca46c8
>> +
>> +juniversalchardet.version                = 1.0.3
>> +juniversalchardet.jar                    = juniversalchardet-${juniversalchardet.version}.jar
>> +juniversalchardet.loc                    = ${maven2.repo}/com/googlecode/juniversalchardet/juniversalchardet/${juniversalchardet.version}
>> +juniversalchardet.md5                    = d9ea0a9a275336c175b343f2e4cd8f27
>> +
>> +xz.version                = 1.1
>> +xz.jar                    = xz-${xz.version}.jar
>> +xz.loc                    = ${maven2.repo}/org/tukaani/xz/${xz.version}
>> +xz.md5                    = 4d0ba9643c8f3f7c6721be3a1286da1c
>> +
>> +dom4j.version                 = 1.6.1
>> +dom4j.jar                = dom4j-${dom4j.version}.jar
>> +dom4j.loc                = ${maven2.repo}/dom4j/dom4j/${dom4j.version}
>> +dom4j.md5                = 4d8f51d3fe3900efc6e395be48030d6d
>> +
>> +xmlbeans.version                 = 2.6.0
>> +xmlbeans.jar                = xmlbeans-${xmlbeans.version}.jar
>> +xmlbeans.loc                = ${maven2.repo}/org/apache/xmlbeans/xmlbeans/${xmlbeans.version}
>> +xmlbeans.md5                = 6591c08682d613194dacb01e95c78c2c
>> +
>> +poi-ooxml-schemas.version                 = 3.8
>> +poi-ooxml-schemas.jar                = poi-ooxml-schemas-${poi-ooxml-schemas.version}.jar
>> +poi-ooxml-schemas.loc                = ${maven2.repo}/org/apache/poi/poi-ooxml-schemas/${poi-ooxml-schemas.version}
>> +poi-ooxml-schemas.md5                = 7ebcffdc4d82b2b8cbc6464d4543cd07
>>
>>
>>
>>
>
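
For readers unfamiliar with the version/jar/loc/md5 entries quoted above,
here is a rough Java sketch of how such entries could be expanded and how
a downloaded jar could be checked against its md5 value. It is only an
illustration, not the JMeter build logic: the class LibraryEntrySketch and
its methods are hypothetical, the maven2.repo value is a placeholder (the
real value is not given in this thread), and the expansion is a single
pass that leaves unknown ${...} names untouched.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for entries shaped like name.version/.jar/.loc/.md5. */
final class LibraryEntrySketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces each ${name} with its value from props; unknown names stay as-is. */
    static String expand(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String replacement = props.containsKey(name) ? props.get(name) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns the MD5 digest of data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    /** True when the bytes hash to the expected md5 value (case-insensitive). */
    static boolean matchesMd5(byte[] jarBytes, String expectedMd5) {
        return md5Hex(jarBytes).equalsIgnoreCase(expectedMd5.trim());
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        // Placeholder repository URL: an assumption for this sketch only.
        props.put("maven2.repo", "https://repo.example.invalid/maven2");
        props.put("tika-core.version", "1.2");

        System.out.println(expand("tika-core-${tika-core.version}.jar", props));
        System.out.println(expand(
                "${maven2.repo}/org/apache/tika/tika-core/${tika-core.version}", props));

        // Stand-in bytes; a real check would read the downloaded jar file.
        byte[] notARealJar = "not a real jar".getBytes(StandardCharsets.UTF_8);
        System.out.println(matchesMd5(notARealJar, "17cfec5a9b28b323375de0692ce5ecb1"));
    }
}
```

Splitting Tika into separate jars means one such entry (and one md5 check)
per dependency, which is why the second list is so much longer than the
first.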

