jmeter-dev mailing list archives

From Milamber <milam...@apache.org>
Subject Re: Add Apache Tika in JMeter to extract text from various file type
Date Sun, 04 Nov 2012 13:09:31 GMT


On 03/11/2012 23:47, Shmuel Krakower wrote:
> Hi Philippe
> If you are concerned about performance with the assertion, maybe as a
> starter it would be better to begin with a separate assertion component
> for documents? Only this one would use Tika, which would make it harder
> for users to misuse Tika when it is not needed.

There are no performance issues when JMeter runs a test if the radio 
button "Body as a Document" is not selected (in the Response Assertion or 
the Regular Expression Extractor).
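
To illustrate why the extra cost only applies when the option is selected, here is a minimal Java sketch, not the actual JMeter code. The class `BodyAsDocumentSketch`, the enum `ResponseField` and the method `textToCheck` are hypothetical names, and the extractor is a placeholder where a real implementation would delegate to Tika.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class BodyAsDocumentSketch {

    /** Hypothetical subset of the "Response Field to check" choices. */
    enum ResponseField { BODY, BODY_AS_DOCUMENT }

    /**
     * Returns the text an assertion or extractor would inspect.
     * The (potentially expensive) document extractor runs only when
     * BODY_AS_DOCUMENT is selected.
     */
    static String textToCheck(ResponseField field, byte[] responseData,
                              Function<byte[], String> documentExtractor) {
        if (field == ResponseField.BODY_AS_DOCUMENT) {
            return documentExtractor.apply(responseData);
        }
        // Default path: decode the raw body, no document parsing involved.
        return new String(responseData, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        AtomicInteger extractorCalls = new AtomicInteger();
        // Placeholder extractor; a real one would delegate to Tika.
        Function<byte[], String> extractor = data -> {
            extractorCalls.incrementAndGet();
            return "text extracted from document";
        };
        byte[] body = "plain body".getBytes(StandardCharsets.UTF_8);

        System.out.println(textToCheck(ResponseField.BODY, body, extractor));
        System.out.println("extractor calls: " + extractorCalls.get());
        System.out.println(textToCheck(ResponseField.BODY_AS_DOCUMENT, body, extractor));
        System.out.println("extractor calls: " + extractorCalls.get());
    }
}
```

With this shape, samplers that keep the default field never reach the extractor, so only tests that opt in pay for the document parsing.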

Adding several new elements (a Tika regexp extractor, a Tika assertion) 
instead of just one new radio button in the "Response Field to check" 
section (of the current Regular Expression Extractor and Response 
Assertion) does not seem necessary.
I can add a tooltip on the "Body as a Document" radio button plus a 
warning in the component reference. Does that seem a good compromise?

Milamber


>
> Regarding size of JMeter, it shouldn't be a concern.
>
> Overall sounds like a nice upgrade to JMeter capabilities.
>
> Best,
> Shmuel.
> On 3 Nov 2012 22:11, "Philippe Mouawad" <philippe.mouawad@gmail.com> wrote:
>
>> Hello Milamber,
>> My answers below.
>>
>> Regards
>> Philippe
>>
>> On Sat, Nov 3, 2012 at 8:23 PM, Milamber<milamber@apache.org>  wrote:
>>
>>> Hello,
>>>
>>> I am currently working on adding Apache Tika 1.2 [1] to JMeter to improve
>>> functional tests.
>>>
>>> With Tika, you can extract the text from various documents, like MS Office
>>> (Word, Excel, PowerPoint 97-2003, 2007-2010 (OpenXML)), OpenOffice (Writer,
>>> Calc, Impress), HTML, gz, jar/zip files (list of contents), and some
>>> "multimedia" files like mp3, mp4, flv, etc.
>>>
>>> In JMeter, Tika can be used by the View Results Tree to view the text
>>> data of these files, by the Regular Expression Extractor to capture some
>>> text from these files, and by the Response Assertion to assert on the data.
>>>
>>> The drawback is that Apache Tika requires a big jar (25 MB) or a lot of
>>> jar files (see below). With all jars in the binary package, the new size
>>> (for the tgz) is 45 MB (JMeter 2.8 tgz: 23 MB).
>>>
>>> The question: do you agree to add Tika (and the new capability to "extract
>>> text from Document") to JMeter with the new binary size?
>>>
>> I agree, but we should check the impact on JMeter performance and, if it
>> is significant, warn clearly about it and explain when to use it.
>>
>>
>>> Secondary question: what is the best way? 1/ Add only tika-app.jar (which
>>> includes all dependencies) [2], or 2/ add several jar files (tika-core,
>>> tika-parsers, etc. + dependencies) [3]
>>>
>> I would prefer:
>>     - Tika (core + modules) if available
>>     - Dependencies
>>
>> But not Tika + dependencies in one JAR.
>> If the first option is not possible, reference all Tika modules + dependencies.
>>
>>
>>> Milamber
>>>
>>>
>>> [1] http://tika.apache.org/
>>>
>>> [2] One Jar:
>>> +tika-app.version                = 1.2
>>> +tika-app.jar                    = tika-app-${tika-app.version}.jar
>>> +tika-app.loc                    = ${maven2.repo}/org/apache/tika/tika-app/${tika-app.version}
>>> +tika-app.md5                    = e0ec70c80a6f3b113d8ac1c12a33338f
>>>
>>> [3] Several Jars (I must check whether any jar is missing)
>>>
>>> +tika-core.version                = 1.2
>>> +tika-core.jar                    = tika-core-${tika-core.version}.jar
>>> +tika-core.loc                    = ${maven2.repo}/org/apache/tika/tika-core/${tika-core.version}
>>> +tika-core.md5                    = 17cfec5a9b28b323375de0692ce5ecb1
>>> +
>>> +tika-parsers.version                = 1.2
>>> +tika-parsers.jar                    = tika-parsers-${tika-parsers.version}.jar
>>> +tika-parsers.loc                    = ${maven2.repo}/org/apache/tika/tika-parsers/${tika-parsers.version}
>>> +tika-parsers.md5                    = a15b071726358fd195d5c4b0625cdfb5
>>> +
>>> +netcdf.version                = 4.2-min
>>> +netcdf.jar                    = netcdf-${netcdf.version}.jar
>>> +netcdf.loc                    = ${maven2.repo}/edu/ucar/netcdf/${netcdf.version}
>>> +netcdf.md5                    = eb00b40b0511f0fc1dfcfc9cb89e3c53
>>> +
>>> +apache-mime4j-core.version                = 0.7.2
>>> +apache-mime4j-core.jar                    = apache-mime4j-core-${apache-mime4j-core.version}.jar
>>> +apache-mime4j-core.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-core/${apache-mime4j-core.version}
>>> +apache-mime4j-core.md5                    = 88f799546eca803c53eee01a4ce5edcd
>>> +
>>> +apache-mime4j-dom.version                = 0.7.2
>>> +apache-mime4j-dom.jar                    = apache-mime4j-dom-${apache-mime4j-dom.version}.jar
>>> +apache-mime4j-dom.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-dom/${apache-mime4j-dom.version}
>>> +apache-mime4j-dom.md5                    = dedc747b5c367fbd7f8a7235d1d7cbee
>>> +
>>> +commons-compress.version                = 1.4.1
>>> +commons-compress.jar                    = commons-compress-${commons-compress.version}.jar
>>> +commons-compress.loc                    = ${maven2.repo}/org/apache/commons/commons-compress/${commons-compress.version}
>>> +commons-compress.md5                    = 7f7ff9255a831325f38a170992b70073
>>> +
>>> +pdfbox.version                = 1.7.0
>>> +pdfbox.jar                    = pdfbox-${pdfbox.version}.jar
>>> +pdfbox.loc                    = ${maven2.repo}/org/apache/pdfbox/pdfbox/${pdfbox.version}
>>> +pdfbox.md5                    = da9ff2f1b43dc92b15fe3ba39a1cddcd
>>> +
>>> +fontbox.version                = 1.7.0
>>> +fontbox.jar                    = fontbox-${fontbox.version}.jar
>>> +fontbox.loc                    = ${maven2.repo}/org/apache/pdfbox/fontbox/${fontbox.version}
>>> +fontbox.md5                    = 9e03f94d92af257facb148c138af22fa
>>> +
>>> +jempbox.version                = 1.7.0
>>> +jempbox.jar                    = jempbox-${jempbox.version}.jar
>>> +jempbox.loc                    = ${maven2.repo}/org/apache/pdfbox/jempbox/${jempbox.version}
>>> +jempbox.md5                    = 69dfbd6872c29f89a4df1179dd54b44e
>>> +
>>> +poi.version                = 3.8
>>> +poi.jar                    = poi-${poi.version}.jar
>>> +poi.loc                    = ${maven2.repo}/org/apache/poi/poi/${poi.version}
>>> +poi.md5                    = 5c915f48922046c71121fd7021aa23cb
>>> +
>>> +poi-scratchpad.version                = 3.8
>>> +poi-scratchpad.jar                    = poi-scratchpad-${poi-scratchpad.version}.jar
>>> +poi-scratchpad.loc                    = ${maven2.repo}/org/apache/poi/poi-scratchpad/${poi-scratchpad.version}
>>> +poi-scratchpad.md5                    = 7427b6b9e53dcee57d382ba022efc3be
>>> +
>>> +poi-ooxml.version                = 3.8
>>> +poi-ooxml.jar                    = poi-ooxml-${poi-ooxml.version}.jar
>>> +poi-ooxml.loc                    = ${maven2.repo}/org/apache/poi/poi-ooxml/${poi-ooxml.version}
>>> +poi-ooxml.md5                    = 8f147b248f078799c24c8714f185b1a8
>>> +
>>> +geronimo-stax-api_1.0_spec.version                = 1.0.1
>>> +geronimo-stax-api_1.0_spec.jar                    = geronimo-stax-api_1.0_spec-${geronimo-stax-api_1.0_spec.version}.jar
>>> +geronimo-stax-api_1.0_spec.loc                    = ${maven2.repo}/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/${geronimo-stax-api_1.0_spec.version}
>>> +geronimo-stax-api_1.0_spec.md5                    = b7c2a715cd3d1c43dc4ccfae426e8e2e
>>> +
>>> +tagsoup.version                = 1.2.1
>>> +tagsoup.jar                    = tagsoup-${tagsoup.version}.jar
>>> +tagsoup.loc                    = ${maven2.repo}/org/ccil/cowan/tagsoup/tagsoup/${tagsoup.version}
>>> +tagsoup.md5                    = ae73a52cdcbec10cd61d9ef22fab5936
>>> +
>>> +asm.version                = 3.1
>>> +asm.jar                    = asm-${asm.version}.jar
>>> +asm.loc                    = ${maven2.repo}/org/ow2/util/asm/asm/${asm.version}
>>> +asm.md5                    = b1a36e247bf18fb4da46ce3a54627d1b
>>> +
>>> +isoparser.version                = 1.0-RC-1
>>> +isoparser.jar                    = isoparser-${isoparser.version}.jar
>>> +isoparser.loc                    = ${maven2.repo}/com/googlecode/mp4parser/isoparser/${isoparser.version}
>>> +isoparser.md5                    = b0444fde2290319c9028564c3c3ff1ab
>>> +
>>> +metadata-extractor.version                = 2.4.0-beta-1
>>> +metadata-extractor.jar                    = metadata-extractor-${metadata-extractor.version}.jar
>>> +metadata-extractor.loc                    = ${maven2.repo}/com/drewnoakes/metadata-extractor/${metadata-extractor.version}
>>> +metadata-extractor.md5                    = 6e0ad2f0fe78047cb34ec056b39633d3
>>> +
>>> +boilerpipe.version                = 1.1.0
>>> +boilerpipe.jar                    = boilerpipe-${boilerpipe.version}.jar
>>> +boilerpipe.loc                    = ${maven2.repo}/de/l3s/boilerpipe/boilerpipe/${boilerpipe.version}
>>> +boilerpipe.md5                    = 0616568083786d0f49e2cb07a5d09fe4
>>> +
>>> +rome.version                = 0.9
>>> +rome.jar                    = rome-${rome.version}.jar
>>> +rome.loc                    = ${maven2.repo}/rome/rome/${rome.version}
>>> +rome.md5                    = 19589699b01c59ccb4d5e61e4c78b311
>>> +
>>> +vorbis-java-core.version                = 0.1
>>> +vorbis-java-core.jar                    = vorbis-java-core-${vorbis-java-core.version}.jar
>>> +vorbis-java-core.loc                    = ${maven2.repo}/org/gagravarr/vorbis-java-core/${vorbis-java-core.version}
>>> +vorbis-java-core.md5                    = b88115be2754cb6883e652ba68ca46c8
>>> +
>>> +juniversalchardet.version                = 1.0.3
>>> +juniversalchardet.jar                    = juniversalchardet-${juniversalchardet.version}.jar
>>> +juniversalchardet.loc                    = ${maven2.repo}/com/googlecode/juniversalchardet/juniversalchardet/${juniversalchardet.version}
>>> +juniversalchardet.md5                    = d9ea0a9a275336c175b343f2e4cd8f27
>>> +
>>> +xz.version                = 1.1
>>> +xz.jar                    = xz-${xz.version}.jar
>>> +xz.loc                    = ${maven2.repo}/org/tukaani/xz/${xz.version}
>>> +xz.md5                    = 4d0ba9643c8f3f7c6721be3a1286da1c
>>> +
>>> +dom4j.version                 = 1.6.1
>>> +dom4j.jar                = dom4j-${dom4j.version}.jar
>>> +dom4j.loc                = ${maven2.repo}/dom4j/dom4j/${dom4j.version}
>>> +dom4j.md5                = 4d8f51d3fe3900efc6e395be48030d6d
>>> +
>>> +xmlbeans.version                 = 2.6.0
>>> +xmlbeans.jar                = xmlbeans-${xmlbeans.version}.jar
>>> +xmlbeans.loc                = ${maven2.repo}/org/apache/xmlbeans/xmlbeans/${xmlbeans.version}
>>> +xmlbeans.md5                = 6591c08682d613194dacb01e95c78c2c
>>> +
>>> +poi-ooxml-schemas.version                 = 3.8
>>> +poi-ooxml-schemas.jar                = poi-ooxml-schemas-${poi-ooxml-schemas.version}.jar
>>> +poi-ooxml-schemas.loc                = ${maven2.repo}/org/apache/poi/poi-ooxml-schemas/${poi-ooxml-schemas.version}
>>> +poi-ooxml-schemas.md5                = 7ebcffdc4d82b2b8cbc6464d4543cd07
>>>
>>>
>>>
>>
>> --
>> Cordialement.
>> Philippe Mouawad.
>>
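
The `.md5` entries in the quoted properties are presumably the expected MD5 checksums of the corresponding jars. As a rough illustration of how such a value could be checked against a downloaded file, here is a small Java sketch using only the JDK. The class `JarChecksumSketch` and its methods are hypothetical names for this example and are not part of the JMeter build.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class JarChecksumSketch {

    /** Returns the MD5 digest of the data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        StringBuilder hex = new StringBuilder(32);
        for (byte b : md5.digest(data)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Compares a file's MD5 with the expected value, ignoring case. */
    static boolean matches(Path file, String expectedMd5)
            throws IOException, NoSuchAlgorithmException {
        // Reads the whole file into memory; fine for a sketch with jar-sized files.
        return md5Hex(Files.readAllBytes(file)).equalsIgnoreCase(expectedMd5.trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java JarChecksumSketch.java <file> <expected-md5>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(matches(file, args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

For example, it could be run with the path of a downloaded tika-core jar and the `tika-core.md5` value from the list above as its two arguments.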


