jmeter-dev mailing list archives

From Philippe Mouawad <philippe.moua...@gmail.com>
Subject Re: Add Apache Tika in JMeter to extract text from various file type
Date Sun, 04 Nov 2012 13:17:20 GMT
On Sun, Nov 4, 2012 at 2:09 PM, Milamber <milamber@apache.org> wrote:

>
>
> On 03/11/2012 23:47, Shmuel Krakower wrote:
>
>> Hi Philippe,
>> If you are concerned about performance with the assertion, maybe as a start
>> it would be better to begin with a separate assertion component for documents?
>> Only this one would use Tika, which would make it harder for users to misuse
>> Tika when it is not needed.
>>
>
> There are no performance issues when JMeter runs a test if the
> "Body as a Document" radio button is not selected (in the Response Assertion
> or Regular Expression Extractor).
>
> Adding several new elements (Tika regexp, Tika Assertion) for only one new
> radio button in the "Response Field to check" section (in the current Regular
> Expression Extractor and Response Assertion) seems unnecessary.
> I can add a tooltip on the "Body as a Document" radio button + a
> warning in the component reference. Does that seem a good compromise?
>
+1 for me.
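
To illustrate why the extra cost should only appear when the option is selected, here is a minimal sketch of the idea. It is not JMeter code: the class name, the `fieldToCheck` method, and the stand-in extractor are all hypothetical, and the extractor only simulates what a Tika-backed call would do.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical sketch: document extraction runs only when the option is selected. */
final class BodyFieldSketch {
    /** Counts extractor invocations so the opt-in cost can be observed. */
    static int extractorCalls = 0;

    /** Stand-in for a Tika-backed extractor (assumption); a real one would parse the bytes. */
    static final Function<byte[], String> DOCUMENT_EXTRACTOR = bytes -> {
        extractorCalls++;
        return "[text extracted from " + bytes.length + " bytes]";
    };

    /** Returns the string that an assertion or extractor would then check. */
    static String fieldToCheck(byte[] responseBody, boolean bodyAsDocument) {
        if (bodyAsDocument) {
            // Expensive path, taken only when the user opted in.
            return DOCUMENT_EXTRACTOR.apply(responseBody);
        }
        // Default path: plain decoding, the extractor is never touched.
        return new String(responseBody, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(fieldToCheck(body, false));
        System.out.println(fieldToCheck(body, true));
        System.out.println("extractor calls: " + extractorCalls);
    }
}
```

With the flag off, the extractor is never called, which is the behaviour described above for tests that do not select "Body as a Document".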

> Milamber
>
>
>
>
>> Regarding the size of JMeter, it shouldn't be a concern.
>>
>> Overall sounds like a nice upgrade to JMeter capabilities.
>>
>> Best,
>> Shmuel.
>> On 2012-11-03 22:11, "Philippe Mouawad" <philippe.mouawad@gmail.com> wrote:
>>
>>> Hello Milamber,
>>> My answers below.
>>>
>>> Regards
>>> Philippe
>>>
>>> On Sat, Nov 3, 2012 at 8:23 PM, Milamber<milamber@apache.org>  wrote:
>>>
>>>> Hello,
>>>>
>>>> I am currently working on adding Apache Tika 1.2 [1] to JMeter to improve
>>>> functional tests.
>>>>
>>>> With Tika, you can extract the text from various documents, such as MS
>>>> Office (Word, Excel, PowerPoint 97-2003, 2007-2010 (OpenXML)), OpenOffice
>>>> (Writer, Calc, Impress), HTML, gz, jar/zip files (list of contents), and
>>>> some "multimedia" files such as mp3, mp4, flv, etc.
>>>>
>>>> In JMeter, Tika can be used by the View Results Tree to view the text
>>>> data of these files, by the Regular Expression Extractor to capture text
>>>> from these files, and by the Response Assertion to assert on the data.
>>>>
>>>> The drawback: Apache Tika requires one big jar (25 MB) or a lot of jar
>>>> files (see below). With all jars in the binary package, the new size
>>>> (for the tgz) is 45 MB (JMeter 2.8 tgz: 23 MB).
>>>>
>>>> The question: do you agree to add Tika (and the new capability to "extract
>>>> text from a document") to JMeter with the new binary size?
>>>>
>>> I agree, but we should check the impact on JMeter performance and, if it is
>>> significant, warn clearly about it and about when to use it.
>>>
>>>
>>>> Secondary question: which is the better way? 1/ Add only tika-app.jar (which
>>>> includes all dependencies) [2], or 2/ add several jar files (tika-core,
>>>> tika-parsers, etc. + dependencies) [3]
>>>>
>>> I would prefer:
>>>
>>>     - Tika (core + modules) if available
>>>     - Dependencies
>>>
>>> But not Tika + dependencies in one JAR.
>>> If the first option is not possible, reference all Tika modules + dependencies.
>>>
>>>
>>>> Milamber
>>>>
>>>>
>>>> [1] http://tika.apache.org/
>>>>
>>>> [2] One Jar :
>>>> +tika-app.version                = 1.2
>>>> +tika-app.jar                    = tika-app-${tika-app.version}.jar
>>>> +tika-app.loc                    = ${maven2.repo}/org/apache/tika/tika-app/${tika-app.version}
>>>> +tika-app.md5                    = e0ec70c80a6f3b113d8ac1c12a33338f
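
For readers unfamiliar with this property style: the `${...}` references in the `.jar` and `.loc` values are placeholders that get replaced with other property values. The sketch below shows that substitution in plain Java. It is only an illustration, not how the JMeter build resolves them; the class name is hypothetical and the repository URL given for `maven2.repo` is an assumed example value.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${name} placeholder expansion over a Properties object. */
final class PropertyExpansionSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /**
     * Replaces each ${name} with the (recursively expanded) value of property "name".
     * Unknown names are left untouched. No cycle detection: a sketch only.
     */
    static String expand(String value, Properties props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String ref = props.getProperty(m.group(1));
            String replacement = (ref == null) ? m.group() : expand(ref, props);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("maven2.repo", "https://repo.example.org/maven2"); // assumed value
        p.setProperty("tika-app.version", "1.2");
        p.setProperty("tika-app.jar", "tika-app-${tika-app.version}.jar");
        p.setProperty("tika-app.loc", "${maven2.repo}/org/apache/tika/tika-app/${tika-app.version}");
        // Location and file name together give the address the jar would be fetched from.
        System.out.println(expand("${tika-app.loc}/${tika-app.jar}", p));
    }
}
```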
>>>>
>>>> [3] Several jars (I must check whether any jar is missing)
>>>>
>>>> +tika-core.version                = 1.2
>>>> +tika-core.jar                    = tika-core-${tika-core.version}.jar
>>>> +tika-core.loc                    = ${maven2.repo}/org/apache/tika/tika-core/${tika-core.version}
>>>> +tika-core.md5                    = 17cfec5a9b28b323375de0692ce5ecb1
>>>> +
>>>> +tika-parsers.version                = 1.2
>>>> +tika-parsers.jar                    = tika-parsers-${tika-parsers.version}.jar
>>>> +tika-parsers.loc                    = ${maven2.repo}/org/apache/tika/tika-parsers/${tika-parsers.version}
>>>> +tika-parsers.md5                    = a15b071726358fd195d5c4b0625cdfb5
>>>> +
>>>> +netcdf.version                = 4.2-min
>>>> +netcdf.jar                    = netcdf-${netcdf.version}.jar
>>>> +netcdf.loc                    = ${maven2.repo}/edu/ucar/netcdf/${netcdf.version}
>>>> +netcdf.md5                    = eb00b40b0511f0fc1dfcfc9cb89e3c53
>>>> +
>>>> +apache-mime4j-core.version                = 0.7.2
>>>> +apache-mime4j-core.jar                    = apache-mime4j-core-${apache-mime4j-core.version}.jar
>>>> +apache-mime4j-core.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-core/${apache-mime4j-core.version}
>>>> +apache-mime4j-core.md5                    = 88f799546eca803c53eee01a4ce5edcd
>>>> +
>>>> +apache-mime4j-dom.version                = 0.7.2
>>>> +apache-mime4j-dom.jar                    = apache-mime4j-dom-${apache-mime4j-dom.version}.jar
>>>> +apache-mime4j-dom.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-dom/${apache-mime4j-dom.version}
>>>> +apache-mime4j-dom.md5                    = dedc747b5c367fbd7f8a7235d1d7cbee
>>>> +
>>>> +commons-compress.version                = 1.4.1
>>>> +commons-compress.jar                    = commons-compress-${commons-compress.version}.jar
>>>> +commons-compress.loc                    = ${maven2.repo}/org/apache/commons/commons-compress/${commons-compress.version}
>>>> +commons-compress.md5                    = 7f7ff9255a831325f38a170992b70073
>>>> +
>>>> +pdfbox.version                = 1.7.0
>>>> +pdfbox.jar                    = pdfbox-${pdfbox.version}.jar
>>>> +pdfbox.loc                    = ${maven2.repo}/org/apache/pdfbox/pdfbox/${pdfbox.version}
>>>> +pdfbox.md5                    = da9ff2f1b43dc92b15fe3ba39a1cddcd
>>>> +
>>>> +fontbox.version                = 1.7.0
>>>> +fontbox.jar                    = fontbox-${fontbox.version}.jar
>>>> +fontbox.loc                    = ${maven2.repo}/org/apache/pdfbox/fontbox/${fontbox.version}
>>>> +fontbox.md5                    = 9e03f94d92af257facb148c138af22fa
>>>> +
>>>> +jempbox.version                = 1.7.0
>>>> +jempbox.jar                    = jempbox-${jempbox.version}.jar
>>>> +jempbox.loc                    = ${maven2.repo}/org/apache/pdfbox/jempbox/${jempbox.version}
>>>> +jempbox.md5                    = 69dfbd6872c29f89a4df1179dd54b44e
>>>> +
>>>> +poi.version                = 3.8
>>>> +poi.jar                    = poi-${poi.version}.jar
>>>> +poi.loc                    = ${maven2.repo}/org/apache/poi/poi/${poi.version}
>>>> +poi.md5                    = 5c915f48922046c71121fd7021aa23cb
>>>> +
>>>> +poi-scratchpad.version                = 3.8
>>>> +poi-scratchpad.jar                    = poi-scratchpad-${poi-scratchpad.version}.jar
>>>> +poi-scratchpad.loc                    = ${maven2.repo}/org/apache/poi/poi-scratchpad/${poi-scratchpad.version}
>>>> +poi-scratchpad.md5                    = 7427b6b9e53dcee57d382ba022efc3be
>>>> +
>>>> +poi-ooxml.version                = 3.8
>>>> +poi-ooxml.jar                    = poi-ooxml-${poi-ooxml.version}.jar
>>>> +poi-ooxml.loc                    = ${maven2.repo}/org/apache/poi/poi-ooxml/${poi-ooxml.version}
>>>> +poi-ooxml.md5                    = 8f147b248f078799c24c8714f185b1a8
>>>> +
>>>> +geronimo-stax-api_1.0_spec.version                = 1.0.1
>>>> +geronimo-stax-api_1.0_spec.jar                    = geronimo-stax-api_1.0_spec-${geronimo-stax-api_1.0_spec.version}.jar
>>>> +geronimo-stax-api_1.0_spec.loc                    = ${maven2.repo}/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/${geronimo-stax-api_1.0_spec.version}
>>>> +geronimo-stax-api_1.0_spec.md5                    = b7c2a715cd3d1c43dc4ccfae426e8e2e
>>>> +
>>>> +tagsoup.version                = 1.2.1
>>>> +tagsoup.jar                    = tagsoup-${tagsoup.version}.jar
>>>> +tagsoup.loc                    = ${maven2.repo}/org/ccil/cowan/tagsoup/tagsoup/${tagsoup.version}
>>>> +tagsoup.md5                    = ae73a52cdcbec10cd61d9ef22fab5936
>>>> +
>>>> +asm.version                = 3.1
>>>> +asm.jar                    = asm-${asm.version}.jar
>>>> +asm.loc                    = ${maven2.repo}/org/ow2/util/asm/asm/${asm.version}
>>>> +asm.md5                    = b1a36e247bf18fb4da46ce3a54627d1b
>>>> +
>>>> +isoparser.version                = 1.0-RC-1
>>>> +isoparser.jar                    = isoparser-${isoparser.version}.jar
>>>> +isoparser.loc                    = ${maven2.repo}/com/googlecode/mp4parser/isoparser/${isoparser.version}
>>>> +isoparser.md5                    = b0444fde2290319c9028564c3c3ff1ab
>>>> +
>>>> +metadata-extractor.version                = 2.4.0-beta-1
>>>> +metadata-extractor.jar                    = metadata-extractor-${metadata-extractor.version}.jar
>>>> +metadata-extractor.loc                    = ${maven2.repo}/com/drewnoakes/metadata-extractor/${metadata-extractor.version}
>>>> +metadata-extractor.md5                    = 6e0ad2f0fe78047cb34ec056b39633d3
>>>> +
>>>> +boilerpipe.version                = 1.1.0
>>>> +boilerpipe.jar                    = boilerpipe-${boilerpipe.version}.jar
>>>> +boilerpipe.loc                    = ${maven2.repo}/de/l3s/boilerpipe/boilerpipe/${boilerpipe.version}
>>>> +boilerpipe.md5                    = 0616568083786d0f49e2cb07a5d09fe4
>>>> +
>>>> +rome.version                = 0.9
>>>> +rome.jar                    = rome-${rome.version}.jar
>>>> +rome.loc                    = ${maven2.repo}/rome/rome/${rome.version}
>>>> +rome.md5                    = 19589699b01c59ccb4d5e61e4c78b311
>>>> +
>>>> +vorbis-java-core.version                = 0.1
>>>> +vorbis-java-core.jar                    = vorbis-java-core-${vorbis-java-core.version}.jar
>>>> +vorbis-java-core.loc                    = ${maven2.repo}/org/gagravarr/vorbis-java-core/${vorbis-java-core.version}
>>>> +vorbis-java-core.md5                    = b88115be2754cb6883e652ba68ca46c8
>>>> +
>>>> +juniversalchardet.version                = 1.0.3
>>>> +juniversalchardet.jar                    = juniversalchardet-${juniversalchardet.version}.jar
>>>> +juniversalchardet.loc                    = ${maven2.repo}/com/googlecode/juniversalchardet/juniversalchardet/${juniversalchardet.version}
>>>> +juniversalchardet.md5                    = d9ea0a9a275336c175b343f2e4cd8f27
>>>> +
>>>> +xz.version                = 1.1
>>>> +xz.jar                    = xz-${xz.version}.jar
>>>> +xz.loc                    = ${maven2.repo}/org/tukaani/xz/${xz.version}
>>>> +xz.md5                    = 4d0ba9643c8f3f7c6721be3a1286da1c
>>>> +
>>>> +dom4j.version                 = 1.6.1
>>>> +dom4j.jar                = dom4j-${dom4j.version}.jar
>>>> +dom4j.loc                = ${maven2.repo}/dom4j/dom4j/${dom4j.version}
>>>> +dom4j.md5                = 4d8f51d3fe3900efc6e395be48030d6d
>>>> +
>>>> +xmlbeans.version                 = 2.6.0
>>>> +xmlbeans.jar                = xmlbeans-${xmlbeans.version}.jar
>>>> +xmlbeans.loc                = ${maven2.repo}/org/apache/xmlbeans/xmlbeans/${xmlbeans.version}
>>>> +xmlbeans.md5                = 6591c08682d613194dacb01e95c78c2c
>>>> +
>>>> +poi-ooxml-schemas.version                 = 3.8
>>>> +poi-ooxml-schemas.jar                = poi-ooxml-schemas-${poi-ooxml-schemas.version}.jar
>>>> +poi-ooxml-schemas.loc                = ${maven2.repo}/org/apache/poi/poi-ooxml-schemas/${poi-ooxml-schemas.version}
>>>> +poi-ooxml-schemas.md5                = 7ebcffdc4d82b2b8cbc6464d4543cd07
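
Each `.md5` entry above is the expected MD5 checksum of the corresponding jar. As a rough illustration of how a downloaded file could be checked against such a value, here is a minimal sketch using only the JDK. The class and method names are hypothetical and this is not taken from the JMeter build, which may do the check differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: compare a file's MD5 digest with an expected hex string. */
final class Md5CheckSketch {
    /** Returns the MD5 digest of the data as 32 lowercase hex characters. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if the data hashes to the expected value (case and outer whitespace ignored). */
    static boolean matches(byte[] data, String expectedMd5) {
        return md5Hex(data).equalsIgnoreCase(expectedMd5.trim());
    }

    /** Usage: java Md5CheckSketch <path-to-jar> <expected-md5> */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: Md5CheckSketch <file> <expected-md5>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(matches(data, args[1]) ? "checksum OK" : "checksum MISMATCH");
    }
}
```

Reading the whole file into memory is fine for a sketch; for large jars a streaming digest would be the more careful choice.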
>>>
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Regards.
>>> Philippe Mouawad.
>>>
>>>
>
>


-- 
Regards.
Philippe Mouawad.
