jmeter-dev mailing list archives

From Milamber <milam...@apache.org>
Subject Add Apache Tika in JMeter to extract text from various file type
Date Sat, 03 Nov 2012 19:23:58 GMT
Hello,

I am currently working on adding Apache Tika 1.2 [1] to JMeter to improve
functional testing.

With Tika, you can extract the text from various document types: MS
Office (Word, Excel and PowerPoint, in both the 97-2003 and the 2007-2010
OpenXML formats), OpenOffice (Writer, Calc, Impress), HTML, gz, jar/zip
archives (list of contents), and some "multimedia" files like mp3, mp4 and flv.
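For jar/zip files, the extracted text is essentially a list of the archive's
entries. The sketch below approximates that with the JDK's java.util.zip only,
so it runs without Tika; it is not Tika's API, and the names ZipListing,
listEntries and sampleArchive are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipListing {

    // Returns the entry names of a zip/jar archive held in memory, in archive order.
    // JDK-only approximation of the "list of contents" text; NOT Tika's API.
    static List<String> listEntries(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a small two-entry archive (hypothetical content) so the example
    // needs no file on disk; it stands in for a zip/jar response body.
    static byte[] sampleArchive() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"META-INF/MANIFEST.MF", "docs/report.txt"}) {
                zout.putNextEntry(new ZipEntry(name));
                zout.write("sample".getBytes(StandardCharsets.UTF_8));
                zout.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the central directory has been written.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // One entry name per line, similar in spirit to a text view of the archive.
        System.out.println(String.join("\n", listEntries(sampleArchive())));
    }
}
```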

In JMeter, Tika can be used by the View Results Tree to display the text
content of these files, by the Regular Expression Extractor to extract
text from them, and by the Response Assertion to assert on their content.
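Once a response is converted to plain text, the extractor and the assertion
work on text instead of binary data. A minimal sketch of that idea, using
java.util.regex as a rough stand-in for JMeter's Regular Expression Extractor
and Response Assertion (which have their own implementations and options); the
class name, method names and sample text are hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractedTextChecks {

    // Rough stand-in for a Regular Expression Extractor with template $1$:
    // returns capture group 1 of the first match, or empty if there is no match
    // (or if the regex has no capturing group).
    static Optional<String> extract(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find() || m.groupCount() < 1) {
            return Optional.empty();
        }
        return Optional.ofNullable(m.group(1));
    }

    // Rough stand-in for a Response Assertion: a plain substring test on the text.
    static boolean containsText(String text, String expected) {
        return text.contains(expected);
    }

    public static void main(String[] args) {
        // Hypothetical text, as if extracted from an invoice document.
        String text = "Invoice 2012-117\nTotal: 42.50 EUR";
        System.out.println(extract(text, "Total: ([0-9.]+)").orElse("NOT FOUND"));
        System.out.println(containsText(text, "Invoice 2012-117"));
    }
}
```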

The drawback: Apache Tika requires either one big jar (25 MB) or many
jar files (see below). With all the jars in the binary package, the tgz
grows to 45 MB (JMeter 2.8 tgz: 23 MB).

The question: do you agree to add Tika (and the new capability to "extract
text from documents") to JMeter, given the new binary size?

Secondary question: which is the better way? 1/ Add only tika-app.jar
(which includes all dependencies) [2], or 2/ add several jar files
(tika-core, tika-parsers, etc. plus dependencies) [3].

Milamber


[1] http://tika.apache.org/

[2] One jar:
+tika-app.version                = 1.2
+tika-app.jar                    = tika-app-${tika-app.version}.jar
+tika-app.loc                    = ${maven2.repo}/org/apache/tika/tika-app/${tika-app.version}
+tika-app.md5                    = e0ec70c80a6f3b113d8ac1c12a33338f
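Each entry is expected to resolve into a download location of the form
loc/jar (an assumption about how the build consumes these properties). A
minimal JDK-only sketch of that resolution for the tika-app entry; the value of
maven2.repo is assumed, and JarLocation, expand, downloadUrl and sampleProps are
hypothetical names:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JarLocation {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces every ${name} with its (recursively expanded) value from props.
    // Unknown placeholders are kept as-is; cyclic definitions are not detected.
    static String expand(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String replacement = props.containsKey(key) ? expand(props.get(key), props) : m.group(0);
            // quoteReplacement keeps '$' in values from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Download URL for an entry such as "tika-app": <expanded loc>/<expanded jar>.
    static String downloadUrl(String name, Map<String, String> props) {
        return expand(props.get(name + ".loc"), props) + "/" + expand(props.get(name + ".jar"), props);
    }

    // The tika-app entry from [2], plus an ASSUMED value for maven2.repo.
    static Map<String, String> sampleProps() {
        Map<String, String> props = new HashMap<>();
        props.put("maven2.repo", "https://repo1.maven.org/maven2"); // assumption, not from the patch
        props.put("tika-app.version", "1.2");
        props.put("tika-app.jar", "tika-app-${tika-app.version}.jar");
        props.put("tika-app.loc", "${maven2.repo}/org/apache/tika/tika-app/${tika-app.version}");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(downloadUrl("tika-app", sampleProps()));
    }
}
```

The same lookup applies unchanged to every entry in [3], since they all follow
the name.version / name.jar / name.loc / name.md5 pattern.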

[3] Several jars (I still need to check whether any jar is missing):

+tika-core.version                = 1.2
+tika-core.jar                    = tika-core-${tika-core.version}.jar
+tika-core.loc                    = ${maven2.repo}/org/apache/tika/tika-core/${tika-core.version}
+tika-core.md5                    = 17cfec5a9b28b323375de0692ce5ecb1
+
+tika-parsers.version                = 1.2
+tika-parsers.jar                    = tika-parsers-${tika-parsers.version}.jar
+tika-parsers.loc                    = ${maven2.repo}/org/apache/tika/tika-parsers/${tika-parsers.version}
+tika-parsers.md5                    = a15b071726358fd195d5c4b0625cdfb5
+
+netcdf.version                = 4.2-min
+netcdf.jar                    = netcdf-${netcdf.version}.jar
+netcdf.loc                    = ${maven2.repo}/edu/ucar/netcdf/${netcdf.version}
+netcdf.md5                    = eb00b40b0511f0fc1dfcfc9cb89e3c53
+
+apache-mime4j-core.version                = 0.7.2
+apache-mime4j-core.jar                    = apache-mime4j-core-${apache-mime4j-core.version}.jar
+apache-mime4j-core.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-core/${apache-mime4j-core.version}
+apache-mime4j-core.md5                    = 88f799546eca803c53eee01a4ce5edcd
+
+apache-mime4j-dom.version                = 0.7.2
+apache-mime4j-dom.jar                    = apache-mime4j-dom-${apache-mime4j-dom.version}.jar
+apache-mime4j-dom.loc                    = ${maven2.repo}/org/apache/james/apache-mime4j-dom/${apache-mime4j-dom.version}
+apache-mime4j-dom.md5                    = dedc747b5c367fbd7f8a7235d1d7cbee
+
+commons-compress.version                = 1.4.1
+commons-compress.jar                    = commons-compress-${commons-compress.version}.jar
+commons-compress.loc                    = ${maven2.repo}/org/apache/commons/commons-compress/${commons-compress.version}
+commons-compress.md5                    = 7f7ff9255a831325f38a170992b70073
+
+pdfbox.version                = 1.7.0
+pdfbox.jar                    = pdfbox-${pdfbox.version}.jar
+pdfbox.loc                    = ${maven2.repo}/org/apache/pdfbox/pdfbox/${pdfbox.version}
+pdfbox.md5                    = da9ff2f1b43dc92b15fe3ba39a1cddcd
+
+fontbox.version                = 1.7.0
+fontbox.jar                    = fontbox-${fontbox.version}.jar
+fontbox.loc                    = ${maven2.repo}/org/apache/pdfbox/fontbox/${fontbox.version}
+fontbox.md5                    = 9e03f94d92af257facb148c138af22fa
+
+jempbox.version                = 1.7.0
+jempbox.jar                    = jempbox-${jempbox.version}.jar
+jempbox.loc                    = ${maven2.repo}/org/apache/pdfbox/jempbox/${jempbox.version}
+jempbox.md5                    = 69dfbd6872c29f89a4df1179dd54b44e
+
+poi.version                = 3.8
+poi.jar                    = poi-${poi.version}.jar
+poi.loc                    = ${maven2.repo}/org/apache/poi/poi/${poi.version}
+poi.md5                    = 5c915f48922046c71121fd7021aa23cb
+
+poi-scratchpad.version                = 3.8
+poi-scratchpad.jar                    = poi-scratchpad-${poi-scratchpad.version}.jar
+poi-scratchpad.loc                    = ${maven2.repo}/org/apache/poi/poi-scratchpad/${poi-scratchpad.version}
+poi-scratchpad.md5                    = 7427b6b9e53dcee57d382ba022efc3be
+
+poi-ooxml.version                = 3.8
+poi-ooxml.jar                    = poi-ooxml-${poi-ooxml.version}.jar
+poi-ooxml.loc                    = ${maven2.repo}/org/apache/poi/poi-ooxml/${poi-ooxml.version}
+poi-ooxml.md5                    = 8f147b248f078799c24c8714f185b1a8
+
+geronimo-stax-api_1.0_spec.version                = 1.0.1
+geronimo-stax-api_1.0_spec.jar                    = geronimo-stax-api_1.0_spec-${geronimo-stax-api_1.0_spec.version}.jar
+geronimo-stax-api_1.0_spec.loc                    = ${maven2.repo}/org/apache/geronimo/specs/geronimo-stax-api_1.0_spec/${geronimo-stax-api_1.0_spec.version}
+geronimo-stax-api_1.0_spec.md5                    = b7c2a715cd3d1c43dc4ccfae426e8e2e
+
+tagsoup.version                = 1.2.1
+tagsoup.jar                    = tagsoup-${tagsoup.version}.jar
+tagsoup.loc                    = ${maven2.repo}/org/ccil/cowan/tagsoup/tagsoup/${tagsoup.version}
+tagsoup.md5                    = ae73a52cdcbec10cd61d9ef22fab5936
+
+asm.version                = 3.1
+asm.jar                    = asm-${asm.version}.jar
+asm.loc                    = ${maven2.repo}/org/ow2/util/asm/asm/${asm.version}
+asm.md5                    = b1a36e247bf18fb4da46ce3a54627d1b
+
+isoparser.version                = 1.0-RC-1
+isoparser.jar                    = isoparser-${isoparser.version}.jar
+isoparser.loc                    = ${maven2.repo}/com/googlecode/mp4parser/isoparser/${isoparser.version}
+isoparser.md5                    = b0444fde2290319c9028564c3c3ff1ab
+
+metadata-extractor.version                = 2.4.0-beta-1
+metadata-extractor.jar                    = metadata-extractor-${metadata-extractor.version}.jar
+metadata-extractor.loc                    = ${maven2.repo}/com/drewnoakes/metadata-extractor/${metadata-extractor.version}
+metadata-extractor.md5                    = 6e0ad2f0fe78047cb34ec056b39633d3
+
+boilerpipe.version                = 1.1.0
+boilerpipe.jar                    = boilerpipe-${boilerpipe.version}.jar
+boilerpipe.loc                    = ${maven2.repo}/de/l3s/boilerpipe/boilerpipe/${boilerpipe.version}
+boilerpipe.md5                    = 0616568083786d0f49e2cb07a5d09fe4
+
+rome.version                = 0.9
+rome.jar                    = rome-${rome.version}.jar
+rome.loc                    = ${maven2.repo}/rome/rome/${rome.version}
+rome.md5                    = 19589699b01c59ccb4d5e61e4c78b311
+
+vorbis-java-core.version                = 0.1
+vorbis-java-core.jar                    = vorbis-java-core-${vorbis-java-core.version}.jar
+vorbis-java-core.loc                    = ${maven2.repo}/org/gagravarr/vorbis-java-core/${vorbis-java-core.version}
+vorbis-java-core.md5                    = b88115be2754cb6883e652ba68ca46c8
+
+juniversalchardet.version                = 1.0.3
+juniversalchardet.jar                    = juniversalchardet-${juniversalchardet.version}.jar
+juniversalchardet.loc                    = ${maven2.repo}/com/googlecode/juniversalchardet/juniversalchardet/${juniversalchardet.version}
+juniversalchardet.md5                    = d9ea0a9a275336c175b343f2e4cd8f27
+
+xz.version                = 1.1
+xz.jar                    = xz-${xz.version}.jar
+xz.loc                    = ${maven2.repo}/org/tukaani/xz/${xz.version}
+xz.md5                    = 4d0ba9643c8f3f7c6721be3a1286da1c
+
+dom4j.version                 = 1.6.1
+dom4j.jar                = dom4j-${dom4j.version}.jar
+dom4j.loc                = ${maven2.repo}/dom4j/dom4j/${dom4j.version}
+dom4j.md5                = 4d8f51d3fe3900efc6e395be48030d6d
+
+xmlbeans.version                 = 2.6.0
+xmlbeans.jar                = xmlbeans-${xmlbeans.version}.jar
+xmlbeans.loc                = ${maven2.repo}/org/apache/xmlbeans/xmlbeans/${xmlbeans.version}
+xmlbeans.md5                = 6591c08682d613194dacb01e95c78c2c
+
+poi-ooxml-schemas.version                 = 3.8
+poi-ooxml-schemas.jar                = poi-ooxml-schemas-${poi-ooxml-schemas.version}.jar
+poi-ooxml-schemas.loc                = ${maven2.repo}/org/apache/poi/poi-ooxml-schemas/${poi-ooxml-schemas.version}
+poi-ooxml-schemas.md5                = 7ebcffdc4d82b2b8cbc6464d4543cd07
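The .md5 values let the build check each downloaded jar. Assuming the check
compares a lowercase hex MD5 of the whole jar file (an assumption, not taken
from JMeter's build.xml), it can be sketched with java.security.MessageDigest;
JarChecksum and its methods are hypothetical names, and the path in the usage
comment is only an example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class JarChecksum {

    // Lowercase hex MD5 of the given bytes, the format used by the *.md5 entries.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // %x on a byte prints it unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // True if the data's MD5 equals the expected value (case-insensitive).
    static boolean matches(byte[] data, String expectedMd5) {
        return md5Hex(data).equalsIgnoreCase(expectedMd5.trim());
    }

    // Reads the whole jar into memory and checks it; fine for jars of a few MB.
    static boolean verifyJar(Path jar, String expectedMd5) {
        try {
            return matches(Files.readAllBytes(jar), expectedMd5);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage (example path): java JarChecksum lib/tika-app-1.2.jar e0ec70c80a6f3b113d8ac1c12a33338f
        if (args.length != 2) {
            System.err.println("usage: JarChecksum <jar file> <expected md5>");
            return;
        }
        boolean ok = verifyJar(Paths.get(args[0]), args[1]);
        System.out.println(ok ? "OK" : "MD5 mismatch: " + args[0]);
    }
}
```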



