jmeter-dev mailing list archives

From Philippe Mouawad <philippe.moua...@gmail.com>
Subject Re: Add Apache Tika in JMeter to extract text from various file type
Date Mon, 05 Nov 2012 20:05:53 GMT
But wouldn't this make setup more complex and error-prone?
See the nightly build experience: a lot of people miss the fact that they
must copy the lib folder in the first zip.

It would no longer work out of the box as it does now.
Isn't that too much work for just a size concern?

Sebb, what do you mean by catching exceptions?
Would that happen only the first time or on every call? If on every call,
wouldn't it hurt performance?
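
To make the question concrete, here is a minimal sketch of the "first time only" variant: the missing-jar error is caught once, the result is remembered, and later calls pay only for a boolean check. This is not JMeter or Tika code. The class `OptionalExtractor`, its methods, and the `Supplier` standing in for the Tika-backed call are all hypothetical, and the sketch assumes the failure shows up as a `NoClassDefFoundError`, which is what the JVM raises when a class that was present at compile time cannot be found at run time.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of JMeter or Tika.
class OptionalExtractor {
    // Stands in for the call that would use Tika classes.
    private final Supplier<String> extractor;
    // Set once the optional jars are found to be missing.
    private volatile boolean disabled = false;
    // Counts how many times the real extraction was attempted.
    private final AtomicInteger attempts = new AtomicInteger();

    OptionalExtractor(Supplier<String> extractor) {
        this.extractor = extractor;
    }

    // Returns the extracted text, or the fallback if the optional jars are absent.
    String extractOrFallback(String fallback) {
        if (disabled) {
            // Fast path: no exception is thrown or caught on later calls.
            return fallback;
        }
        attempts.incrementAndGet();
        try {
            return extractor.get();
        } catch (NoClassDefFoundError e) {
            // Assumption: a missing optional jar surfaces as this error.
            disabled = true;
            return fallback;
        }
    }

    int attempts() {
        return attempts.get();
    }
}
```

With this shape, an extractor whose classes are missing is tried once and then skipped; a real implementation would also log the warning to jmeter.log at the point where `disabled` is set.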
Regards
Philippe


On Monday, November 5, 2012, sebb wrote:

> On 5 November 2012 14:00, Milamber <milamber@apache.org> wrote:
> >
> >
> > On 05/11/2012 11:26, sebb wrote:
> >
> >> On 3 November 2012 19:23, Milamber <milamber@apache.org> wrote:
> >>>
> >>> Hello,
> >>>
> >>> Currently, I am working to add Apache Tika 1.2 [1] to JMeter to improve
> >>> functional tests.
> >>>
> >>> With Tika, you can extract the text from various documents, like MS
> >>> Office (Word, Excel, PowerPoint 97-2003, 2007-2010 (openxml)),
> >>> OpenOffice (writer, calc, impress), HTML, Gz, jar/zip files (list of
> >>> content), and some "multimedia" files like mp3, mp4, flv, etc.
> >>>
> >>> In JMeter, Tika can be used by the View Results Tree to view the text
> >>> data of these files, by the Regular Expression Extractor to capture
> >>> text from these files, and by the Response Assertion to assert on the
> >>> data.
> >>>
> >>> The drawback is that Apache Tika requires a big jar (25 MB) or a lot of
> >>> jar files (see below). With all jars in the binary package, the new
> >>> size (for tgz) is 45 MB (JMeter 2.8 tgz: 23 MB).
> >>>
> >>> The question: do you agree to add Tika (and the new capability to
> >>> "extract text from Document") to JMeter with the new binary size?
> >>>
> >>> Secondary question: which is the better way? 1/ Add only tika-app.jar
> >>> (which includes all dependencies) [2], or 2/ Add several jar files
> >>> (tika-core, tika-parser, etc. + dependencies) [3]
> >>
> >> I'm concerned that using Tika would double the size of JMeter.
> >> Although the extra features would be useful, I suspect that most test
> >> cases won't need the extra functionality.
> >>
> >> Would it be possible to make the Tika jars optional?
> >> i.e. add the functionality, but if the jars are not present it is
> >> disabled.
> >
> >
> > Yes, that seems possible via dynamic class checking / loading.
> >
> >
> >
> >>
> >> If we accept that developers must download Tika, then it should be
> >> easy enough to structure the add-on so that JMeter can fail gracefully
> >> if the jars are missing.
> >> But ideally developers would not need to download all the jars either.
> >
> >
> > Currently, to compile the "tika" elements, we only need these jars:
> > tika-core.jar
> > tika-parsers.jar
>
> That would be fine.
>
> > For the binary release, we need to add these jars (full list):
> > apache-mime4j-core.jar
> > apache-mime4j-dom.jar
> > asm.jar
> > aspectjrt.jar
> > boilerpipe.jar
> > commons-compress.jar
> > dom4j.jar
> > fontbox.jar
> > geronimo-stax-api_1.0_spec.jar
> > gson.jar
> > isoparser.jar
> > jempbox.jar
> > juniversalchardet.jar
> > log4j.jar
> > metadata-extractor.jar
> > netcdf.jar
> > pdfbox.jar
> > poi-ooxml-schemas.jar
> > poi-ooxml.jar
> > poi-scratchpad.jar
> > poi.jar
> > rome.jar
> > slf4j-api.jar
> > slf4j-log4j12.jar
> > tagsoup.jar
> > tika-core.jar
> > tika-parsers.jar
> > tika-xmp.jar
> > vorbis-java-core.jar
> > vorbis-java-tika.jar
> > xmlbeans.jar
> > xmpcore.jar
> > xz.jar
> >
> > Or only the tika-app.jar (25 MB)
> >
> >
> > So, we can add the "tika" functionality with dynamic class loading, and
> > add warning messages telling users to download tika-app.jar if they
> > want the Tika behavior.
> >
> > For View Results Tree, when "Document" is chosen in the combo list: a
> > message in Response data indicating that tika-app.jar is missing (with
> > a pointer to where to download it).
> >
> > For RegExp and Response Assertion, if tika-app.jar is missing: a warning
> > dialog showing the message when the radio button "Response as a
> > Document" is selected.
> > And in all cases, a warning message in jmeter.log.
>
> Rather than use dynamic class loading, would it not be possible to
> just catch the Exceptions that are thrown when the jars are missing?
>
> If the code builds OK with just tika-core.jar and tika-parsers.jar
> this should be sufficient.
>
> >
> >
> >
> >>
> >
>


-- 
Regards.
Philippe Mouawad.
