lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Sat, 13 Jan 2007 04:15:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12464410 ]

Grant Ingersoll commented on LUCENE-675:
----------------------------------------

Doron, 

I have committed your additions. This truly is great stuff. Thank you so much for contributing.
The documentation (code and package level) is well done, and the output is very readable. The
alg language is a bit cryptic and takes a little deciphering, but you do document it quite
nicely. I like the extensibility factor, and I think it will make it easier for people to
contribute benchmarking capabilities.

I would love to see someone modify the reporting mechanism in the future so that it can print
info to something other than System.out, as I know people have expressed interest in being
able to slurp the output into Excel or similar number-crunching tools. This could also lead
to the possibility of running some of the algorithms nightly and then integrating with JUnitPerf
or some other performance unit testing approach.
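
To make the reporting idea concrete, here is a rough sketch of what output that is not tied to System.out might look like. It is only an illustration and not the committed benchmark code: the class CsvReportSketch, its Row type, and the column names are all hypothetical. The only point is that the report is written to any java.lang.Appendable, so the same code can target System.out, a StringBuilder, or a FileWriter whose output a spreadsheet can open.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not part of the committed benchmark package. */
class CsvReportSketch {

    /** One reported measurement; the fields are illustrative assumptions. */
    static final class Row {
        final String task;
        final long count;
        final long elapsedMillis;

        Row(String task, long count, long elapsedMillis) {
            this.task = task;
            this.count = count;
            this.elapsedMillis = elapsedMillis;
        }
    }

    /** Writes a header line and one CSV line per row to any Appendable. */
    static void write(Appendable out, List<Row> rows) throws IOException {
        out.append("task,count,elapsedMillis\n");
        for (Row r : rows) {
            out.append(quote(r.task)).append(',')
               .append(Long.toString(r.count)).append(',')
               .append(Long.toString(r.elapsedMillis)).append('\n');
        }
    }

    /** Quotes a field only when it contains a comma, a quote, or a newline. */
    static String quote(String field) {
        if (field.indexOf(',') < 0 && field.indexOf('"') < 0 && field.indexOf('\n') < 0) {
            return field;
        }
        return '"' + field.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) throws IOException {
        List<Row> rows = new ArrayList<Row>();
        rows.add(new Row("AddDoc", 1000, 250)); // made-up numbers, for illustration only
        write(System.out, rows);                // PrintStream implements Appendable
    }
}
```

Passing a java.io.FileWriter instead of System.out would produce a file that Excel can import, and a nightly job could compare such files from run to run.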

We may want to consider deprecating the other benchmarking stuff, although I suppose it can't
hurt to have multiple options in this area.

At any rate, this is very much appreciated.  I would encourage everyone who is interested
in benchmarking to take a look and provide feedback.  I'm going to mark this bug as finished
for now as I think we have a good baseline for benchmarking at this point.

Thanks again,
Grant




> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: https://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>            Priority: Minor
>         Attachments: benchmark.byTask.patch, benchmark.patch, BenchmarkingIndexer.pm,
>                      byTask.2.patch.txt, byTask.jre1.4.patch.txt, extract_reuters.plx,
>                      LuceneBenchmark.java, LuceneIndexer.java, taskBenchmark.zip,
>                      timedata.zip, tiny.alg, tiny.properties
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a suite
> of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I
> propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically
> retrieve it from known locations, and cache it locally.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

