lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <>
Subject [jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
Date Sun, 12 Nov 2006 10:02:39 GMT
Doron Cohen commented on LUCENE-675:

I looked at extending the benchmark with:
- different test "scenarios", i.e. other sequences of operations.
- multithreaded tests, e.g. several queries in parallel.
- rate of events, e.g. "2 queries arriving per second", or "one query per second in parallel
with 20 new documents in a minute".
- different data sources (input documents, queries).

For this I made lots of changes to the benchmark code, reusing parts of it and rewriting others.
I would like to submit this code in a few days - it is running already but some functionality
is missing.

I would like to describe how it works to hopefully get early feedback. 

There are several "basic tasks" defined - all extending an (abstract) class PerfTask:
- AddDocTask
- OptimizeTask
- CreateIndexTask

To further extend the benchmark 'framework', new tasks can be added. Each task must implement
the abstract method doLogic(). For instance, in AddDocTask this method calls
IndexWriter.addDocument(). There are also setup() and tearDown() methods, for performing work
that should not be timed as part of that task.
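
To make this concrete, here is a minimal sketch (not the actual benchmark code) of what a task
could look like. The PerfTask base class shown here is a stand-in for the real one, and the
constructor and the way the task obtains its IndexWriter are assumptions for illustration only:

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  // Stand-in for the abstract base class described above; in the real
  // framework, doLogic() is the one method every task must implement.
  abstract class PerfTask {
    public void setup() throws Exception {}           // untimed preparation
    public void tearDown() throws Exception {}        // untimed cleanup
    public abstract void doLogic() throws Exception;  // the timed work
  }

  // Sketch of an AddDoc-style task: the document is built in setup(),
  // so only the addDocument() call itself is measured.
  class AddDocTask extends PerfTask {
    private final IndexWriter writer;
    private Document doc;

    AddDocTask(IndexWriter writer) { this.writer = writer; }

    public void setup() throws Exception {
      doc = new Document();
      doc.add(new Field("body", "some document text",
                        Field.Store.NO, Field.Index.TOKENIZED));
    }

    public void doLogic() throws Exception {
      writer.addDocument(doc);   // this is the part being timed
    }
  }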

A special TaskSequence task contains other tasks. It is either sequential or parallel, i.e. it
executes its child tasks either serially or concurrently.
TaskSequence also supports a "rate": the pace at which its child tasks are "fired" can be
controlled.
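
Purely as an illustration of the idea (the real TaskSequence would be more careful about timing
and thread handling; everything below except the class name is an assumption), serial vs.
parallel execution with a rate could be sketched like this:

  import java.util.ArrayList;
  import java.util.List;

  // Illustrative container task: runs its children serially or in
  // parallel, optionally pacing how fast they are fired.
  class TaskSequence extends PerfTask {
    private final List<PerfTask> tasks = new ArrayList<PerfTask>();
    private final boolean parallel;     // '[' ... ']' vs '{' ... '}'
    private final long msBetweenTasks;  // 0 means unpaced

    TaskSequence(boolean parallel, double tasksPerSec) {
      this.parallel = parallel;
      this.msBetweenTasks = tasksPerSec <= 0 ? 0 : (long) (1000 / tasksPerSec);
    }

    void add(PerfTask t) { tasks.add(t); }

    public void doLogic() throws Exception {
      List<Thread> threads = new ArrayList<Thread>();
      for (final PerfTask t : tasks) {
        if (msBetweenTasks > 0) Thread.sleep(msBetweenTasks);  // crude pacing
        if (parallel) {
          Thread th = new Thread() {
            public void run() {
              try { runOne(t); } catch (Exception e) { e.printStackTrace(); }
            }
          };
          th.start();
          threads.add(th);
        } else {
          runOne(t);
        }
      }
      for (Thread th : threads) th.join();  // wait for parallel children
    }

    private static void runOne(PerfTask t) throws Exception {
      t.setup();
      t.doLogic();     // only this part should be measured
      t.tearDown();
    }
  }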

With these tasks, it is possible to describe a performance test 'algorithm' in a simple syntax.
('algorithm' may be too big a word for this...?)

A test invocation takes two parameters:
- a properties file      - various config properties.
- test.alg               - file with the algorithm.

By convention, for each task class  "OpNameTask",  the command  "OpName"  is valid in test.alg.
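
One plausible way to implement that convention is by reflection; the package name in this
sketch is invented, and a no-arg constructor is assumed:

  // Hypothetical resolution of an alg command ("AddDoc") to its task
  // class ("AddDocTask"); the package name is made up for this sketch.
  class TaskFactory {
    static PerfTask newTask(String opName) throws Exception {
      Class<?> c = Class.forName("org.example.benchmark.tasks." + opName + "Task");
      return (PerfTask) c.newInstance();  // assumes a no-arg constructor
    }
  }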

Adding a single document is done by:
   AddDoc

Adding 3 documents:
   AddDoc
   AddDoc
   AddDoc

Or, alternatively:
   { AddDoc } : 3

So, '{' and '}' indicate a serial sequence of (child) tasks. 

To fire 100 queries in a row:
  { Search } : 100

To fire 100 queries in parallel:
  [ Search ] : 100

So, '[' and ']' indicate a parallel group of tasks. 

To fire 100 queries in a row, 2 queries per second (120 per minute):
  { Search } : 100 : 120

Similarly, but in parallel:
  [ Search ] : 100 : 120

A sequence task can be named for identifying it in reports:
  { "QueriesA" Search } : 100 : 120

And there are tasks that create reports. 
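
Putting it all together, a complete test.alg could look like the following; the "Report"
command at the end is a made-up name, standing in for whichever report task gets defined:

   CreateIndex
   { AddDoc } : 1000
   Optimize
   [ "Queries" Search ] : 100 : 120
   Report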

There are more tasks, and more to tell about the alg syntax, but this post is already long...

I find this quite powerful for perf testing.
What do you (and you) think?

- Doron

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>                 Key: LUCENE-675
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>         Assigned To: Grant Ingersoll
>         Attachments: benchmark.patch, extract_reuters.plx, ...
> We need an objective way to measure the performance of Lucene, both indexing and querying,
> on a known corpus. This issue is intended to collect comments and patches implementing a
> suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original
> Reuters collection. I propose to use this corpus as a base for benchmarks. The benchmarking
> suite could automatically retrieve it from known locations, and cache it locally.
