lucene-dev mailing list archives

From "Robert Engels"
Subject RE: mg4j - Managing Gigabyte for Java
Date Thu, 16 Sep 2004 19:04:51 GMT
I think the best way to move in this direction is to make IndexReader and
IndexWriter pure interfaces.

It will go a long way towards these sorts of changes, since the API at the
interface level will need configuration (capability query) methods in
order to support using any 'lucene tools' with any 'lucene index'.

I know it has been discussed before, but are interfaces for IndexReader
and IndexWriter going to make it onto the list for 1.9/2.0?
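
To make the capability-query idea concrete, here is a minimal sketch. The
names (Reader, hasPositions, phraseSearch, booleanOnlyIndex) are
hypothetical and are not part of Lucene's actual API; the sketch only shows
how a tool might ask an index what it supports before relying on it.

```java
final class CapabilitySketch {
    /** Hypothetical reader interface; NOT Lucene's actual IndexReader API. */
    interface Reader {
        int numDocs();
        boolean hasPositions(); // capability query: are term positions stored?
    }

    /** A "tool" that needs positions and checks for them before running. */
    static String phraseSearch(Reader reader) {
        if (!reader.hasPositions()) {
            return "unsupported: index stores no positions";
        }
        return "searching " + reader.numDocs() + " docs";
    }

    /** An assumed index implementation that stores documents only. */
    static Reader booleanOnlyIndex(final int docs) {
        return new Reader() {
            public int numDocs() { return docs; }
            public boolean hasPositions() { return false; }
        };
    }
}
```

With something like this, a phrase-search tool handed a boolean-only index
can refuse cleanly instead of failing partway through.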

-----Original Message-----
From: Doug Cutting
Sent: Thursday, September 16, 2004 1:56 PM
To: Lucene Developers List
Subject: Re: mg4j - Managing Gigabyte for Java

Antonio Gulli wrote:
> Just a question: in my personal experience with a commercial engine I
> partly developed, the "continuation bit" (aka the AltaVista solution)
> is a good and efficient solution relative to gamma codes, delta codes,
> and other codes used for variable-length int representation (see MG).
> Given an int, say n, the continuation bit scheme treats each byte as
> 7 bits of data plus 1 bit that says whether the next byte is also used
> to represent n.

This is what Lucene uses for the reasons you mention: it is a good
compromise between compression and performance.
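
For readers who have not seen the scheme, a minimal sketch of
continuation-bit encoding follows. It assumes the low-order 7-bit group is
written first and that the high bit of each byte marks "more bytes follow".
The class and method names are made up for illustration and this is not
Lucene's own I/O code.

```java
import java.io.ByteArrayOutputStream;

final class VIntSketch {
    /** Encode a non-negative int, 7 data bits per byte, low-order group first. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            // More than 7 bits remain: emit 7 bits and set the continuation bit.
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value); // last byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decode one value from the start of the array. */
    static int decode(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // continuation bit clear: this was the last byte
            }
            shift += 7;
        }
        return value;
    }
}
```

Under these assumptions, values below 128 take one byte and 300 takes two
(0xAC, 0x02), which is where the compression on small document and position
deltas comes from.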

Long-term I'd like to make Lucene's posting format extensible.  In
addition to altering the compression method, the granularity of the
index should be flexible.  Currently postings for all indexed fields
consist of <document, frequency, <position>* > tuples.  Instead, folks
should be able to have postings like:
   . <document> for pure boolean matching only
   . <document, weight> for vector matching, no phrases
   . <document, frequency, <position, weight>* > for boosting term
occurrences by, e.g., position in document, bolding, headings, etc.
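
As a rough illustration of the three granularities, they could be modelled
as in-memory types like the ones below. All names are hypothetical, and
deriving frequency from the number of occurrences and summing weights in
score() are assumptions made for this sketch, not a proposed on-disk format
or scoring rule.

```java
import java.util.ArrayList;
import java.util.List;

final class PostingSketch {
    /** <document>: enough for pure boolean matching. */
    static final class DocPosting {
        final int doc;
        DocPosting(int doc) { this.doc = doc; }
    }

    /** <document, weight>: vector matching, no phrases. */
    static final class WeightedPosting {
        final int doc;
        final float weight;
        WeightedPosting(int doc, float weight) { this.doc = doc; this.weight = weight; }
    }

    /** One <position, weight> pair inside a positional posting. */
    static final class Occurrence {
        final int position;
        final float weight;
        Occurrence(int position, float weight) { this.position = position; this.weight = weight; }
    }

    /** <document, frequency, <position, weight>* >. */
    static final class PositionalPosting {
        final int doc;
        final List<Occurrence> occurrences = new ArrayList<Occurrence>();
        PositionalPosting(int doc) { this.doc = doc; }

        void add(int position, float weight) { occurrences.add(new Occurrence(position, weight)); }

        /** Assumption: frequency equals the number of recorded occurrences. */
        int frequency() { return occurrences.size(); }

        /** Assumption: a term's contribution is the sum of its occurrence weights. */
        float score() {
            float sum = 0f;
            for (Occurrence o : occurrences) { sum += o.weight; }
            return sum;
        }
    }
}
```

For example, a term occurring once in a heading (weight 2.0) and once in
body text (weight 1.0) would have frequency 2 and a summed weight of 3.0.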

Extending Lucene to efficiently and flexibly support this will be a
design challenge, but I think it will benefit lots of applications.

