lucene-dev mailing list archives

From "paul.elschot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-328) Some utilities for a compact sparse filter
Date Sun, 01 Jan 2006 20:34:00 GMT
    [ http://issues.apache.org/jira/browse/LUCENE-328?page=comments#action_12361490 ] 

paul.elschot commented on LUCENE-328:
-------------------------------------

> 1. Is there any particular reason for SortedVIntList not to implement the DocNrSkipper interface?
> The method getDocNrSkipper() is there, but the declaration is missing.

The object returned by the getDocNrSkipper() method implements the interface by adding a bit
of state for the iteration over the document numbers. This allows more than one iterator
on the (unmodifiable) SortedVIntList.
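To illustrate why the iteration state lives in the returned object rather than in SortedVIntList itself, here is a minimal sketch. The DocNrSkipper interface shown, its nextDocNr(int) method and the SortedDocList class are hypothetical stand-ins (the attached files may use different names and signatures), and the list is backed by a plain int[] instead of VInts to keep it short.

```java
import java.util.Arrays;

/** Hypothetical iteration interface; the attached DocNrSkipper may differ. */
interface DocNrSkipper {
    /** Returns the first document number >= minDocNr, or -1 when exhausted. */
    int nextDocNr(int minDocNr);
}

/** Unmodifiable sorted list of document numbers, backed by an int[] for brevity. */
final class SortedDocList {
    private final int[] docNrs;

    SortedDocList(int[] sortedDocNrs) {
        // Defensive copy, so the shared data can never change after construction.
        this.docNrs = Arrays.copyOf(sortedDocNrs, sortedDocNrs.length);
    }

    /** Each call returns a fresh skipper with its own cursor. */
    DocNrSkipper getDocNrSkipper() {
        return new DocNrSkipper() {
            private int index = 0; // per-iterator state; the shared array is only read

            @Override
            public int nextDocNr(int minDocNr) {
                while (index < docNrs.length && docNrs[index] < minDocNr) {
                    index++;
                }
                return index < docNrs.length ? docNrs[index] : -1;
            }
        };
    }
}
```

Because every call to getDocNrSkipper() creates a new cursor over the same read-only array, two skippers obtained from one list advance independently of each other.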

> 2. Should getDocNrSkipper() and the DocNrSkipper interface throw IOException? I have tried to
> add TermDocsSortedIntList to the family, but all methods in TermDocs throw IOException,
> and it is not nice to silently swallow this exception so early in DocNrSkipper. Any better ideas
> for dealing with that?

I have no problem with adding IOException to the throws clauses of the DocNrSkipper interface.
The idea is to filter query results with data held in RAM, where one would not normally expect
an IOException, so one could also consider rethrowing the IOException wrapped in an Error.
OTOH, the ability to obtain a DocNrSkipper directly from an index is certainly appealing,
and then IOException is unavoidable.
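A minimal sketch of both options, assuming a hypothetical CheckedDocNrSkipper interface whose method declares IOException: a RAM-backed implementation can simply leave the throws clause off its override, while a caller that cannot propagate checked exceptions can rethrow them wrapped in an Error, as suggested above. All class and method names here are illustrative only.

```java
import java.io.IOException;

/** Hypothetical skipper interface that declares IOException, so index-backed versions are possible. */
interface CheckedDocNrSkipper {
    /** Returns the first document number >= minDocNr, or -1 when exhausted. */
    int nextDocNr(int minDocNr) throws IOException;
}

/** RAM-backed implementation: it does no I/O, so its override omits the throws clause. */
final class ArrayDocNrSkipper implements CheckedDocNrSkipper {
    private final int[] docNrs;
    private int index = 0;

    ArrayDocNrSkipper(int[] sortedDocNrs) {
        this.docNrs = sortedDocNrs.clone();
    }

    @Override
    public int nextDocNr(int minDocNr) {
        while (index < docNrs.length && docNrs[index] < minDocNr) {
            index++;
        }
        return index < docNrs.length ? docNrs[index] : -1;
    }
}

final class Skippers {
    /** For callers that cannot propagate checked exceptions: rethrow wrapped in an Error. */
    static int nextOrError(CheckedDocNrSkipper skipper, int minDocNr) {
        try {
            return skipper.nextDocNr(minDocNr);
        } catch (IOException e) {
            throw new Error("unexpected I/O failure while skipping", e);
        }
    }
}
```

Declaring the exception on the interface keeps the door open for skippers read straight from an index (for example one wrapping TermDocs), while in-memory implementations pay nothing for it.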


> 3. Paul, why does SkipFilter exist (here I refer to LUCENE-330)? Wouldn't it be better to
> use the DocNrSkipper interface instead (SkipFilter does nothing but wrap this interface)?
> The same question applies to IterFilter. Did I get something wrong here?

Because the bits() method of Filter returns a java.util.BitSet, it is impossible
to provide a Filter backed by any other data structure.
SkipFilter/DocNrSkipper are intended to be parallel to Filter/BitSet,
and the DocNrSkipper interface allows alternative implementations.
Both SkipFilter and Filter are interfaces for factories/caches of filter data structures.

I'd like to somehow merge these parallel paths, but I don't know how to
do that. Perhaps SkipFilter could provide backward compatibility by also
offering a bits() method, to be used whenever it does not throw, for
example, an UnsupportedOperationException.
Another way would be to deprecate Filter in favour of SkipFilter, but that would
cause a lot of backward compatibility issues, and perhaps also some
performance issues.
The FilteredQuery of LUCENE-330 allows both paths to be used;
they are joined at line 211 of that FilteredQuery.
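One possible shape for the backward-compatible variant, where bits() is used unless it throws UnsupportedOperationException, is sketched below. CompatSkipFilter, DocSkipper and countAccepted are hypothetical names; this is not the SkipFilter or FilteredQuery code attached to LUCENE-330, and counting the accepted documents merely stands in for whatever the caller does with the filter.

```java
import java.util.BitSet;

/** Hypothetical skipper interface (the attached DocNrSkipper may differ). */
interface DocSkipper {
    /** Returns the first document number >= minDocNr, or -1 when exhausted. */
    int nextDocNr(int minDocNr);
}

/** Hypothetical SkipFilter that may also expose a BitSet for backward compatibility. */
abstract class CompatSkipFilter {
    abstract DocSkipper getDocNrSkipper();

    /** Old-style access; filters without a BitSet representation keep this default. */
    BitSet bits() {
        throw new UnsupportedOperationException("no BitSet representation");
    }
}

final class FilterPaths {
    /** Counts accepted documents, preferring the BitSet path when the filter offers one. */
    static int countAccepted(CompatSkipFilter filter) {
        BitSet bits;
        try {
            bits = filter.bits(); // old Filter/BitSet path
        } catch (UnsupportedOperationException e) {
            bits = null;
        }
        if (bits != null) {
            return bits.cardinality();
        }
        // New SkipFilter/DocNrSkipper path: walk the skipper one document at a time.
        DocSkipper skipper = filter.getDocNrSkipper();
        int count = 0;
        for (int doc = skipper.nextDocNr(0); doc != -1; doc = skipper.nextDocNr(doc + 1)) {
            count++;
        }
        return count;
    }
}
```

Using an exception as the "not supported" signal keeps the old Filter signature untouched, at the cost of a try/catch on every call; a separate capability check would avoid that but adds another method to the contract.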

The IterFilter of LUCENE-330 was replaced by SkipFilter; I forgot
to indicate that when I uploaded the replacement. I have just deleted
IterFilter there.

> Must say, excellent work!

Thanks. I should add that most of the hard work had already been done in
org.apache.lucene.store.InputStream.readVInt() and
org.apache.lucene.store.OutputStream.writeVInt().
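For reference, the VInt byte format those two methods implement stores 7 bits per byte, low-order group first, with the high bit set on every byte except the last. The sketch below reimplements that format on plain byte arrays instead of calling Lucene; the VInt class and the int[] position holder are illustrative choices, not Lucene API.

```java
import java.io.ByteArrayOutputStream;

final class VInt {
    /** Appends value (treated as unsigned) using 7 data bits per byte, low-order group first. */
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // high bit set: more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte has its high bit clear
    }

    /** Decodes one VInt starting at pos[0] and advances pos[0] past it. */
    static int readVInt(byte[] bytes, int[] pos) {
        byte b = bytes[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = bytes[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }
}
```

Values below 128 take a single byte, values below 16384 take two, and so on, which is what makes small gaps between document numbers cheap to store.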


Regards,
Paul Elschot


> Some utilities for a compact sparse filter
> ------------------------------------------
>
>          Key: LUCENE-328
>          URL: http://issues.apache.org/jira/browse/LUCENE-328
>      Project: Lucene - Java
>         Type: Improvement
>   Components: Search
>     Versions: CVS Nightly - Specify date in submission
>  Environment: Operating System: other
> Platform: Other
>     Reporter: paul.elschot
>     Assignee: Lucene Developers
>     Priority: Minor
>  Attachments: AndDocNrSkipper.java, AndDocNrSkipper.java, BitSetSortedIntList.java, DocNrSkipper.java,
>               DocNrSkipper.java, IntArraySortedIntList.java, IntArraySortedIntList.java, OrDocNrSkipper.java,
>               OrDocNrSkipper.java, SortedVIntList.java, SortedVIntList.java, SortedVIntList.java, TestDocNrSkippers.java,
>               TestDocNrSkippers.java, TestSortedVIntList.java, TestSortedVIntList.java, TestSortedVIntList.java,
>               intIterator.java
>
> Two files are attached that might form the basis for an alternative 
> filter implementation that is more memory efficient than one bit 
> per doc when less than about 1/8 of the docs pass through the filter. 
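The 1/8 figure follows from comparing sizes. With N documents in the index and k of them passing the filter, a bit set needs one bit per document, while each stored VInt gap needs at least one byte (exactly one when the gap is below 128):

```latex
B_{\text{bits}} = \frac{N}{8}\ \text{bytes}, \qquad
B_{\text{VInt}} \ge k\ \text{bytes}, \qquad
B_{\text{VInt}} < B_{\text{bits}} \;\Longrightarrow\; k < \frac{N}{8} \;\Longleftrightarrow\; \frac{k}{N} < \frac{1}{8}
```

For example, with N = 1,000,000 the bit set takes 125,000 bytes, so the VInt list can only win when fewer than 125,000 documents pass. Gaps of 128 or more need extra bytes, so 1/8 is an upper bound on the break-even fraction rather than its exact value.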
>  
> The document numbers are stored in RAM as VInts from the Lucene index 
> format. These VInts encode the difference between two successive 
> document numbers, much like a PositionDelta in the Positions: 
> http://jakarta.apache.org/lucene/docs/fileformats.html 
>  
> The getByteSize() method can be used to verify the compression 
> once a SortedVIntList is constructed. 
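A minimal sketch of the delta-plus-VInt idea and of a getByteSize()-style check, assuming a hypothetical DeltaVIntDocs class (this is not the attached SortedVIntList): each document number is stored as the gap from its predecessor, and the resulting byte count can be compared with the (maxDoc + 7) / 8 bytes of a bit set.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical delta + VInt store for sorted doc numbers (not the attached SortedVIntList). */
final class DeltaVIntDocs {
    private final byte[] bytes;
    private final int size;

    DeltaVIntDocs(int[] sortedDocNrs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docNr : sortedDocNrs) {
            if (docNr < previous) {
                throw new IllegalArgumentException("doc numbers must be sorted and non-negative");
            }
            writeVInt(out, docNr - previous); // store the gap, not the absolute number
            previous = docNr;
        }
        this.bytes = out.toByteArray();
        this.size = sortedDocNrs.length;
    }

    /** Bytes used by the encoded gaps; compare with (maxDoc + 7) / 8 for a bit set. */
    int getByteSize() {
        return bytes.length;
    }

    /** Decodes all doc numbers by summing the gaps. */
    int[] toArray() {
        int[] result = new int[size];
        int pos = 0;
        int docNr = 0;
        for (int i = 0; i < size; i++) {
            int b = bytes[pos++];
            int gap = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
            }
            docNr += gap;
            result[i] = docNr;
        }
        return result;
    }

    private static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value);
    }
}
```

For the four document numbers 0, 5, 200 and 100000, the gaps 0, 5, 195 and 99800 take 1, 1, 2 and 3 bytes, so getByteSize() is 7, while a bit set covering documents up to 100000 needs 12501 bytes.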
> The precise conditions under which this is more memory efficient than 
> one bit per document are not easy to specify in advance.

-- 
This message is automatically generated by JIRA.



