lucene-solr-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Finding documents with undefined field
Date Wed, 07 Jun 2006 19:58:48 GMT

On Jun 7, 2006, at 3:43 PM, Chris Hostetter wrote:
> Getting the inverse of a DocSet is currently not a built-in operation; you
> have to use the getBits() method and operate on it. Something like this
> should work...
>
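>   DocSet definedSet = search.getDocSet(parseQuery("field:[* TO *]"));
>   // flip() returns void, so invert a copy of the bits before wrapping them
>   BitSet bits = (BitSet) definedSet.getBits().clone();
>   bits.flip(0, search.maxDoc());
>   DocSet unDefinedSet = new BitDocSet(bits);
>   int count = unDefinedSet.intersectionCount(results.docSet);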
>
> ...at least, i think it should work .. i've never really had to worry
> about inverted sets.

Here's how I build "inverse" BitSets that represent documents that do  
not have a value in a facet field:

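       // Union of every document that has at least one term in the field.
       // Sized with maxDoc() because document ids range over [0, maxDoc).
       BitSet catchall = new BitSet(reader.maxDoc());

       TermDocs termDocs = reader.termDocs();
       TermEnum termEnum = reader.terms(new Term(field, ""));
       while (true) {
         Term term = termEnum.term();
         // Stop at the end of the index or once we leave this field.
         if (term == null || !term.field().equals(field)) break;

         // Collect the documents that contain this term.
         termDocs.seek(term);
         BitSet bitSet = new BitSet(reader.maxDoc());
         while (termDocs.next()) {
           bitSet.set(termDocs.doc());
         }

         catchall.or(bitSet);

         // ... cache bitSet ...

         if (! termEnum.next()) break;
       }
       termDocs.close();
       termEnum.close();

       // catchall now marks every document that has a value; flip it to get
       // the documents that have none (ids of deleted documents get set too).
       catchall.flip(0, reader.maxDoc());

       // ... cache catchall ...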

I'm convinced that Solr's DocSets are the better way to go in the long
run, and I'm just now starting to leverage them in other ways.  I still
need some way to build these kinds of inverted sets, though.
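
For illustration, the union-then-flip idea can be sketched on its own with nothing but java.util.BitSet. This is only a minimal sketch, not Solr or Lucene code: the class name UndefinedFieldSketch, the method docsWithoutValue, and the per-term bit sets in main are all made up for the example, and maxDoc is assumed to be one greater than the largest document id.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-alone sketch: plain java.util.BitSet, no Lucene or Solr classes.
final class UndefinedFieldSketch {

    // Returns the ids in [0, maxDoc) that appear in none of the per-term bit sets.
    static BitSet docsWithoutValue(List<BitSet> perTermBits, int maxDoc) {
        BitSet catchall = new BitSet(maxDoc);
        for (BitSet termBits : perTermBits) {
            catchall.or(termBits);      // union of documents that have some value
        }
        catchall.flip(0, maxDoc);       // invert over the whole id range
        return catchall;
    }

    public static void main(String[] args) {
        // Made-up postings: term "red" is in docs 0 and 3, term "blue" in docs 3 and 4.
        BitSet red = new BitSet();
        red.set(0);
        red.set(3);
        BitSet blue = new BitSet();
        blue.set(3);
        blue.set(4);

        BitSet missing = docsWithoutValue(Arrays.asList(red, blue), 6);
        System.out.println(missing);    // prints {1, 2, 5}
    }
}
```

Against a real index, the flipped set would presumably also include the ids of deleted documents, since those still fall below maxDoc, so it may need to be intersected with the set of live documents before counting.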

	Erik

