lucene-solr-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Faceted Browsing questions
Date Mon, 26 Jun 2006 23:28:56 GMT

On Jun 24, 2006, at 4:29 PM, Yonik Seeley wrote:
> On 6/24/06, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>> This weekend :)   I have imported more data than my hacked
>> implementation can handle without bumping up Jetty's JVM heap size,
>> so I'm now at the point where it is necessary for me to start using
>> the LRUCache.  Though I have already refactored to use OpenBitSet
>> instead of BitSet.
>
> You can also fit more in mem if you can use DocSet (HashDocSet) for
> smaller sets.  This will also speed up intersection counts.  This is
> done automatically when you get the DocSet from Solr, or if numDocs()
> is used.

Thanks for this advice, Yonik.  I've refactored the caching (not
committed yet, for those who may be looking to see what I've done).
The cache (currently a single HashMap) is keyed by field name, with
nested HashMaps keyed by field value.  The inner map used to contain
BitSets, then OpenBitSets, but now it contains only TermQuery
objects.  Now I simply use SolrIndexSearcher.getDocSet(query) and
rely on the existing query caching.  The only thing my custom cache
puts into RAM now is this HashMap of all faceted fields, values, and
associated TermQuery objects.  At some point that might even become
an issue, but maybe not.
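
To make the shape of that cache concrete, here is a minimal sketch in
plain Java.  It is not the uncommitted code: FacetCacheSketch, TermKey,
add and valuesFor are hypothetical names, and TermKey is only a
stand-in for Lucene's TermQuery so that the example runs without Lucene
or Solr on the classpath.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class FacetCacheSketch {
    // Hypothetical stand-in for a Lucene TermQuery: just a (field, value) pair.
    static final class TermKey {
        final String field;
        final String value;

        TermKey(String field, String value) {
            this.field = field;
            this.value = value;
        }
    }

    // Outer map keyed by field name, inner map keyed by field value.
    private final Map<String, Map<String, TermKey>> cache = new LinkedHashMap<>();

    // Register one facet value for a field, creating the inner map on first use.
    void add(String field, String value) {
        cache.computeIfAbsent(field, f -> new LinkedHashMap<>())
             .put(value, new TermKey(field, value));
    }

    // All known values (and their query stand-ins) for one faceted field.
    Map<String, TermKey> valuesFor(String field) {
        Map<String, TermKey> values = cache.get(field);
        return values == null ? Collections.<String, TermKey>emptyMap() : values;
    }

    public static void main(String[] args) {
        FacetCacheSketch sketch = new FacetCacheSketch();
        sketch.add("genre", "poetry");
        sketch.add("genre", "fiction");
        System.out.println(sketch.valuesFor("genre").keySet());
    }
}
```

In the real handler each inner entry would hold a TermQuery, and the
per-facet document set would come from
SolrIndexSearcher.getDocSet(query) rather than from anything stored in
this map, so the map itself holds no per-document data.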

It may not even be necessary to cache this type of lookup, since it
is just a TermEnum walk through specific fields in the index.  Doing
the TermEnum in the request handler instead of iterating through a
cache might be just as fast or faster.  Any thoughts on that?

Either way, at the moment things are screaming fast and memory is  
pleasantly under control.

My next challenge is to re-implement the catch-all facets that I used
to do by unioning all documents in an (Open)BitSet and inverting it.
How can I invert a DocSet?  (I realize I can get the bits and do it
that way, but is there a better way?)
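
For reference, the union-and-invert approach described above looks
roughly like the sketch below.  It uses java.util.BitSet rather than
OpenBitSet or DocSet, the names CatchAllSketch and catchAll are
hypothetical, and it assumes document ids run from 0 to maxDoc - 1 with
no deletions to account for.  It illustrates the "get the bits" route
only and does not answer whether DocSet offers something better.

```java
import java.util.BitSet;

class CatchAllSketch {
    // Returns the ids in [0, maxDoc) that appear in none of the given facet sets.
    static BitSet catchAll(int maxDoc, BitSet... facetSets) {
        BitSet result = new BitSet(maxDoc);
        for (BitSet facet : facetSets) {
            result.or(facet);       // union of every facet's documents
        }
        result.flip(0, maxDoc);     // invert, but only within the valid id range
        return result;
    }

    public static void main(String[] args) {
        BitSet a = new BitSet();
        a.set(0);
        a.set(2);
        BitSet b = new BitSet();
        b.set(2);
        b.set(3);
        // Documents 1, 4 and 5 match neither facet.
        System.out.println(catchAll(6, a, b));
    }
}
```

To restrict the catch-all bucket to the current search results, the
inverted set would still need to be intersected with the base result
set (BitSet.and in this sketch).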

	Erik

