lucene-solr-user mailing list archives

From "Graham Stead" <gst...@ieee.org>
Subject RE: 'accumulate' copyField for faceting
Date Fri, 02 Mar 2007 01:38:11 GMT
Sorry for interloping, but I have been wondering the same thing as Ryan. On
my current index with ~6.1M docs, I restarted Solr and ran a query that
included faceting on 4 fields:

QTime: 5712
numFound: 25908
filterCache stats:
	lookups : 0
	hits : 0
	hitratio : 0.00
	inserts : 1
	evictions : 0
	size : 1
	cumulative_lookups : 0
	cumulative_hits : 0
	cumulative_hitratio : 0.00
	cumulative_inserts : 1
	cumulative_evictions : 0 

Then I added faceting on a 5th, multivalued field:

QTime: 65551
numFound: 25908
filterCache stats:
	lookups : 1898314
	hits : 1
	hitratio : 0.00
	inserts : 1898314
	evictions : 1897802
	size : 512
	cumulative_lookups : 1898314
	cumulative_hits : 1
	cumulative_hitratio : 0.00
	cumulative_inserts : 1898314
	cumulative_evictions : 1897802
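
As a quick sanity check on the second report, the arithmetic below uses only the figures quoted above. It is a sketch of how the counters relate, assuming QTime is in milliseconds and that the number of resident entries equals inserts minus evictions; the variable names are illustrative, not Solr identifiers.

```python
# Figures copied from the two filterCache reports above.
inserts = 1_898_314
evictions = 1_897_802
size = 512
lookups = 1_898_314
hits = 1
qtime_four_facets = 5_712    # QTime with 4 facet fields
qtime_five_facets = 65_551   # QTime after adding the multivalued field

# Entries still in the cache = everything inserted minus everything evicted.
resident = inserts - evictions

# Fraction of lookups served from the cache.
hit_ratio = hits / lookups

# How much slower the query became with the 5th facet field.
slowdown = qtime_five_facets / qtime_four_facets

print("resident entries match reported size:", resident == size)
print("hit ratio:", hit_ratio)
print("slowdown factor:", round(slowdown, 1))
```

In other words, the cache stayed pinned at its reported size of 512 while nearly every lookup missed and forced an eviction.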


I realize there are a lot of different values in the 5th, multivalued field.
But this is where I'm fuzzy: are we saying there would be no difference
between a tokenized, single-valued field and a multivalued field? Or are we
saying that multivalued is OK as long as the number of values is less than
the filterCache size? [Unfortunately, I don't have a single-valued version of
this field to test with.]

Thanks,
-Graham

> I'll be interested in seeing some numbers.  The number of 
> documents matching the base query and filters will also 
> factor in (small will be HashDocSet, large will be BitDocSet).
> 
> Just make sure to run all of your facets, then check the 
> statistics page to see how big you need to make the 
> filterCache to hold them all (and add a little extra for 
> random filters).  The access pattern for the faceting code is 
> worst case for the LRU cache, so it needs to avoid any evictions.
> 
> -Yonik
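
The "worst case for the LRU cache" point in the quoted reply can be illustrated with a toy model. This is a minimal sketch, not Solr's filterCache implementation: `LRUCache`, `get_or_compute`, and `facet_passes` are hypothetical names, and it assumes faceting walks the field's terms in the same order on every request, asking the cache for one entry per term. The term counts are deliberately small.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that tracks the same counters as the stats above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.lookups = self.hits = self.inserts = self.evictions = 0

    def get_or_compute(self, key):
        self.lookups += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = object()  # stand-in for an expensively computed filter
        self.entries[key] = value
        self.inserts += 1
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop least recently used
            self.evictions += 1
        return value


def facet_passes(capacity, num_terms, passes=2):
    """Simulate repeated facet requests that scan every term in order."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for term in range(num_terms):
            cache.get_or_compute(term)
    return cache


too_small = facet_passes(capacity=512, num_terms=1000)
big_enough = facet_passes(capacity=1024, num_terms=1000)

for label, cache in (("too small", too_small), ("big enough", big_enough)):
    print(label, "hits:", cache.hits, "evictions:", cache.evictions)
```

In this model, a cache smaller than the number of terms never produces a hit on the second pass, because each term is evicted before the scan comes back around to it. Once the capacity covers every term, the second pass is served entirely from the cache.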


