lucene-solr-user mailing list archives

From Michael Imbeault <michael.imbea...@sympatico.ca>
Subject Re: Facet performance with heterogeneous 'facets'?
Date Thu, 21 Sep 2006 21:00:30 GMT
Thanks for all the great answers.

>> Quick Question: did you say you are faceting on the first name field
>> separately from the last name field? ... why?
You misunderstood. I'm faceting on the first author and the last author
of each paper's author list. Life science papers have author lists; the
first author is usually the person who did most of the work, and the
last is usually the head of the lab. I already have untokenized author
fields for that, created with copyField.
>> Second: you mentioned increasing the size of your filterCache
>> significantly, but we don't really know how heterogeneous your index
>> is ...
>> once you made that change did your filterCache hitrate increase? ..
>> do you have any evictions (you can check on the "Statistics" page)
It was at the default (16000) and it hit the ceiling, so to speak. I set
maxSize=16000000 (for testing purposes), and the cache now reports
size: 17038 and 0 evictions. For a single facet field (journal name)
with a limit of 5 and 12 facet queries (ranges on publication date),
searches now take 0.5 seconds, which is not too bad. The filterCache
size stays pretty much constant no matter how many queries I run.

However, if I try to add another facet field (such as first_author),
something strange happens: CPU goes to 99%, the filterCache fills up
really fast, the hit ratio goes to hell, there is no disk activity, and
it can stay that way for at least 30 minutes (I didn't test longer, no
point really). It turns out that journal_name has 17038 distinct terms,
which is manageable, but first_author has more than 400,000. I don't
think this will ever yield good performance, so I might facet only on
journal_name.
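
To illustrate why the number of distinct terms matters so much here, below
is a minimal Python sketch. It assumes (this is my reading of the symptoms,
not something confirmed in this thread) that faceting builds one cached
filter per distinct term and intersects it with the query's result set.
LRUFilterCache, facet_counts and make_postings are hypothetical names made
up for the sketch; none of this is Solr's actual code or API.

```python
from collections import OrderedDict


class LRUFilterCache:
    """Tiny LRU cache standing in for a filter cache (hypothetical)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get_filter(self, term, build):
        if term in self.entries:
            self.entries.move_to_end(term)  # mark as most recently used
            self.hits += 1
            return self.entries[term]
        self.misses += 1
        docs = build(term)  # the expensive part: one doc set per term
        self.entries[term] = docs
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # drop least recently used
            self.evictions += 1
        return docs


def facet_counts(cache, postings, result_docs):
    """Assumed strategy: one filter per distinct term, intersected with the results."""
    counts = {}
    for term in postings:
        term_filter = cache.get_filter(term, lambda t: frozenset(postings[t]))
        counts[term] = len(term_filter & result_docs)
    return counts


def make_postings(num_terms, docs_per_term=2):
    """Toy index: each term matches its own small block of document ids."""
    return {
        f"term{i}": range(i * docs_per_term, (i + 1) * docs_per_term)
        for i in range(num_terms)
    }


postings = make_postings(100)      # a field with 100 distinct terms
result_docs = frozenset(range(10))  # documents matched by the main query

# Cache smaller than the number of terms: every pass over the field
# evicts the entries the next pass would need, so nothing is ever reused.
small = LRUFilterCache(max_size=50)
facet_counts(small, postings, result_docs)
facet_counts(small, postings, result_docs)

# Cache at least as large as the number of terms: the first pass still
# builds every filter, but the second pass is served entirely from cache.
large = LRUFilterCache(max_size=100)
counts = facet_counts(large, postings, result_docs)
facet_counts(large, postings, result_docs)
```

Under that assumption, a field needs roughly one cache entry per distinct
term before repeated requests can hit the cache, and the first request
still has to build every filter. That would be cheap for 17038 terms and
very expensive for more than 400,000.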

Is there a reason why faceting tries to preload every term in the field?

I have noticed that facets are not cached. With facets off, a cached
query takes 0.01 seconds. With facets on, both uncached and cached
queries take 0.7 seconds. Are there any plans for a facet cache? I know
that faceting is still a very early feature, but it's already awesome;
maybe my application is just unrealistic.

Thanks,
Michael

