lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: filter query on timestamp slowing query???
Date Fri, 23 Jul 2010 02:01:38 GMT

: You are correct, first of all i haven't moved yet to the TrieDateField, but i
: am still waiting to find out a bit more information about it, and there's
: not a lot of info, other than in the xml file.

In general TrieFields are a way of trading disk space for range query 
speed.  they are explained fairly well if you look at the docs...

http://lucene.apache.org/solr/api/org/apache/solr/schema/TrieField.html
http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/search/NumericRangeQuery.html

...although i realize now that the "TrieDateField" docs don't actually 
link to "TrieField", where the explanation is provided.

As for your use case...
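To make the "disk space for range query speed" trade concrete, here is a toy sketch of the idea behind trie-encoded numeric fields. It is not Lucene's actual code: the 16-bit width, the precision step of 4, and all function names are assumptions chosen for illustration. Each value is indexed at several precisions (more terms on disk), so a range can be covered by a handful of coarse prefix terms plus a few exact ones at the edges.

```python
BITS = 16  # toy value width (assumption); real date fields use 64-bit longs
STEP = 4   # precision step: bits dropped per level (assumption)


def index_terms(value, bits=BITS, step=STEP):
    """Terms stored for one value: one (shift, prefix) pair per precision level."""
    return [(shift, value >> shift) for shift in range(0, bits, step)]


def split_range(lo, hi, bits=BITS, step=STEP):
    """Cover the inclusive range [lo, hi] with (shift, first_prefix, last_prefix)
    sub-ranges, using exact terms at the edges and coarser terms in the middle."""
    sub_ranges = []
    shift = 0
    while True:
        diff = 1 << (shift + step)
        mask = ((1 << step) - 1) << shift
        has_lower = (lo & mask) != 0      # lower edge is not aligned to the next level
        has_upper = (hi & mask) != mask   # upper edge is not aligned to the next level
        next_lo = (lo + diff if has_lower else lo) & ~mask
        next_hi = (hi - diff if has_upper else hi) & ~mask
        if shift + step >= bits or next_lo > next_hi:
            # No coarser level can help: emit what is left at this precision.
            sub_ranges.append((shift, lo >> shift, hi >> shift))
            break
        if has_lower:
            sub_ranges.append((shift, lo >> shift, (lo | mask) >> shift))
        if has_upper:
            sub_ranges.append((shift, (hi & ~mask) >> shift, hi >> shift))
        lo, hi = next_lo, next_hi
        shift += step
    return sub_ranges


def matches(value, sub_ranges):
    """True if any of the value's indexed prefixes falls inside a sub-range."""
    return any(a <= (value >> shift) <= b for shift, a, b in sub_ranges)


def terms_visited(sub_ranges):
    """Number of distinct index terms a range query has to look at."""
    return sum(b - a + 1 for _, a, b in sub_ranges)
```

In this toy setup the range 5..300 is covered by three sub-ranges touching 41 terms instead of 296 individual values, at the cost of storing four terms per value instead of one.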

: I'll explain my use case, so you'll know a bit more. I have an  index that's
: being updated regularly, (every second i have 10 to 50 new documents, most
: of them are small)
: 
: Every 30 minutes, i ask the index what are the documents that were added to
: it, since the last time i queried it, that match a certain criteria.
: From time to time, once a week or so, i ask the index for ALL the documents
: that match that criteria. (i also do this for not only one query, but
: several)
: This is why i need the timestamp filter.
: 
: The queries that don't have any time range, take a few seconds to finish,
: while the ones with time range, take a few minutes.
: Hope that helps understanding my situation, and i am open to any suggestion
: how to change the way things work, if it will improve performance.

you keep saying you run "simple queries" and gave an example of 
"myStrField:foo" and you say you "ask the index what are the documents 
that were added to it, since the last time i queried it" ... but you've 
never given any concrete example of a full Solr request that incorporates 
this timestamp filtering so we can see *exactly* what your requests look 
like.  Even with an index the size you are describing, and even with the 
slower performance of "DateField" compared to TrieDateField, i find it hard 
to believe that a query for "myStrField:foo" would go from a few seconds 
to several minutes by adding an fq range query for a span of ~30 minutes.  
are you by any chance also *sorting* the documents by that timestamp field 
when you do this?
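For illustration, a full request of the kind being asked about might be assembled as in the sketch below. The base URL, the field name "timestamp", the 30-minute date-math range, and the helper name build_select_url are all assumptions, not details taken from this thread; only "myStrField:foo" comes from the discussion.

```python
from urllib.parse import urlencode


def build_select_url(base_url, q, fq=None, sort=None, rows=10):
    """Assemble a Solr /select URL from the main query, an optional
    filter query (fq) and an optional sort clause."""
    params = [("q", q), ("rows", str(rows))]
    if fq:
        params.append(("fq", fq))
    if sort:
        params.append(("sort", sort))
    return base_url + "/select?" + urlencode(params)


# Hypothetical request: main query plus a ~30 minute range filter,
# also sorted on the same (assumed) "timestamp" field.
url = build_select_url(
    "http://localhost:8983/solr",
    "myStrField:foo",
    fq="timestamp:[NOW-30MINUTES TO NOW]",
    sort="timestamp desc",
)
print(url)
```

Posting the exact URL (including any sort parameter) is what lets others tell whether the slowdown comes from the filter itself or from the sort.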

My best guess is that either:

  a) your "raw query performance" is generally really bad, but you don't 
notice when you do your "simple queries" because of solr's 
queryResultCache -- but this can't be used when you add the fq so you see 
the bad performance then.  If this is the situation I have no real 
suggestions.

  b) when you do your individual requests that filter by your timestamp 
field you are also sorting by your timestamp field -- a field you don't 
ever sort on in any other queries, so the FieldCache needed for sorting 
needs to be built before those queries can be returned.  if you stop 
sorting on this timestamp field (or add a newSearcher warming query that 
does the same sort) then the problem should go away.
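
A newSearcher warming query of the kind mentioned in (b) could look roughly like this in the <query> section of solrconfig.xml. This is a sketch only: the field name "timestamp" and the sort direction are assumptions and should match whatever your real requests use.

```xml
<!-- Runs against every new searcher before it starts serving requests,
     so the per-field sort data for "timestamp" (assumed name) is already built. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">timestamp desc</str>
      <str name="rows">1</str>
    </lst>
  </arr>
</listener>
```

With an index that is updated as often as yours, the warming cost is paid on every commit that opens a new searcher, so it is worth checking how often that happens.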



-Hoss

