lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: using q= , adding fq=
Date Sun, 13 Dec 2009 11:56:07 GMT

On Dec 11, 2009, at 8:17 PM, Fer-Bj wrote:

> 
> We're running a 14M documents index. For each document we have:
>   <field name="id" type="sint" indexed="true" stored="true" required="true"/>
>   <field name="title" type="text_ngram" indexed="true" stored="true" omitNorms="true"/>
>   <field name="cat_id" type="sint" indexed="true" stored="true"/>
>   <field name="geo_id" type="sint" indexed="true" stored="true"/>
>   <field name="body" type="text" indexed="true" stored="false" omitNorms="true"/>
>   <field name="modified_datetime" type="date" indexed="true" stored="true"/>
> (and a few other fields).
> 
> Our most usual query is something like this:
> q=cat_id:xxx AND geo_id:yyyy&sort=id desc   where cat_id = which "category"
> (cars,sports,toys,etc) the item belongs to, and geo_id = which city/district
> the item belongs to.
> So this query will return a list of documents posted in category xxx, region
> yyy. 
> Sorted by ID DESC, to get the newest first.
> 
> There are 2 questions I'd like to ask:
> 
> 1) adding something like:  q=cat_id:xxx&fq=geo_id=yyyy would boost
> performance?


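For every query after the first (n > 1), yes: adding a filter should improve performance,
assuming the filter is selective enough, because the filter's result set is cached and reused
by later queries.  The tradeoff is memory.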
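
A minimal sketch of how the request parameters could be built with geo_id moved into a
filter query. The helper name build_params and the example ids are hypothetical; the field
names, the sort, and the 50-row page size come from the question. Note that the filter is
written with the usual field:value syntax (geo_id:yyyy).

```python
from urllib.parse import urlencode


def build_params(cat_id, geo_id, start=0, rows=50):
    """Build Solr request parameters with geo_id as a filter query (fq).

    Hypothetical helper for illustration only.
    """
    return {
        "q": f"cat_id:{cat_id}",    # main query
        "fq": f"geo_id:{geo_id}",   # filter query, field:value syntax
        "sort": "id desc",          # newest first
        "start": start,
        "rows": rows,
    }


# Example ids are made up; append the result to the select URL of your core.
query_string = urlencode(build_params(12, 3456))
print(query_string)
```

Whether this actually helps depends on how often the same geo_id filter is repeated and on
how much memory can be spared for cached filters.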

> 
> 2) we do find problems when we ask for a page=large offset!  ie: 
> q=cat_id:xxx and geo_id:yyy&start=544545
> (note that we limit docs to 50 max per resultset).
> When start is 500 or more, Qtime is >=5 seconds.... while the avg qtime is
> <100 ms

Yes, that is to be expected.  Deep paging is not the typical use case.  To return results
starting at a large offset, the searcher has to collect and rank the top (start + rows) hits
in a priority queue, so the deeper the page, the larger the queue and the more disk accesses
are needed.

See http://issues.apache.org/jira/browse/LUCENE-2127
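
To make the priority queue cost concrete, here is a rough simulation. It is not Lucene's
actual collector, only a sketch of the general top-N technique: a bounded heap has to hold
start + rows candidates before the requested page can be cut out of it. The function name
top_page and the random ids are hypothetical.

```python
import heapq
import random


def top_page(doc_ids, start, rows):
    """Return one page of ids sorted descending, plus the queue size used.

    A bounded min-heap keeps the best (start + rows) ids seen so far.
    """
    capacity = start + rows
    heap = []  # min-heap: the smallest retained id sits at heap[0]
    for doc_id in doc_ids:
        if len(heap) < capacity:
            heapq.heappush(heap, doc_id)
        elif doc_id > heap[0]:
            # New id beats the weakest retained one, so swap it in.
            heapq.heapreplace(heap, doc_id)
    ranked = sorted(heap, reverse=True)
    return ranked[start:start + rows], len(heap)


random.seed(0)
ids = random.sample(range(1_000_000), 100_000)  # made-up document ids
for start in (0, 500, 50_000):
    page, queue_size = top_page(ids, start, rows=50)
    print(f"start={start} queue_size={queue_size}")
```

The page size stays at 50, but the queue grows with the offset, which is the general reason
deep pages cost more than the first page.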


> 
> Any help or tips would be appreciated!

Do you really need "sortable ints" for all of those fields?  Are you running range queries
against them?  The name "sortable X" is a bit of a misnomer: it does not mean sortable in the
sense of the &sort parameter, but sortable in the range query sense, as in cat_id:[55 TO 1005].
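
A small illustration of why the encoding matters for range queries, assuming terms are
compared as strings. The zero-padding below is only a hypothetical stand-in for a sortable
encoding, not Solr's actual one; the point is that the encoded terms order the same way the
numbers do.

```python
def in_range_lexicographic(value, low, high):
    """Range check on the plain decimal strings, compared as text."""
    return str(low) <= str(value) <= str(high)


def sortable(value, width=10):
    """Hypothetical sortable form: zero-pad a non-negative int."""
    return f"{value:0{width}d}"


def in_range_sortable(value, low, high):
    """Range check on the padded strings, whose text order is numeric order."""
    return sortable(low) <= sortable(value) <= sortable(high)


values = [7, 55, 300, 1005, 2000]
plain = [v for v in values if in_range_lexicographic(v, 55, 1005)]
encoded = [v for v in values if in_range_sortable(v, 55, 1005)]
print("plain strings:", plain)
print("sortable form:", encoded)
```

With plain strings the range [55 TO 1005] matches nothing here, because "55" sorts after
"1005" as text; with the padded form it matches 55, 300 and 1005.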

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search

