lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: Optimizing Filters
Date Thu, 17 Oct 2013 09:05:55 GMT
Yes, I think you should have a play, but on an index that is as realistic
as you can make it: performance of the different queries and filters may
vary with term frequencies and loads of other stuff I don't understand. The
general point is simply that YMMV.
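If you do have a play, timing each query individually after a warm-up pass says more than one total for the whole run, because it exposes the spread as well as the sum. The following is a minimal stdlib sketch under that assumption; `QueryTimer` and its methods are hypothetical helpers, not part of Lucene.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: warm up, then time each query individually. */
public final class QueryTimer {

    /** Runs every query once untimed (warm-up), then records one sample per query per round. */
    public static long[] timeNanos(List<Runnable> queries, int rounds) {
        for (Runnable q : queries) {
            q.run(); // warm-up pass, not recorded
        }
        long[] samples = new long[queries.size() * rounds];
        int i = 0;
        for (int r = 0; r < rounds; r++) {
            for (Runnable q : queries) {
                long start = System.nanoTime();
                q.run();
                samples[i++] = System.nanoTime() - start;
            }
        }
        return samples;
    }

    /** Nearest-rank percentile (p in 0..100) of the samples; the input array is not modified. */
    public static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }
}
```

Each `Runnable` would wrap one search against the index under test, for example with and without a given filter; checked `IOException`s have to be wrapped, since `Runnable` cannot throw them. Comparing medians and high percentiles per filter variant is less sensitive to a few slow outliers than comparing totals.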


--
Ian.


On Wed, Oct 16, 2013 at 3:07 PM, James Clarke <jclarke@basistech.com> wrote:
> Filters are created programmatically per request (and customized for the
> request), so to benefit from CachingWrapperFilter we would need a mechanism
> for looking up CachingWrapperFilters based on the request. But this is
> certainly an area worth trying (we could probably reuse each filter 10 times,
> because of the variation in requests and NRT search).
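A request-keyed lookup of the kind described above could be as small as an LRU map from a canonical description of the filter to its matching documents. This is a sketch under stated assumptions: `java.util.BitSet` stands in for a filter's doc-id set, and the key format (a hypothetical reader generation plus the sorted filter clauses) is invented for illustration. It is not how Lucene's CachingWrapperFilter is implemented.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical request-keyed filter cache; BitSet stands in for a filter's doc-id set. */
public final class RequestFilterCache {
    private final Map<String, BitSet> lru;
    private int builds = 0;

    public RequestFilterCache(final int maxEntries) {
        // accessOrder=true: iteration order is least-recently-accessed first
        this.lru = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries; // evict the LRU entry once over capacity
            }
        };
    }

    /**
     * Canonical key: a hypothetical reader generation (so an NRT reopen yields new keys)
     * plus the clauses sorted, so clause order within the AND does not matter.
     */
    public static String key(long readerGeneration, String... clauses) {
        String[] sorted = clauses.clone();
        Arrays.sort(sorted);
        return readerGeneration + "|" + String.join("&", sorted);
    }

    /** Returns the cached doc-id set for the key, calling build only on a cache miss. */
    public synchronized BitSet get(String key, Supplier<BitSet> build) {
        BitSet bits = lru.get(key);
        if (bits == null) {
            bits = build.get();
            builds++;
            lru.put(key, bits);
        }
        return bits;
    }

    /** Number of cache misses so far, i.e. how many times a filter had to be built. */
    public synchronized int builds() {
        return builds;
    }
}
```

Sorting the clauses means that `evidence:true AND language:eng` and `language:eng AND evidence:true` share one entry, which raises the reuse count. Folding the reader generation into the key is one simple way to avoid serving stale sets after an NRT reopen, at the cost of discarding every entry on each reopen.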
>
> I was hoping to improve query latency by reformulating the filters and
> queries. However my intuition of the best practice for filter and query
> construction is lacking i.e., is it better to use a TermsFilter and
> MatchAllDocsQuery or a BooleanQuery of TermQuerys, or a BooleanQuery of
> ConstantScoreQuerys of TermQuery etc.
>
> Maybe I should just hunker down and create a synthetic index and try many
> different combinations of filter/query construction.
>
> On Oct 11, 2013, at 7:33 AM, Ian Lea <ian.lea@gmail.com> wrote:
>
>> Are you going to be caching and reusing the filters e.g. by
>> CachingWrapperFilter?  The main benefit of filters is in reuse.  It
>> takes time to build them in the first place, likely roughly equivalent
>> to running the underlying query although with variations as you
>> describe.  Or are you saying that querying with filters is slow?
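A rough way to see the point about reuse, under the simplifying assumption that building a filter costs a constant $C_b$ (roughly one run of the underlying query) and applying an already-built filter costs a constant $C_a$ per search: a filter built once and used $k$ times costs, per search,

$$\bar{C}(k) = \frac{C_b + k\,C_a}{k} = C_a + \frac{C_b}{k}$$

With no reuse ($k = 1$) every request pays the full build cost; at the roughly 10 reuses per filter estimated in the reply above, the build overhead per search falls to $C_b/10$. The model ignores cache memory and invalidation on NRT reopen.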
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Oct 10, 2013 at 7:01 PM, James Clarke <jclarke@basistech.com> wrote:
>>> Are there any best practices for constructing Filters to search efficiently?
>>> From my non-exhaustive experiments I cannot intuit how to construct my filters
>>> to achieve best performance.
>>>
>>> I have an index (Lucene 4.3) of about 1.8M documents which contain a field
>>> acting as a flag (evidence:true). Initially all the documents I am interested
>>> in searching have this field. Later, as the index grows, some documents will
>>> not have this field.
>>>
>>> In the simplest case I want to filter on documents with evidence:true. I ran
>>> a couple of hundred queries sequentially and recorded how long they took to
>>> complete:
>>>
>>> * No filter: ~40s
>>> * QueryWrapperFilter(TermQuery(evidence:true)): ~80s
>>> * FieldValueFilter(evidence): ~43s
>>> * TermsFilter(evidence:true): ~50s
>>>
>>> This suggests QWF is a bad idea.
>>>
>>> A more complex filter is:
>>>
>>>  (evidence:true AND (cid:x OR cid:y ...) AND language:eng)
>>>
>>> where 1.8M documents have evidence:true, each cid clause matches 2-4
>>> documents, there are 1-60 cid clauses, and 1.4M documents have language:eng.
>>>
>>> Our initial implementation uses a QWF wrapping a BooleanQuery(TQ AND BQ(OR)
>>> AND TQ), which takes ~210s.
>>>
>>> Changing this to a BooleanFilter(TermsFilter AND TermsFilter AND
>>> TermsFilter) slows things down to ~239s!
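One way to reason about the complex filter is to count how much of each clause an evaluation strategy has to touch. The following is a toy model, not Lucene code, and it does not claim which strategy QueryWrapperFilter or BooleanFilter actually uses in Lucene 4.3. Each clause is assumed to be a sorted `int[]` posting list; one method materializes a bitset per clause and ANDs them (work proportional to all postings, about 1.8M + 1.4M here), the other lets the small cid union drive (at most 4 × 60 = 240 candidates) and probes the dense clauses with binary search, standing in for skipping with `advance()`.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.TreeSet;

/** Toy model of evidence:true AND (cid:x OR cid:y ...) AND language:eng over sorted postings. */
public final class ConjunctionSketch {

    /** Builds a bitset per clause from its full posting list, then ANDs them: work ~ total postings. */
    public static BitSet materializeAndIntersect(int[] evidence, int[] language, int[][] cidClauses) {
        BitSet result = toBits(evidence);
        BitSet cids = new BitSet();
        for (int[] clause : cidClauses) {
            cids.or(toBits(clause)); // OR of the cid clauses
        }
        result.and(cids);
        result.and(toBits(language));
        return result;
    }

    /** Drives from the sparse cid union; binary search stands in for advance() on dense clauses. */
    public static BitSet leadWithSparsest(int[] evidence, int[] language, int[][] cidClauses) {
        TreeSet<Integer> candidates = new TreeSet<>(); // union of cid postings, in doc-id order
        for (int[] clause : cidClauses) {
            for (int doc : clause) {
                candidates.add(doc);
            }
        }
        BitSet result = new BitSet();
        for (int doc : candidates) {
            // O(log n) membership checks instead of reading every dense posting
            if (Arrays.binarySearch(evidence, doc) >= 0 && Arrays.binarySearch(language, doc) >= 0) {
                result.set(doc);
            }
        }
        return result;
    }

    private static BitSet toBits(int[] sortedPostings) {
        BitSet bits = new BitSet();
        for (int doc : sortedPostings) {
            bits.set(doc);
        }
        return bits;
    }
}
```

Both methods return the same documents; they differ only in how much of the two dense clauses they read. If the slow variants spend most of their time materializing the evidence:true and language:eng sets, caching those two dense sets (or letting the cid clauses lead) is where experiments are likely to pay off, though only measurements on a realistic index can confirm that.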
>>>
>>> Any advice on optimizing these filters would be appreciated!
>>>
>>> James
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>