lucene-solr-user mailing list archives

From jichi <jichi...@gmail.com>
Subject Re: Search with very large boolean filter
Date Fri, 20 Nov 2015 23:09:34 GMT
Thanks for the quick replies, Alex and Jack!

> definitely can improve on the ORing the ids with
Going to try that! But I guess it would still hit the maxBooleanClauses=1024
threshold.
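
A minimal sketch of the kind of batching discussed in this thread, assuming integer IDs and the `id:(... OR ...)` syntax from the original message. The helper name `id_filter_batches` is hypothetical, and the deduplication and sorting step is an added assumption, not something already in use:

```python
from typing import Iterable, Iterator

# Limit mentioned in this thread; the real value comes from solrconfig.xml.
MAX_BOOLEAN_CLAUSES = 1024


def id_filter_batches(ids: Iterable[int], batch_size: int = 1000) -> Iterator[str]:
    """Yield one 'id:(a OR b OR ...)' filter string per batch of IDs.

    Each batch holds at most batch_size IDs, so every generated filter
    stays under the maxBooleanClauses threshold.
    """
    if not 0 < batch_size <= MAX_BOOLEAN_CLAUSES:
        raise ValueError("batch_size must be between 1 and MAX_BOOLEAN_CLAUSES")

    # Assumption: duplicates carry no meaning, so drop them to save clauses.
    unique_ids = sorted(set(ids))

    for start in range(0, len(unique_ids), batch_size):
        chunk = unique_ids[start:start + batch_size]
        yield "id:(" + " OR ".join(str(i) for i in chunk) + ")"
```

With the default batch size, a set of 30k IDs produces 30 filter strings, which is why the number of round trips grows with the size of the ID set.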

> 1. Are you trying to retrieve a large number of documents, or simply
perform queries against a subset of the index?
We would like to perform queries against a subset of the index.

> 2. How many unique queries are you expecting to perform against each
specific filter set of IDs?
There are usually only a few (around 10) unique queries against the same
set of IDs within a short period of time (around 1 minute).

> 3. How often does the set of IDs change?
The IDs are different for almost every query.
Btw, in 99% of cases the total number of IDs is less than 1k,
but in the rare 1% of cases it can be more than 10k.

> 4. Is there more than one filter set of IDs in use during a particular
interval of time?
No. The ID set will be the only filter applied to "id".


Thanks!


2015-11-20 14:26 GMT-08:00 Jack Krupansky <jack.krupansky@gmail.com>:

> 1. Are you trying to retrieve a large number of documents, or simply
> perform queries against a subset of the index?
>
> 2. How many unique queries are you expecting to perform against each
> specific filter set of IDs?
>
> 3. How often does the set of IDs change?
>
> 4. Is there more than one filter set of IDs in use during a particular
> interval of time?
>
>
>
> -- Jack Krupansky
>
> On Fri, Nov 20, 2015 at 4:50 PM, jichi <jichifly@gmail.com> wrote:
>
>> Hi,
>>
>> I am using Solr 4.7.0 to search text with an id filter, like this:
>>
>>       id:(100 OR 2 OR 5 OR 81 OR 10 ...)
>>
>> The number of IDs in the boolean filter is usually less than 100, but
>> could sometimes be very large (around 30k IDs).
>>
>> We currently set maxBooleanClauses to 1024, partitioned the IDs by every
>> 1000, and batched the solr queries, which worked but became slow when the
>> total number of IDs is larger than 10k.
>>
>> I am wondering what would be the best strategy to handle this kind of
>> problem?
>> Can we increase the maxBooleanClauses to reduce the number of batches?
>> And if possible, we prefer not to create additionally large indexes.
>>
>> Thanks!
>>
>
>


-- 
jichi
