lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SOLR Boolean clause impact on memory/Performance
Date Tue, 14 Oct 2014 14:45:43 GMT
Then I predict they will continue to grow and whatever limit
you put on maxBooleanClauses will be exceeded later. And
so on, so I really think you need to re-think your model.

One approach:
Change your model so your users are assigned to a fixed
number of groups. Then index group tokens with each document.
You can index as many tokens in the _document_ as you want.
Then your process looks like this:
1> user signs on, you go query the system-of-record for that
user's groups.
2> each query from that user gets a filter query with those group
tokens.

The problem with this approach is that if groups change, you have
to re-index the affected documents. But querying is fast. Essentially
you exchange up-front work when indexing for _much_ less
work at query time.
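
The first approach can be sketched roughly as follows. This is only an
illustration, not something from the original thread: the field name
`group_tokens` and the helper `build_entitled_query` are hypothetical, and it
assumes the `{!terms}` query parser is available (Solr 4.10 and later, as far
as I know). That parser takes a comma-separated list of values rather than a
long chain of OR clauses, so the group tokens themselves must not contain
commas.

```python
from urllib.parse import urlencode


def build_entitled_query(user_query, group_tokens, field="group_tokens"):
    """Build Solr request parameters that restrict results to a user's groups.

    `field` is a hypothetical multi-valued string field holding the group
    tokens indexed with each document.
    """
    if not group_tokens:
        # Refuse to build an unrestricted query for a user with no groups.
        raise ValueError("user has no groups; refusing to search unfiltered")
    # Sort so the same set of groups always yields the same fq string,
    # which lets the filter cache reuse the entry across requests.
    values = ",".join(sorted(group_tokens))
    fq = "{!terms f=%s}%s" % (field, values)
    return {"q": user_query, "fq": fq, "wt": "json"}


# Example: groups fetched from the system-of-record at sign-on.
params = build_entitled_query("title:lucene", ["pkgB", "pkgA"])
query_string = urlencode(params)  # append to .../select? when sending
```

Because the filter depends only on the user's groups and not on what they
typed, users who share the same groups can share one cached filter.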

Second approach:
Use post-filters, see:
http://lucidworks.com/blog/advanced-filter-caching-in-solr/

These were first created for the ACL problem.
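
For the second approach, the request side might look like the sketch below.
It is hedged heavily: the parser name `acl` and its `user` parameter are
hypothetical and stand in for a custom query parser plugin (written in Java,
implementing Solr's PostFilter interface, and registered in solrconfig.xml)
that you would have to provide yourself. The only parts taken from Solr itself
are the `cache=false` and `cost` local parameters; a filter is run as a
post-filter only when it is uncached and its cost is 100 or more.

```python
def build_post_filter(parser, user_id, cost=200):
    """Build an fq string asking Solr to run a custom filter as a post-filter.

    `parser` names a hypothetical custom query parser; `user_id` is passed
    to it as a hypothetical `user` local parameter.
    """
    if cost < 100:
        # Below 100 Solr would not treat the filter as a post-filter.
        raise ValueError("post-filters need cost >= 100")
    return "{!%s user=%s cache=false cost=%d}" % (parser, user_id, cost)


# Example: the entitlement check then runs only on documents that
# already matched the main query and the other filters.
fq = build_post_filter("acl", "u42")
```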

Best,
Erick

On Tue, Oct 14, 2014 at 4:31 AM, ankit gupta <ankitgupta404@gmail.com> wrote:
> Thanks Erick for responding.
>
> We have assigned 4GB of memory to the SOLR server, and at high load, where
> queries have more than 10K boolean clauses, the combination of caching and
> the high number of boolean clauses is causing the system to break. The system
> was working fine for the last 8 months, but of course the boolean clauses have
> increased over time, which I believe has caused the system to break, and
> that's why I am looking for some numbers that can tell me how much memory
> Solr will take to process, say, 1K boolean clauses in a query.
>
> The requirement at our end does require such a huge number of boolean
> clauses. We need to present only the search results to which the user is
> entitled.
>
> The entitlement logic depends on multiple packages. For example, if a
> user has entitlement to packages A and B, we need to present search
> results that have the tag of package A or package B.
>
> These packages have grown over time and seem to be causing issues.
>
> Thanks,
> Ankit
>
>
>
> On Mon, Oct 13, 2014 at 5:53 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> Of course there will be performance and memory changes. The only
>> real question is whether your situation can tolerate them. The whole
>> point of maxBooleanClauses is exactly that going above that limit
>> should be a conscious decision because it has implications for
>> both memory and performance.
>>
>> That said, that limit was put in there quite some time ago and
>> things are much faster now. I've seen installations where this limit is
>> raised over 10K.
>>
>> Are you sure this is the best approach though? Could joins
>> work here? Or reranking? (this last is doubtful, but...).
>>
>> This may well be an XY problem; you haven't explained _why_
>> you need so many conditions, which might enable other
>> suggestions.
>>
>> Best,
>> Erick
>>
>> On Mon, Oct 13, 2014 at 9:10 AM, ankit gupta <ankitgupta404@gmail.com>
>> wrote:
>> > hi,
>> >
>> > Can we quantify the impact on SOLR memory usage/performance if we
>> > increase the number of boolean clauses? I am currently using a lot of
>> > OR clauses in the query (close to 10K) and can see the heap size growing.
>> >
>> > Thanks,
>> > Ankit
>>
