lucene-dev mailing list archives

From Ype Kingma
Subject Re: too many hits - OutOfMemoryError
Date Thu, 29 May 2003 17:25:30 GMT

On Wednesday 28 May 2003 14:17, Doug Cutting wrote:
> [ Moved to lucene-dev.  -drc ]
> David_Birthwell@VWR.COM wrote:
> > But, wildcard queries that expand to many terms are always going to be
> > memory intensive in Lucene.  We ran into this problem and decided to put
> > a check on the number of expanded terms and abort the query if the number
> > got too high.
> Perhaps we should make this a feature of Lucene.

I'm in favour of this.

> Different types of queries expand in different ways, but they all
> expand into a BooleanQuery.  So perhaps BooleanQuery should get:
>    public static int getMaxClauseCount();
>    public static void setMaxClauseCount(int maxClauseCount);
> When more than the specified number of clauses is added an exception
> would be thrown.
> Further, I propose that the default for BooleanQuery.getMaxClauseCount()
> would be 1024.  Each TermQuery requires around 2k bytes to process it.
> This would thus limit expansions to around 2MB, however queries with
> multiple wildcard terms could use more.
> This simple fix would probably stop most OutOfMemory problems, which
> affect everyone, while only affecting a very small fraction of queries.
>   The queries that are affected are in most cases probably not useful
> queries anyway.  If someone really wishes to permit terms to expand
> further, then they can always call BooleanQuery.setMaxClauseCount().
> Comments?
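
A minimal sketch of the limit proposed above, for illustration only. The
class LimitedBooleanQuery and its exception are hypothetical stand-ins, not
Lucene code; only getMaxClauseCount(), setMaxClauseCount(), the default of
1024 and the "around 2k bytes" per term figure come from the proposal.
Clauses are modelled as plain strings to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a boolean query; not the Lucene class itself. */
class LimitedBooleanQuery {

    /** Hypothetical exception thrown when too many clauses are added. */
    static class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int max) {
            super("query would exceed " + max + " clauses");
        }
    }

    // Default taken from the proposal in this thread.
    private static int maxClauseCount = 1024;

    // Clauses are plain strings here; a real query would hold sub-queries.
    private final List<String> clauses = new ArrayList<>();

    public static int getMaxClauseCount() {
        return maxClauseCount;
    }

    public static void setMaxClauseCount(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("maxClauseCount must be >= 1");
        }
        maxClauseCount = max;
    }

    /** Adds one clause, refusing once the configured limit is reached. */
    public void add(String term) {
        if (clauses.size() >= maxClauseCount) {
            throw new TooManyClausesException(maxClauseCount);
        }
        clauses.add(term);
    }

    public int size() {
        return clauses.size();
    }

    /** Rough estimate, assuming about 2k bytes (2048) per term clause. */
    public static long estimatedBytes(int clauseCount) {
        return clauseCount * 2048L;
    }
}
```

With the default limit, estimatedBytes(1024) is 2097152 bytes, i.e. the
"around 2MB" mentioned above. A caller that expands a wildcard would catch
the exception and report that the query matched too many terms.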

I'm not familiar with the details of query term expansion
in Lucene, so if this doesn't make sense, please correct me.

The source of the problem is the wildcards, so wouldn't it be better
to enforce a max. nr of expanded terms on these types of queries?
That would allow finer control than a limit at the 'top level'.

E.g. it would be possible to add a modifier to a wildcard term that
means that the user wants all expanded terms for this particular
wildcard, whatever their number.
It would also be possible to interact with the user when the number of
expanded terms grows out of control: i.e. does the user really want
all these expanded terms, or would the user prefer to select
some of the expanded terms?
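
To make the idea concrete, here is a rough sketch of a per-wildcard limit
with an "expand all" modifier. Everything in it is hypothetical: the names
WildcardExpansionSketch, Expansion and expandPrefix are made up for this
example, the index is just a list of strings, and only prefix wildcards
are handled. It is not how Lucene expands terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a per-wildcard expansion limit; not Lucene code. */
class WildcardExpansionSketch {

    /** Result of expanding one wildcard term. */
    static class Expansion {
        final List<String> terms;
        // True when expansion stopped early because the limit was hit.
        final boolean limitReached;

        Expansion(List<String> terms, boolean limitReached) {
            this.terms = terms;
            this.limitReached = limitReached;
        }
    }

    /**
     * Expands a prefix wildcard against the given terms.
     * Stops after maxTerms matches unless expandAll is set, in which
     * case every matching term is returned whatever their number.
     */
    static Expansion expandPrefix(List<String> indexTerms, String prefix,
                                  int maxTerms, boolean expandAll) {
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            if (!term.startsWith(prefix)) {
                continue;
            }
            if (!expandAll && matches.size() >= maxTerms) {
                // One more match exists than the limit allows: stop and
                // let the caller decide how to proceed.
                return new Expansion(matches, true);
            }
            matches.add(term);
        }
        return new Expansion(matches, false);
    }
}
```

When limitReached is true, an application could ask the user whether to
rerun the expansion with expandAll set, or to pick some of the terms found
so far.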

I realize such interaction features are not needed for the average
user, so the only thing I'd like to have is that Lucene allows for
adding such features without needing to move Lucene functionality
through its class hierarchy.

OTOH a 'top level' control for a max. nr of clauses wouldn't hurt:
one could always set it very high, not bother about it there,
and leave the finer control to the wildcard query terms.

Kind regards,
