lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Post processing to get around TooManyClauses?
Date Fri, 07 Dec 2007 14:43:06 GMT
Have you looked at Filters? Essentially, you construct a bitmap where each
bit corresponds to a document and pass that along into your search.
A filter is surprisingly speedy.
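
To make the bitmap idea concrete, here is a minimal plain-Java sketch. It
does not call Lucene at all: the class BitmapFilterSketch, its methods and
the sample books are all hypothetical, and titles are indexed as a single
lowercased term purely to keep the example short. The point it tries to
show is that the prefix is resolved by walking the sorted terms once and
OR-ing their document bits into one bitmap, so no per-term query clause is
ever created, and the author query is then intersected with that bitmap.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

/** Toy inverted index illustrating a bitmap filter. Not Lucene's API. */
final class BitmapFilterSketch {
    // term -> bitmap of the document ids that contain the term, sorted by term
    private final TreeMap<String, BitSet> titleTerms = new TreeMap<>();
    private final TreeMap<String, BitSet> authorTerms = new TreeMap<>();

    /** Indexes author and title as single lowercased terms (an assumption). */
    void addBook(int docId, String author, String title) {
        index(authorTerms, author.toLowerCase(), docId);
        index(titleTerms, title.toLowerCase(), docId);
    }

    private static void index(TreeMap<String, BitSet> field, String term, int docId) {
        field.computeIfAbsent(term, t -> new BitSet()).set(docId);
    }

    /** The "filter": one bit per document whose title starts with the prefix. */
    BitSet titlePrefixFilter(String prefix) {
        BitSet bits = new BitSet();
        // tailMap jumps to the first term >= prefix; stop at the first non-match.
        for (Map.Entry<String, BitSet> e : titleTerms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            bits.or(e.getValue());
        }
        return bits;
    }

    /** The "query": documents whose author term matches exactly. */
    BitSet authorQuery(String author) {
        BitSet hits = authorTerms.get(author.toLowerCase());
        return hits == null ? new BitSet() : (BitSet) hits.clone();
    }

    /** Runs the author query, then keeps only documents the filter allows. */
    BitSet search(String author, String titlePrefix) {
        BitSet hits = authorQuery(author);
        hits.and(titlePrefixFilter(titlePrefix.toLowerCase()));
        return hits;
    }

    public static void main(String[] args) {
        BitmapFilterSketch idx = new BitmapFilterSketch();
        idx.addBook(0, "Charles Dickens", "Martin Chuzzlewit");
        idx.addBook(1, "Charles Dickens", "Oliver Twist");
        idx.addBook(2, "Herman Melville", "Moby-Dick");
        idx.addBook(3, "Charles Dickens", "Master Humphrey's Clock");
        System.out.println(idx.search("Charles Dickens", "M")); // prints {0, 3}
    }
}
```

Note that the filter only switches documents on or off. It contributes
nothing to ranking, so whatever ordering you get comes from the query part
alone.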

What you lose is relevancy for the filtered part of the query. See

Also, search the mail archive for wildcard and you'll find a wealth of
discussion. See the thread "I just don't get wildcards at all" for some
very worthwhile reading.


On Dec 7, 2007 6:33 AM, d33mb33 <> wrote:

> I have developed a fuzzy search application over a database of books
> (titles, authors, etc.) and it works really well. (I use Lucene.Net but
> read the JavaDocs and forums for Java Lucene.)
> However, I've got an interesting use case with "TooManyClauses" and need
> some help in solving it.
> My users accept that silly title queries like "m*" are going to return too
> many results to be useful but they want to combine the wildcard searches
> with other search terms.
> For example:
> Use Case 1
> A user wants to search for books by "Charles Dickens"
> This works fine using a term query and about 500 results are returned
> Use Case 2
> A user wants to search for books by "Charles Dickens" where the title
> starts with M
> This throws a TooManyClauses exception (or eats a huge amount of RAM)
> because, I guess, Lucene treats the two as independent queries and M* is
> expanded across the whole index and not just the books by Charles Dickens.
> Users don't understand why Use Case 1 works but Use Case 2 doesn't.
> They see Use Case 2 as being a more restrictive query that should return
> better results than Use Case 1.
> I've thought a bit about how to solve this, but none of the solutions
> seem very elegant or efficient.
> One solution could be to eliminate one- or two-character wildcards from
> the initial search and then loop through the results doing a
> String.contains or something horrible.
> Another solution could be through clever use of the QueryFilter classes,
> but I don't quite understand how they work yet.
> Any suggestions would be welcome