lucene-java-user mailing list archives

From d33mb33 <>
Subject Post processing to get around TooManyClauses?
Date Fri, 07 Dec 2007 11:33:22 GMT

I have developed a fuzzy search application over a database of books (titles,
authors, etc.) and it works really well. (I use Lucene.Net but read the
JavaDocs and forums for Java Lucene.)

However I've got an interesting use case with "TooManyClauses" and need some
help in solving it.

My users accept that silly title queries like "m*" are going to return too
many results to be useful but they want to combine the wildcard searches
with other search terms.

For example:

Use Case 1
A user wants to search for books by "Charles Dickens"
This works fine using a term query and about 500 results are returned

Use Case 2
A user wants to search for books by "Charles Dickens" where the title starts
with M
This throws a TooManyClauses exception (or eats a huge amount of RAM)
because, I guess, Lucene treats the two as independent queries and M* is
expanded across the whole index and not just the books by Charles Dickens.
Users don't understand why Use Case 1 works but Use Case 2 doesn't. They
see Use Case 2 as the more restrictive query, one that should return better
results than Use Case 1.
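
A minimal sketch of the suspected behaviour, assuming a prefix such as M* is
expanded into one clause per matching term in the index-wide term dictionary
before the author restriction is applied. This is plain Java, not Lucene
code: the class and method names are hypothetical, and the limit of 1024 is
the documented default of BooleanQuery's maximum clause count.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only; it does not call Lucene.
final class PrefixExpansionSketch {
    // Assumed limit, matching Lucene's documented default max clause count.
    static final int MAX_CLAUSES = 1024;

    /** Counts the terms in a sorted term dictionary that start with the prefix. */
    static int countExpandedTerms(SortedSet<String> termDictionary, String prefix) {
        int count = 0;
        // In sorted order, all terms sharing a prefix are contiguous,
        // starting at the first term >= prefix.
        for (String term : termDictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            count++;
        }
        return count;
    }

    /** True when expanding the prefix would need more clauses than the limit. */
    static boolean exceedsLimit(SortedSet<String> termDictionary, String prefix) {
        return countExpandedTerms(termDictionary, prefix) > MAX_CLAUSES;
    }

    public static void main(String[] args) {
        SortedSet<String> titleTerms = new TreeSet<>();
        for (int i = 0; i < 2000; i++) {
            titleTerms.add("m" + i); // stand-ins for 2000 title terms starting with "m"
        }
        titleTerms.add("dickens");
        System.out.println("m* exceeds limit: " + exceedsLimit(titleTerms, "m"));
        System.out.println("dick* exceeds limit: " + exceedsLimit(titleTerms, "dick"));
    }
}
```

The count depends only on the term dictionary, not on how many books the
other search terms match, which would explain why adding "Charles Dickens"
does not help.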

I've thought a bit about how to solve this but none of the solutions seem very
elegant or efficient.
One solution could be to eliminate one- or two-character wildcards from the
initial search and then loop through the results doing a String.contains or
something horrible.
Another solution could be through clever use of the QueryFilter classes but
I don't quite understand how they work yet.
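
The first idea above (drop the short wildcard, then post-filter the hits)
could be sketched roughly as follows. This is plain Java under stated
assumptions: the titles are taken to be already loaded from the hits of the
restrictive query, the class and method names are hypothetical, and a
case-insensitive startsWith stands in for String.contains because the use
case is "title starts with M".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical post-processing helper; it does not call Lucene.
final class TitlePostFilterSketch {

    /** True for a trailing-wildcard term whose prefix is shorter than minPrefixLength. */
    static boolean isShortWildcard(String term, int minPrefixLength) {
        return term.endsWith("*") && term.length() - 1 < minPrefixLength;
    }

    /** Keeps only the titles that start with the prefix, ignoring case. */
    static List<String> filterByTitlePrefix(List<String> titles, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        List<String> kept = new ArrayList<>();
        for (String title : titles) {
            if (title.toLowerCase(Locale.ROOT).startsWith(wanted)) {
                kept.add(title);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the titles returned by the author-only query.
        List<String> dickensTitles = Arrays.asList(
                "Martin Chuzzlewit", "Oliver Twist",
                "Master Humphrey's Clock", "Bleak House");
        String titleTerm = "m*";
        if (isShortWildcard(titleTerm, 3)) {
            // Strip the trailing '*' and filter the hits instead of expanding the wildcard.
            String prefix = titleTerm.substring(0, titleTerm.length() - 1);
            System.out.println(filterByTitlePrefix(dickensTitles, prefix));
        }
    }
}
```

The cost of this approach grows with the number of hits from the remaining
query, so it only makes sense when the other terms are restrictive, as with
the roughly 500 results in Use Case 1.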

Any suggestions would be welcome.
