lucene-dev mailing list archives

From Erik Hatcher
Subject Re: [Performance] Streaming main memory indexing of single strings
Date Sat, 16 Apr 2005 21:32:50 GMT

On Apr 16, 2005, at 1:17 PM, Wolfgang Hoschek wrote:
>> Note that "fish*~" is not a valid query expression :)
> Perhaps the Lucene QueryParser should throw an exception then. 
> Currently 1.4.3 accepts the expression as is without grumbling...

Several minor QueryParser weirdnesses like this have turned up 
recently.  Sure enough, that is an odd one.  It parses into a 
PrefixQuery for "fish*" and the ~ is dropped.  I consider this a bug as 
this should really be a parse exception.  I've just filed this as a 

> If you're looking for an XML DB for managing and querying large 
> persistent data volumes, Nux/Saxon will disappoint you.

I want to store at least several hundred MB, up to gigabytes, and have 
it queryable with XQuery.  We previously used Tamino with XPath, but 
our XML is not normalized well enough for querying it to be very 
feasible.  eXist, last I toyed with it, only scaled to 50MB.

Ok, so Nux/Saxon is out for our uses.  Any recommendations though?

>> Could you avoid calling match() twice here?
> That's no problem for two reasons:
> 1) The XQuery optimizer rewrites the query into an optimized 
> expression tree eliminating redundancies, etc. If for some reason this 
> isn't feasible or legal then
> 2) There's a smart cache between the XQuery engine and the lucene 
> invocation that returns results in O(1) for Lucene queries that have 
> already been seen/processed before. It caches (queryString,result), 
> plus parsed Lucene queries, plus the Lucene index data structure for 
> any given string text (which currently is a simple RAMDirectory but 
> could be whatever datastructure we come up with as part of the 
> exercise - class StringIndex or some such). This works so well that I 
> have to disable the cache to avoid getting astronomically good figures 
> on artificial benchmarks.
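
For readers following along, here is a minimal sketch of the kind of
(queryString,result) cache described above.  It is not the Nux code:
the class name QueryResultCache, the LRU eviction policy and the
compute callback are all assumptions made for illustration, and a
real version would also hold the parsed queries and the per-string
index mentioned in the quoted text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical memoizing cache; an illustration, not the Nux implementation. */
final class QueryResultCache<K, V> {
    private final Map<K, V> entries;
    private int misses = 0;

    QueryResultCache(final int maxEntries) {
        // An access-ordered LinkedHashMap drops the least recently used
        // entry once the map grows past maxEntries (assumed policy).
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for key, computing and storing it on a miss. */
    synchronized V get(K key, Function<K, V> compute) {
        V cached = entries.get(key);
        if (cached != null) {
            return cached;            // hit: expected O(1) hash lookup
        }
        misses++;
        V value = compute.apply(key); // miss: run the expensive query once
        entries.put(key, value);      // note: a null result is recomputed next time
        return value;
    }

    /** Number of lookups that had to call compute. */
    synchronized int misses() {
        return misses;
    }
}
```

With something like this in front of the search call, a benchmark
that repeats the same query string mostly measures a hash lookup,
which is why the cache has to be disabled to get meaningful figures.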


> BTW, I have some small performance patches for FastCharStream and in 
> various other places, but I'll hold off proposing those until our 
> exercise is done and the real merits/drawbacks of those patches can be 
> better assessed.

Excellent... we're always interested in performance improvements!
