lucene-dev mailing list archives

From Wolfgang Hoschek <>
Subject Re: [Performance] Streaming main memory indexing of single strings
Date Fri, 22 Apr 2005 20:53:14 GMT
I've now cleaned up, tested and documented the contrib code, and it is
in a decent state, ready for your review and comments.
Consider this a formal contribution (the Apache license is attached).

The relevant files are attached to the following bug ID:

For a quick overview without downloading code, there's some javadoc at 

There are several small open issues listed in the javadoc and also  
inside the code. Thoughts? Comments?

I've also got small performance patches for various parts of Lucene  
core (not submitted yet). Taken together they lead to substantially  
improved performance for MemoryIndex, and most likely also for Lucene  
in general. Some of them are more involved than others. I'm now
measuring how much each of them contributes and working out how to
propose integrating them; stay tuned for follow-ups on this.

The code as submitted would certainly benefit a lot from these patches,
but they are not required for correct operation. It should work out of
the box (currently only on Lucene 1.4.3 or lower). Try running

	cd lucene-cvs
	java org.apache.lucene.index.memory.MemoryIndexTest

with or without custom arguments to see it in action.
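For readers new to the idea, here is a minimal sketch of what indexing a single string in main memory amounts to, assuming a toy design: tokenize the string with a pluggable analyzer, map each term to the token positions where it occurs, then answer term and phrase queries against that map. The class `SingleStringIndex`, its methods and its analyzer are hypothetical illustrations written against the Java standard library only. They are not MemoryIndex's API or implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical toy: a main-memory index over exactly one string.
// Not MemoryIndex's API or implementation.
class SingleStringIndex {
    // term -> token positions at which it occurs (one document, so no doc IDs)
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // The analyzer is pluggable: any function from text to a token list.
    SingleStringIndex(String text, Function<String, List<String>> analyzer) {
        List<String> tokens = analyzer.apply(text);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), t -> new ArrayList<>()).add(pos);
        }
    }

    // Toy analyzer (an assumption): lowercase, split on runs of non-letters/digits.
    static List<String> simpleAnalyzer(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t); // skip the empty leading piece, if any
        }
        return tokens;
    }

    // Term query: how often the (already analyzed) term occurs; 0 if absent.
    int termFreq(String term) {
        List<Integer> positions = postings.get(term);
        return positions == null ? 0 : positions.size();
    }

    // Phrase query: do the terms occur at consecutive positions somewhere?
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) return false;
        List<Integer> starts = postings.get(terms[0]);
        if (starts == null) return false;
        outer:
        for (int start : starts) {
            for (int i = 1; i < terms.length; i++) {
                List<Integer> positions = postings.get(terms[i]);
                if (positions == null || !positions.contains(start + i)) continue outer;
            }
            return true;
        }
        return false;
    }
}
```

For example, indexing "Readings about salmon and other Alaska fishing manuals" with `SingleStringIndex::simpleAnalyzer` gives `termFreq("alaska") == 1` and `termFreq("fish") == 0` (the toy analyzer does no stemming), and `matchesPhrase("alaska", "fishing")` is true while the reversed phrase is false. Each new string gets a fresh, throwaway index, which matches the "index+query step" pattern described in this thread.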

Before turning to a discussion of the performance patches, at this
point I'd be most interested in folks giving it a spin, comments on the
API, or any other issues.


On Apr 20, 2005, at 11:26 AM, Wolfgang Hoschek wrote:

> On Apr 20, 2005, at 9:22 AM, Erik Hatcher wrote:
>> On Apr 20, 2005, at 12:11 PM, Wolfgang Hoschek wrote:
>>> By the way, by now I have a version against 1.4.3 that is 10-100  
>>> times faster (i.e. 30000 - 200000 index+query steps/sec) than the  
>>> simplistic RAMDirectory approach, depending on the nature of the  
>>> input data and query. From some preliminary testing it returns  
>>> exactly what RAMDirectory returns.
>> Awesome.  Using the basic StringIndexReader I sent?
> Yep, it's loosely based on the empty skeleton you sent.
>> I've been fiddling with it a bit more to get other query types.  I'll
>> add it to the contrib area when it's a bit more robust.
> Perhaps we could merge up once I'm ready and put that into the contrib  
> area? My version now supports tokenization with any analyzer and it  
> supports any arbitrary Lucene query. I might make the API for adding  
> terms a little more general, perhaps allowing arbitrary Document  
> objects if that's what other folks really need...
>>> As an aside, is there any work going on to potentially support
>>> prefix (and infix) wildcard queries à la "*fish"?
>> WildcardQuery supports wildcard characters anywhere in the string.   
>> QueryParser itself restricts expressions that have leading wildcards  
>> from being accepted.
> Any particular reason for this restriction? Is this simply a current  
> parser limitation or something inherent?
>> QueryParser supports wildcard characters in the middle of strings no  
>> problem though.  Are you seeing otherwise?
> I meant an infix query such as "*fish*".
> Wolfgang.
> -----------------------------------------------------------------------
> Wolfgang Hoschek                  |   email:
> Distributed Systems Department    |   phone: (415)-533-7610
> Berkeley Laboratory               |
> -----------------------------------------------------------------------
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
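The thread does not say why QueryParser rejects leading wildcards. One likely cost, sketched here as an assumption rather than a description of Lucene's code, is that a sorted term dictionary lets a pattern with a literal prefix such as `fish*` seek straight to `fish` and stop as soon as terms no longer share that prefix, whereas `*fish` or `*fish*` has no literal prefix and forces a scan of every term. `ToyTermDictionary`, its `lookup` method and its `inspected` counter are hypothetical, standard-library-only illustrations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical toy term dictionary: terms kept in sorted order, as a
// term enumeration would return them. Not Lucene's WildcardQuery code.
class ToyTermDictionary {
    private final TreeSet<String> terms = new TreeSet<>();
    int inspected; // number of terms the last lookup() had to examine

    ToyTermDictionary(String... words) {
        for (String w : words) terms.add(w);
    }

    // '*' matches any run of characters, '?' exactly one; everything else literally.
    static boolean wildcardMatch(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return term.matches(regex.toString());
    }

    List<String> lookup(String pattern) {
        // Literal prefix = everything before the first wildcard character.
        int cut = pattern.length();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*' || c == '?') { cut = i; break; }
        }
        String prefix = pattern.substring(0, cut);

        List<String> hits = new ArrayList<>();
        inspected = 0;
        // Seek to the first term >= prefix. An empty prefix (leading wildcard)
        // means starting at the very first term and scanning everything.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break; // sorted: no later term can match
            inspected++;
            if (wildcardMatch(pattern, term)) hits.add(term);
        }
        return hits;
    }
}
```

With the eight terms catfish, dog, fish, fishing, fishy, goldfish, shellfish and zebra, `lookup("fish*")` returns `[fish, fishing, fishy]` after inspecting 3 terms, while `lookup("*fish")` and `lookup("*fish*")` each inspect all 8. On a real index with millions of terms, that difference is the whole dictionary.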
