lucene-java-user mailing list archives

From: David Elworthy
Subject: RE: storing index in third party database.
Date: Wed, 03 Apr 2002 14:55:41 GMT
> -----Original Message-----
> From: Karl Øie
> Sent: Wednesday, April 03, 2002 10:00 AM
> To: Lucene Users List
> Subject: Re: storing index in third party database.
> Without having investigated the problem much, I would think that a SQL
> database would be a very bad match for Lucene, as most of Lucene's work is
> creating keys for words and documents and then creating indexes of these
> keys. For these purposes a SQL database is unnecessary overhead, not to
> mention the overhead of the SQL language parser.
> For these kinds of indexes a lower-level database would be better suited. I
> have had good experiences with BerkeleyDB ( and a friend of mine uses gdbm
> successfully for such key-pair indexing tasks. The advantage of these
> low-level database systems is that they are more or less persistent
> B-tree/hashtable implementations, and thus built for key-pairing.
> They have no SQL layer, so you have to program against them, as they are
> more like subroutine libraries than applications. But for key-pair indexes,
> in my experience BerkeleyDB runs circles around any SQL database (including
> DB2 and Oracle!!!).

I would agree with this, based on my experience implementing the
ANVIL system at Canon. SQL server was far too slow for simple term
lookup. We started with gdbm and subsequently moved to Berkeley DB. BDB
was faster in general and, more importantly, supports
multi-threading. Analysis with Purify suggested that gdbm has some
"uninitialized memory read" problems. The folks at Sleepycat were also
very helpful in getting us going.
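
For readers who have not used this kind of store, here is a minimal
sketch of the "simple term lookup" pattern discussed above. It is an
illustration only, not code from ANVIL or Lucene. It assumes Python's
standard-library dbm.dumb module as a stand-in for gdbm or Berkeley DB,
and the names build_index and lookup, the comma-separated posting
format, and the sample documents are all hypothetical. The point is
that each term is a key and its posting list is the value, so a lookup
is a single keyed read with no query parser in between.

```python
import dbm.dumb
import os
import tempfile


def build_index(path, docs):
    """Store a term -> posting-list mapping in a key-value database.

    docs maps integer document ids to text. Tokenizing is a plain
    lowercase whitespace split, which is an assumption made for brevity.
    """
    postings = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            postings.setdefault(term, []).append(doc_id)

    db = dbm.dumb.open(path, "c")  # "c": create the database if missing
    try:
        for term, ids in postings.items():
            # Keys and values are bytes; ids are stored as "1,2,3".
            value = ",".join(str(i) for i in sorted(ids))
            db[term.encode("utf-8")] = value.encode("ascii")
    finally:
        db.close()


def lookup(path, term):
    """Return the sorted document ids for one term, or [] if absent."""
    db = dbm.dumb.open(path, "r")
    try:
        raw = db.get(term.lower().encode("utf-8"))
    finally:
        db.close()
    if raw is None:
        return []
    return [int(x) for x in raw.decode("ascii").split(",")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index_path = os.path.join(tmp, "terms")
        build_index(index_path, {
            1: "storing index in database",
            2: "Berkeley DB index",
            3: "SQL database",
        })
        print(lookup(index_path, "index"))
        print(lookup(index_path, "lucene"))
```

A real gdbm or Berkeley DB binding would replace dbm.dumb here; the
access pattern (open, keyed read, close) stays the same.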

-- David
