lucene-java-user mailing list archives

From Dror Matalon <>
Subject Re: SQLDirectory
Date Fri, 06 Feb 2004 22:00:09 GMT
On Fri, Feb 06, 2004 at 04:25:53PM -0500, Philippe Laflamme wrote:
> > > > A connection per file sounds very heavyweight.
> > >
> > > Indeed it is. Using Postgres' LargeObjects to represent a file has its
> > > limitations: every time Lucene requires a stream on a file, a
> > > connection is required (and cannot be shared). Implementing it this
> > > way was quick, but not at all optimal.
> >
> > But large objects have much better read/write performance than
> > regular text fields.
> Maybe, but they require opening a lot of connections to the database (at
> least with Postgres' implementation). I don't know much about Lucene's
> requirements regarding the number of files generated during indexing, but
> it seems large. So if the number of concurrently open files grows as the
> index grows, opening a connection per file is not an option.

I've seen Lucene open hundreds of files. It'd be quite an overhead to
open hundreds of connections to the database.

> > >
> > > Your suggestion is quite interesting. It would not require the usage of
> > > Blobs, which are not very portable. It could be implemented using
> > > standard SQL types and would make an elegant SQLDirectory (and not an
> > > RDBMS-specific Directory).
> >
> > I suspect you're going to get lousy performance compared to using
> > regular files.
> Yes, and that was to be expected: it's doubtful that a large object would
> be faster than a regular file.
> Postgres uses a regular file per large object, so with the additional
> overhead of the JDBC driver, I was expecting slower performance. I did not
> expect the number of connections to become so high, though.

I didn't realize that. I guess it makes sense, since that way you'd have
the best performance when you're reading/writing/seeking.
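With the chunked alternative suggested earlier in the thread (many small rows per file, using only standard SQL types), seeking works differently: a file position has to be mapped to a block number and an offset inside that block. The sketch below is illustrative only. The table layout, block size and class name are hypothetical, and in-memory maps stand in for the table so it runs without a database. If the data went into real text columns, the binary bytes would also need an encoding such as Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical sketch: each index file is stored as fixed-size blocks, as a table like
//   CREATE TABLE index_blocks (file_name VARCHAR(255), block_no INTEGER,
//                              data VARCHAR(...), PRIMARY KEY (file_name, block_no))
// might hold them. Maps stand in for the table so the sketch runs without a database.
class ChunkedFileStore {
    static final int BLOCK_SIZE = 4; // tiny for demonstration; a real store would use larger blocks

    private final Map<String, TreeMap<Integer, byte[]>> blocksByFile = new HashMap<>();
    private final Map<String, Long> lengths = new HashMap<>();

    // Splits the contents into BLOCK_SIZE rows keyed by block number (one INSERT per row in SQL).
    void write(String name, byte[] contents) {
        TreeMap<Integer, byte[]> blocks = new TreeMap<>();
        for (int off = 0, no = 0; off < contents.length; off += BLOCK_SIZE, no++) {
            int end = Math.min(off + BLOCK_SIZE, contents.length);
            blocks.put(no, Arrays.copyOfRange(contents, off, end));
        }
        blocksByFile.put(name, blocks);
        lengths.put(name, (long) contents.length);
    }

    long length(String name) {
        Long len = lengths.get(name);
        if (len == null) throw new NoSuchElementException(name);
        return len;
    }

    int blockCount(String name) {
        TreeMap<Integer, byte[]> blocks = blocksByFile.get(name);
        if (blocks == null) throw new NoSuchElementException(name);
        return blocks.size();
    }

    // A "seek" to pos becomes: block_no = pos / BLOCK_SIZE, offset = pos % BLOCK_SIZE.
    byte[] read(String name, long pos, int len) {
        if (pos < 0 || len < 0 || pos + len > length(name)) {
            throw new IndexOutOfBoundsException("pos=" + pos + " len=" + len);
        }
        TreeMap<Integer, byte[]> blocks = blocksByFile.get(name);
        byte[] out = new byte[len];
        int copied = 0;
        while (copied < len) {
            long p = pos + copied;
            byte[] block = blocks.get((int) (p / BLOCK_SIZE)); // one row fetch per block in SQL
            int inBlock = (int) (p % BLOCK_SIZE);
            int n = Math.min(block.length - inBlock, len - copied);
            System.arraycopy(block, inBlock, out, copied, n);
            copied += n;
        }
        return out;
    }

    public static void main(String[] args) {
        ChunkedFileStore store = new ChunkedFileStore();
        store.write("_1.frq", "hello lucene".getBytes(StandardCharsets.UTF_8));
        System.out.println(store.blockCount("_1.frq")); // 3
        System.out.println(new String(store.read("_1.frq", 3, 6), StandardCharsets.UTF_8)); // lo luc
    }
}
```

A read that crosses a block boundary touches several rows. This is where the extra cost over a plain file (or a large object backed by one) would come from, and the block size is the main knob for trading row count against wasted reads.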

> > Why is it that you want to save the index files in a db?
> > It's not like you'll have any additional meta data or functionality. The
> > only advantage that I can think of is that you can have control of
> > read/write locking across machines. In other words, you can have one
> > machine doing the writing and one or more machines doing the
> > reading/searching.
> I'm not looking for blazing indexing performance, I'm more interested in the
> searching side of things. Making an index available on several different
> hosts is trivial using a database, but not so easy using a file system.
> Also, using database replication makes distributing an index a breeze... To
> me, it's more a matter of creating a scalable design.

Fair enough. We use Postgres a lot, but as I mentioned earlier, we've
handled scaling issues using NFS and snapshots. I'm curious to see what
happens with your database approach.
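The cross-machine locking advantage mentioned earlier (one host writing, any number of hosts searching) could look roughly like the sketch below. It assumes a hypothetical lock table whose primary key lets only one host insert the lock row at a time. The table, class and method names are illustrative and are not an existing SQLDirectory or Lucene API. A ConcurrentHashMap stands in for the table.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a write lock kept in the database rather than on a file system.
// In SQL this could be a table such as
//   CREATE TABLE index_locks (lock_name VARCHAR(64) PRIMARY KEY, owner VARCHAR(255))
// where obtaining the lock is an INSERT that fails on a duplicate key and releasing it
// is a DELETE. A ConcurrentHashMap stands in for the table so the sketch runs standalone.
class DbWriteLock {
    private static final ConcurrentHashMap<String, String> LOCK_TABLE = new ConcurrentHashMap<>();

    private final String lockName;
    private final String owner; // e.g. the host name of the indexing machine

    DbWriteLock(String lockName, String owner) {
        this.lockName = lockName;
        this.owner = owner;
    }

    // Like INSERT INTO index_locks VALUES (?, ?): succeeds only if no row exists yet.
    boolean obtain() {
        return LOCK_TABLE.putIfAbsent(lockName, owner) == null;
    }

    // Like DELETE FROM index_locks WHERE lock_name = ? AND owner = ?: only the holder can release.
    void release() {
        LOCK_TABLE.remove(lockName, owner);
    }

    boolean isLocked() {
        return LOCK_TABLE.containsKey(lockName);
    }

    public static void main(String[] args) {
        DbWriteLock writerA = new DbWriteLock("index-write", "host-a");
        DbWriteLock writerB = new DbWriteLock("index-write", "host-b");
        System.out.println(writerA.obtain()); // true
        System.out.println(writerB.obtain()); // false: host-a holds the lock
        writerA.release();
        System.out.println(writerB.obtain()); // true
    }
}
```

Searching hosts never take this lock, so they can keep reading while a single host indexes. A real version would also need to handle a writer that crashes while holding the row (for example with a timestamp column and a timeout). That part is left out here.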

> Any thoughts on that would be appreciated...

I'd give Doug's idea a shot. Use a lot of small text fields to store the



> Regards,
> Phil
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
