lucene-dev mailing list archives

From "Robert Engels" <reng...@ix.netcom.com>
Subject RE: Save to database...
Date Thu, 05 Jan 2006 10:26:17 GMT
Yes, that is what I did in the "custom" persistence.

There are some not-so-trivial problems to solve, though. Normally you cannot
seek within BLOBs (many JDBC drivers and databases will read the entire BLOB
in all cases), so reading the postings efficiently can be difficult. You can,
however, store the postings using a (startdoc, enddoc, postings) schema,
which will allow skipTo() to function.
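
To make the (startdoc, enddoc, postings) idea concrete, here is a minimal sketch. It is not the custom persistence code discussed in this thread: the class names, the method firstDocAtOrAfter(), and the SQL in the comment are all hypothetical, and a TreeMap stands in for a database index on enddoc. The point is only that a skip can be answered by loading one small row instead of the whole postings BLOB.

```java
import java.util.Map;
import java.util.TreeMap;

/** One row of a hypothetical postings table: (term, startDoc, endDoc, docIds). */
final class PostingsBlock {
    final int startDoc;
    final int endDoc;
    final int[] docIds; // sorted ascending, all within [startDoc, endDoc]

    PostingsBlock(int[] docIds) {
        this.docIds = docIds.clone();
        this.startDoc = docIds[0];
        this.endDoc = docIds[docIds.length - 1];
    }
}

/** In-memory stand-in for the rows of a single term, indexed by endDoc. */
final class BlockedPostings {
    // Plays the role of a database index on (term, enddoc).
    private final TreeMap<Integer, PostingsBlock> byEndDoc = new TreeMap<>();
    private int blocksLoaded = 0;

    void addBlock(int... docIds) {
        PostingsBlock block = new PostingsBlock(docIds);
        byEndDoc.put(block.endDoc, block);
    }

    /** Returns the first doc id >= target, or -1 if there is none. */
    int firstDocAtOrAfter(int target) {
        // Hypothetical SQL equivalent:
        //   SELECT postings FROM term_postings
        //   WHERE term = ? AND enddoc >= ? ORDER BY enddoc
        // keeping only the first row returned.
        Map.Entry<Integer, PostingsBlock> entry = byEndDoc.ceilingEntry(target);
        if (entry == null) {
            return -1;
        }
        blocksLoaded++; // only this one block is read
        for (int doc : entry.getValue().docIds) {
            if (doc >= target) {
                return doc;
            }
        }
        return -1; // not reached: endDoc >= target guarantees a match
    }

    int blocksLoaded() {
        return blocksLoaded;
    }
}
```

The block size is the tuning knob here: smaller blocks mean less data read per skip but more rows to maintain.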

The biggest problem is updating the postings efficiently if you allow
internal document numbers to be reused. If you don't allow reuse, then you
need to periodically (RARELY!) compact the database while it is offline.
This will be a TIME-CONSUMING process.

The standard Lucene index format is very efficient because it does not update
existing indexes, but rather builds new indexes and searches them together.
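
The "build new indexes and search them together" approach can be sketched roughly as follows. This is a simplified illustration, not Lucene's actual classes: Segment and SegmentedIndex are hypothetical names, and real segments also deal with deletions and merging, which are left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** An immutable mini-index: term -> doc ids. Never modified after creation. */
final class Segment {
    private final Map<String, List<Integer>> postings;

    Segment(Map<String, List<Integer>> postings) {
        Map<String, List<Integer>> copy = new HashMap<>();
        postings.forEach((term, docs) -> copy.put(term, List.copyOf(docs)));
        this.postings = Collections.unmodifiableMap(copy);
    }

    List<Integer> docsFor(String term) {
        return postings.getOrDefault(term, List.of());
    }
}

/** A set of segments that is only ever appended to. */
final class SegmentedIndex {
    private final List<Segment> segments = new ArrayList<>();

    /** New documents arrive as a whole new segment; existing segments are untouched. */
    void addSegment(Segment segment) {
        segments.add(segment);
    }

    /** A search consults every segment and merges the hits in doc-id order. */
    List<Integer> search(String term) {
        TreeSet<Integer> hits = new TreeSet<>();
        for (Segment segment : segments) {
            hits.addAll(segment.docsFor(term));
        }
        return new ArrayList<>(hits);
    }
}
```

Because nothing is rewritten in place, adding documents never touches existing postings; the cost shifts to searching several segments and merging them occasionally.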

-----Original Message-----
From: Aditya Liviandi [mailto:adityal@i2r.a-star.edu.sg]
Sent: Thursday, January 05, 2006 2:43 AM
To: java-dev@lucene.apache.org; rengels@ix.netcom.com
Subject: RE: Save to database...



What I meant is instead of saving the indexes into files, could I save
them as tables in a database?

I would think there would be a FieldNames table, a TermDictionary table,
a TermFrequency table and a TermPosition table (at the least)...
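
One possible shape for those four tables is sketched below as DDL strings held in a Java class. Every table name, column name, and key choice here is an assumption made for illustration; nothing in this thread specifies a schema, and a real design would depend on the database used.

```java
import java.util.List;

/** Hypothetical DDL for the four tables named above; all columns are assumptions. */
final class IndexSchema {
    static final List<String> DDL = List.of(
        // One row per indexed field.
        "CREATE TABLE field_names ("
            + "field_id INTEGER PRIMARY KEY, name VARCHAR(255) NOT NULL)",
        // One row per distinct term in a field, with its document frequency.
        "CREATE TABLE term_dictionary ("
            + "term_id INTEGER PRIMARY KEY, field_id INTEGER NOT NULL, "
            + "term_text VARCHAR(255) NOT NULL, doc_freq INTEGER NOT NULL)",
        // One row per (term, document): how often the term occurs there.
        "CREATE TABLE term_frequency ("
            + "term_id INTEGER NOT NULL, doc_id INTEGER NOT NULL, "
            + "freq INTEGER NOT NULL, PRIMARY KEY (term_id, doc_id))",
        // One row per occurrence of a term in a document.
        "CREATE TABLE term_position ("
            + "term_id INTEGER NOT NULL, doc_id INTEGER NOT NULL, "
            + "pos INTEGER NOT NULL, PRIMARY KEY (term_id, doc_id, pos))"
    );
}
```

A fully normalized layout like this produces one row per posting, so row counts grow quickly with the size of the collection.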

Has this been done before?


-----Original Message-----
From: Robert Engels [mailto:rengels@ix.netcom.com]
Sent: Thursday, January 05, 2006 4:36 PM
To: java-dev@lucene.apache.org
Subject: RE: Save to database...

There are implementations in contrib that do not need to retrieve the entire
index from the db in order to query (they store blocks of files in the db,
instead of blocks on disk).
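
As a rough illustration of the block approach (not the contrib code itself), a file can be split into fixed-size blocks, one row per block, so that a read touches only the blocks overlapping the requested range. BlockStore, its tiny block size, and the file name used in the usage note are hypothetical, and a HashMap stands in for a table of (file_name, block_no, data) rows.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a table of (file_name, block_no, data) rows. */
final class BlockStore {
    static final int BLOCK_SIZE = 4; // tiny on purpose, to keep the example readable

    private final Map<String, byte[]> rows = new HashMap<>();
    private int blocksFetched = 0;

    private static String key(String file, int blockNo) {
        return file + "#" + blockNo;
    }

    /** Splits a file into fixed-size blocks and stores one row per block. */
    void write(String file, byte[] content) {
        for (int off = 0, n = 0; off < content.length; off += BLOCK_SIZE, n++) {
            int len = Math.min(BLOCK_SIZE, content.length - off);
            byte[] block = new byte[len];
            System.arraycopy(content, off, block, 0, len);
            rows.put(key(file, n), block);
        }
    }

    /** Reads length bytes from offset, fetching only the blocks that overlap the range. */
    byte[] read(String file, int offset, int length) {
        byte[] out = new byte[length];
        int copied = 0;
        while (copied < length) {
            int pos = offset + copied;
            byte[] block = rows.get(key(file, pos / BLOCK_SIZE));
            if (block == null) {
                throw new IllegalArgumentException("read past end of " + file);
            }
            blocksFetched++;
            int inBlock = pos % BLOCK_SIZE;
            int n = Math.min(length - copied, block.length - inBlock);
            if (n <= 0) {
                throw new IllegalArgumentException("read past end of " + file);
            }
            System.arraycopy(block, inBlock, out, copied, n);
            copied += n;
        }
        return out;
    }

    int blocksFetched() {
        return blocksFetched;
    }
}
```

For example, after writing a 10-byte file named "segment.dat", read("segment.dat", 5, 2) needs only the second block, while a read that crosses a block boundary fetches two.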

I also developed an implementation that did not use blocks but rather a
custom index persistence mechanism.

There can be several advantages with this:

1. centralized/automated backup
2. db usually can perform more optimized caching of blocks
3. use multiple "nearly diskless" query servers with a single database
4. transactional updates to the index are possible (although index writes
are supposedly transactional in std lucene, I have encountered some index
corruption with hard failures - I think it is because the files are not
"synced" when flushed/closed).


-----Original Message-----
From: Steven Pannell [mailto:steven.pannell@zooplus.com]
Sent: Thursday, January 05, 2006 2:22 AM
To: 'java-dev@lucene.apache.org'
Subject: RE: Save to database...


Look in the old archive mails and you will find a few people have tried this
out. There is even some code around.

I have tried this, and to be honest it does not make much sense. The real
problem is performance: it just takes too long to keep getting the index from
the database to perform the query. In the end I realised this was not the way
to go and just used the standard filesystem and memory cache for this. Much
faster.

The only reason for doing this would be if you could not easily reproduce
the index and thus wanted to make sure you had some kind of permanent copy.


Out of interest why do you want to store in the DB?

Steve.



-----Original Message-----
From: Aditya Liviandi [mailto:adityal@i2r.a-star.edu.sg]
Sent: 05 January 2006 07:15
To: java-dev@lucene.apache.org
Subject: Save to database...



How would I go about altering lucene so that the index is saved to a
database instead?

(or has it been done? Wouldn't want to reinvent the wheel there.)



--------------------------------------------------
This email is confidential and may be privileged.  If you are not the
intended recipient, please delete it and notify us immediately. Please do
not copy or use it for any purpose, or disclose its contents to any other
person. Thank you.
--------------------------------------------------

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

