lucene-java-user mailing list archives

From "Nader S. Henein" <>
Subject RE: Crash / Recovery Scenario
Date Tue, 09 Jul 2002 10:45:10 GMT
I'm not worried about my hardware; I've been blessed with an 8-CPU Sun
machine and two 2-CPU Sun machines with gigs of memory, and I do run Lucene
with 15 threads. I've set my merge factor to 1000 so a lot of work is done
in memory (for speed). My current concerns are recovery-related, as I'm a few
days from deployment. On Windows-based machines I'm not too familiar with
the threading setup; the beauty of Unix is that you can do anything. I'm
worried about Lucene hanging mid-indexing. How do I monitor that?
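One simple way to detect a hung indexing job is a heartbeat watchdog: the indexing loop bumps a timestamp after each document or batch, and a monitor checks whether the heartbeat has gone stale. This is a minimal sketch, not Lucene-specific; the class name and timeout are hypothetical:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a watchdog for a long-running indexing job: the indexing
// thread bumps a heartbeat timestamp after each batch, and a monitor
// thread flags the job as hung if the heartbeat goes stale.
public class IndexWatchdog {
    private final AtomicLong lastHeartbeat = new AtomicLong(System.currentTimeMillis());
    private final long timeoutMillis;

    public IndexWatchdog(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called by the indexing thread after every document (or batch).
    public void beat() {
        lastHeartbeat.set(System.currentTimeMillis());
    }

    // Called by a monitor thread (or a scheduled check) to detect a hang.
    public boolean isHung() {
        return System.currentTimeMillis() - lastHeartbeat.get() > timeoutMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        IndexWatchdog w = new IndexWatchdog(200);
        w.beat();
        System.out.println("hung right after beat: " + w.isHung());
        Thread.sleep(300);                      // simulate a stalled indexer
        System.out.println("hung after stall: " + w.isHung());
    }
}
```

When the watchdog trips, the monitor can log the hang, kill the indexing thread, or trigger the lock-file recovery discussed elsewhere in this thread.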

-----Original Message-----
From: none none []
Sent: Monday, July 08, 2002 11:05 PM
Subject: RE: Crash / Recovery Scenario

 If you tell me the computer doesn't crash, and the only thing you want is to
stop the process safely, then in this case the Manager will not stop until
the task is complete. Because I am running the Manager as an NT service,
I have a little problem here: you cannot stop a thread while it is doing an
I/O operation like a recursive scan of a directory; you have to wait a
little bit.
 I see that you are looking for software stability, but software is strictly
tied to hardware; you need good hardware too. Think about a RAID
setup (0 or 5, it depends), and think about a clustered system.
It depends what you want from your search engine.
I also think it's good to focus on keeping a good cache: e.g., if I hit a
bad error and can't recover the index, I rebuild it by calling a method
that scans all my cache. It's not great, but better than nothing. That said, I
have never had that kind of problem.
Also, adopting multiple threads will improve the actual speed by about 40%; you
need to merge all the segments at the end. (I tested with just 2 threads.)
If you are looking for a search engine like Google, there is a lot of work
to do. A LOT!
My opinion is to split the index and cache across 'n' machines, but the one
thing I don't know how to do is run a search over multiple indexes on multiple
machines. Sockets will not work; they become really slow under heavy
traffic. I was thinking of a Java-compatible DLL able to present multiple
machines as one logical unit.
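The "index in parallel, merge at the end" idea above can be sketched as follows. Each worker builds its own segment from a slice of the documents, and the segments are only combined once all workers finish. This is a self-contained simulation (segments modeled as lists); in real Lucene each worker would write to its own Directory and the merge step would be an IndexWriter addIndexes/optimize call, which is assumed, not shown:

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of parallel indexing with a final merge: each worker "indexes"
// a slice of the documents into its own segment, then the segments are
// merged once all workers are done.
public class ParallelIndexSketch {
    public static List<String> indexInParallel(List<String> docs, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<List<String>>> segments = new ArrayList<>();
        int chunk = (docs.size() + threads - 1) / threads;
        for (int t = 0; t < threads; t++) {
            final List<String> slice =
                docs.subList(Math.min(t * chunk, docs.size()),
                             Math.min((t + 1) * chunk, docs.size()));
            // Each task stands in for a worker writing its own segment.
            segments.add(pool.submit(() -> new ArrayList<>(slice)));
        }
        List<String> merged = new ArrayList<>();   // the final merge step
        for (Future<List<String>> seg : segments) {
            merged.addAll(seg.get());
        }
        pool.shutdown();
        return merged;
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(indexInParallel(docs, 2)); // prints [a, b, c, d, e]
    }
}
```

The key design point is that workers never share a writer: contention only happens once, at merge time.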



On Mon, 8 Jul 2002 21:07:32
 Nader S. Henein wrote:
>Brilliant .. I was thinking along the same lines. A new issue that I'm
>facing is Lucene just dying on me in the middle of indexing .. no server
>crash .. nothing .. what do you do if it just stops mid-indexing?
>-----Original Message-----
>From: none none []
>Sent: Monday, July 08, 2002 8:42 PM
>Subject: Re: Crash / Recovery Scenario
> Hi, I do perform the same things as you do, but I do that every time I get a
>NullPointerException when I try to run a search. If this happens I try to
>reopen the index searcher; if I get an exception there, I sleep for 500 ms and
>then try again; after 5 attempts I throw a servlet exception. Concerning
>the deletion of write.lock and commit.lock, I use a manager; what it does is
>execute different kinds of operations in blocks, like 100 or 1000.
>Each operation can be:
>1. Delete documents
>2. Add documents
>3. Search document/s
>A combination of these 3 operations allows me to "update" the index with
>searches still running. But there is a "versioning" problem between the
>cache of documents and the current version of the indexed documents: during
>an update you can search for something that is found in the index but has been
>updated in the cache, so I have a bunch of duplicate documents during the update,
>and at the end I notify all the clients connected to
>that Manager, using an RMI callback, to reopen the index; then I clean up the
>duplicates. At this stage I still have an error if the Manager dies, because I
>have everything in memory, but I did a little workaround to handle that. My next
>step is to make this "transaction" persistent, so I can recover the previous state.
>Every time I run an operation as listed above, I check whether "write.lock"
>or "commit.lock" exists; in that case I call the unlock() method, delete
>them (if the unlock method doesn't), and then optimize the index.
>Until now everything seems to work fine.
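The reopen-with-retry loop described above (sleep 500 ms between attempts, give up after 5) can be sketched generically. The `Opener` interface here is hypothetical and stands in for `new IndexSearcher(...)`; the original poster wraps the final failure in a ServletException:

```java
// Sketch of the retry loop described above: try to reopen a searcher,
// and on failure sleep before retrying, giving up after maxAttempts.
public class SearcherRetry {
    interface Opener<T> { T open() throws Exception; }

    public static <T> T openWithRetry(Opener<T> opener, int maxAttempts, long sleepMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return opener.open();
            } catch (Exception e) {
                last = e;
                Thread.sleep(sleepMs);   // back off before the next attempt
            }
        }
        throw last;   // after maxAttempts failures, surface the error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated flaky open: fails twice, then succeeds.
        String s = openWithRetry(() -> {
            if (++calls[0] < 3) throw new RuntimeException("index locked");
            return "searcher";
        }, 5, 10);
        System.out.println(s + " after " + calls[0] + " attempts");
    }
}
```

With the poster's numbers (5 attempts, 500 ms), a transient lock costs at most ~2.5 seconds before the error is surfaced to the servlet layer.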
>On Mon, 8 Jul 2002 09:40:10
> Nader S. Henein wrote:
>>I'm currently using Lucene to sift through about a million documents. I've
>>written a servlet to do the indexing and the searching; the servlets run
>>through Resin. The crash scenario I'm thinking of is a web server crash
>>(for a million possible reasons) while the index is being updated or
>>optimized. What I've noticed is the creation of write.lock and commit.lock
>>files, which stop further indexing because the application thinks that the
>>previously scheduled indexer is still running (which could very well be true,
>>depending on the size of the update). This is the recovery I have in mind,
>>but I think it might be somewhat of a hack: on restart of the web server,
>>an Init function I've written checks for write.lock or commit.lock,
>>and if either exists it deletes both of them and optimizes the index. Am I
>>forgetting anything? Is this wrong? Is there a Lucene-specific way of
>>doing this, like running the optimizer with a specific setup?
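The Init-time recovery described above can be sketched with plain file operations. The lock-file names match what Lucene of this era kept in the index directory; the optimize step is left as a comment since it needs a real IndexWriter, and the directory handling here is illustrative:

```java
import java.io.File;
import java.io.IOException;

// Sketch of startup recovery: if write.lock or commit.lock survived a
// crash, delete both, then re-optimize the index. The optimize call is
// only indicated as a comment; it requires a real Lucene IndexWriter.
public class LockRecovery {
    // Returns true if stale locks were found (and removed).
    public static boolean cleanStaleLocks(File indexDir) {
        File writeLock  = new File(indexDir, "write.lock");
        File commitLock = new File(indexDir, "commit.lock");
        boolean stale = writeLock.exists() || commitLock.exists();
        if (stale) {
            writeLock.delete();
            commitLock.delete();
            // ...then reopen an IndexWriter on indexDir and call optimize().
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "lock-demo");
        dir.mkdirs();
        new File(dir, "write.lock").createNewFile();   // simulate a crash
        System.out.println("stale locks found: " + cleanStaleLocks(dir));
        System.out.println("write.lock gone: " + !new File(dir, "write.lock").exists());
    }
}
```

Note the caveat raised earlier in the thread: a lock file may also belong to a legitimately running indexer, so this cleanup is only safe at startup, before any indexing thread has been launched.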
>>Nader S. Henein
>> , Dubai Internet City
>>Tel. +9714 3911900
>>Fax. +9714 3911915
>>GSM. +9715 05659557


To unsubscribe, e-mail:
For additional commands, e-mail:

