lucene-java-user mailing list archives

From "none none" <>
Subject RE: Crash / Recovery Scenario
Date Mon, 08 Jul 2002 19:05:09 GMT
If you mean that the computer doesn't crash and you only want to stop the process safely, then
in my case the Manager will not stop until the current task is complete, because I am running
the Manager as an NT service. There is one small problem here: you cannot stop a thread while
it is doing an I/O operation such as a recursive scan of a directory, so you have to wait a
little.
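A minimal sketch of that "finish the current task, then stop" behaviour, assuming the work is split into discrete tasks: the stop request only sets a flag, and the worker checks it between tasks, so a task that is already doing I/O (such as a directory scan) is allowed to complete. `IndexWorker` and its method names are hypothetical, not part of Lucene or of the author's Manager.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical worker: a stop request only sets a flag, which is checked
// between tasks, so a task that is in the middle of I/O runs to completion.
// Tasks still queued when the stop is requested are left unprocessed.
class IndexWorker implements Runnable {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);

    void submit(Runnable task) {
        tasks.add(task);
    }

    // Called from the service's stop handler; it never interrupts the worker thread.
    void requestStop() {
        stopRequested.set(true);
    }

    @Override
    public void run() {
        while (!stopRequested.get()) {
            try {
                // Poll with a timeout so the stop flag is re-checked regularly.
                Runnable task = tasks.poll(100, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

The service's stop handler would call `requestStop()` and then `join()` the worker thread, which returns once the task in progress has finished.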
I see that you are looking for software stability, but software stability is closely tied to
the hardware, so you need good hardware too: think about a RAID setup (RAID 0 or 5, depending
on your needs; only RAID 5 adds redundancy) or a clustered system. It depends on what you want
from your search engine.
I also think it is good to focus on keeping a healthy document cache. For example, if I hit a
bad error and can't recover the index, I rebuild it by calling a method that scans my whole
cache. It is not great, but it is better than nothing. That said, I have never actually had
that kind of problem.
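One way to picture that fall-back: try the normal open path and, if it fails, rebuild the index by scanning every cached document. This is a minimal in-memory sketch; `SimpleIndex`, `IndexRecovery`, and the `Map` used as the cache are hypothetical stand-ins, not Lucene classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical in-memory stand-in for an index: term -> ids of documents containing it.
class SimpleIndex {
    final Map<String, Set<String>> postings = new HashMap<>();

    void add(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
    }
}

class IndexRecovery {
    // Try the normal open path first; if it fails, rebuild from the document cache.
    static SimpleIndex openOrRebuild(Supplier<SimpleIndex> open, Map<String, String> cache) {
        try {
            return open.get();
        } catch (RuntimeException e) {
            SimpleIndex rebuilt = new SimpleIndex();
            // Scan every cached document and re-add it: slow, but always possible.
            for (Map.Entry<String, String> doc : cache.entrySet()) {
                rebuilt.add(doc.getKey(), doc.getValue());
            }
            return rebuilt;
        }
    }
}
```

The design choice is that the cache, not the index, is treated as the source of truth, so a corrupt index costs time but not data.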
Also, going multi-threaded should improve the current indexing speed by about 40%, although
you need to merge all the segments at the end (I only tested with 2 threads on Win2K).
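A minimal sketch of the multi-threaded approach, assuming each thread builds its own partial index (its own "segment") over a slice of the documents and the segments are unioned in a final merge step. `ParallelIndexer` and the map-based index are hypothetical stand-ins for Lucene segments; the 40% figure above is the author's own measurement and this sketch does not reproduce it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: each thread indexes its own slice of the documents into a
// private partial index (a "segment"), and the segments are merged once at the end.
class ParallelIndexer {
    static Map<String, Set<String>> indexSlice(List<Map.Entry<String, String>> docs) {
        Map<String, Set<String>> segment = new HashMap<>();
        for (Map.Entry<String, String> doc : docs) {
            for (String term : doc.getValue().toLowerCase().split("\\W+")) {
                if (!term.isEmpty()) {
                    segment.computeIfAbsent(term, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return segment;
    }

    static Map<String, Set<String>> build(Map<String, String> docs, int threads)
            throws InterruptedException, ExecutionException {
        List<Map.Entry<String, String>> all = new ArrayList<>(docs.entrySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Set<String>>>> parts = new ArrayList<>();
            int chunk = (all.size() + threads - 1) / threads;  // ceiling division
            for (int start = 0; start < all.size(); start += chunk) {
                List<Map.Entry<String, String>> slice =
                        all.subList(start, Math.min(start + chunk, all.size()));
                parts.add(pool.submit(() -> indexSlice(slice)));
            }
            // Merge step: union the per-thread segments into one index.
            Map<String, Set<String>> merged = new TreeMap<>();
            for (Future<Map<String, Set<String>>> part : parts) {
                for (Map.Entry<String, Set<String>> e : part.get().entrySet()) {
                    merged.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each thread writes only to its own segment, no locking is needed during indexing; the cost is moved to the single merge at the end.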
If you are looking for a search engine like Google, there is a lot of work to do, A LOT!
My suggestion is to split the index and the cache across n machines. The one thing I don't
know how to do is run a search over multiple indexes on multiple machines: plain sockets won't
work, because they become really slow under heavy traffic. I was thinking of a Java-compatible
DLL able to merge multiple machines into one logical unit.
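The usual shape of this problem is scatter/gather: send the query to every machine in parallel, then merge the partial hit lists by score. The sketch below assumes a hypothetical `Shard` interface whose implementations would hide the network call (RMI, sockets, or anything else); the transport itself is not shown, and `Hit` and `FederatedSearch` are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A hit returned by one shard: document id plus relevance score.
final class Hit {
    final String docId;
    final double score;

    Hit(String docId, double score) {
        this.docId = docId;
        this.score = score;
    }
}

// Hypothetical shard abstraction; in a real deployment each implementation
// would wrap a remote call to the machine holding that part of the index.
interface Shard {
    List<Hit> search(String query, int k);
}

class FederatedSearch {
    private final List<Shard> shards;
    private final ExecutorService pool;

    FederatedSearch(List<Shard> shards) {
        this.shards = shards;
        this.pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
    }

    // Scatter the query to every shard in parallel, then gather the global top k.
    List<Hit> search(String query, int k) throws InterruptedException, ExecutionException {
        List<Future<List<Hit>>> pending = new ArrayList<>();
        for (Shard s : shards) {
            pending.add(pool.submit(() -> s.search(query, k)));
        }
        List<Hit> all = new ArrayList<>();
        for (Future<List<Hit>> f : pending) {
            all.addAll(f.get());
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    void close() {
        pool.shutdown();
    }
}
```

One caveat: scores computed by separately built indexes use different term statistics, so merging them by raw score is only an approximation of a single global ranking.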



On Mon, 8 Jul 2002 21:07:32, Nader S. Henein wrote:
>Brilliant, I was thinking along the same lines. A new issue that I'm facing
>is Lucene just dying on me in the middle of indexing: no server crash,
>nothing. What do you do if it just stops mid-indexing?
>-----Original Message-----
>From: none none []
>Sent: Monday, July 08, 2002 8:42 PM
>Subject: Re: Crash / Recovery Scenario
>Hi, I do the same things you do, but I do them every time I get a
>NullPointerException when I try to run a search. If this happens I try to
>reopen the index searcher; if I get an exception there, I sleep for 500 ms
>and then try again, and after 5 attempts I throw a servlet exception.
>Concerning the deletion of write.lock and commit.lock, I use a manager that
>executes different kinds of operations in blocks of, say, 100 or 1000.
>Each operation can be:
>1. Delete documents
>2. Add documents
>3. Search documents
>A combination of these 3 operations allows me to "update" the index while
>searches are still running. But there is a "versioning" problem between the
>current cache of documents and the current version of the "INDEXED"
>documents: during an update you can search for something that is found in
>the index but has already been updated in the cache, so I get a bunch of
>duplicate documents in the meantime. At the end I use an RMI callback to
>notify all the clients connected to that Manager to reopen the index, and
>then I clean up all these duplicates. At this stage I still have a problem
>if the Manager dies, because I keep everything in memory, but I did a little
>workaround to handle that. My next step is to make this "transaction"
>persistent, so I can recover the previous "status".
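A rough sketch of the block-wise manager described above, with a plain Java listener interface standing in for the RMI callback. The `BatchManager`, `Op`, and `ReopenListener` names are hypothetical, a `HashMap` stands in for the real index, and the duplicate clean-up step is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the RMI callback: clients are told to reopen
// their searchers after each block of operations has been applied.
interface ReopenListener {
    void indexChanged(int operationsApplied);
}

class BatchManager {
    enum Kind { ADD, DELETE }

    static final class Op {
        final Kind kind;
        final String docId;
        final String text;  // unused for DELETE

        Op(Kind kind, String docId, String text) {
            this.kind = kind;
            this.docId = docId;
            this.text = text;
        }
    }

    private final Map<String, String> index = new HashMap<>();  // docId -> text, stands in for the index
    private final List<ReopenListener> listeners = new ArrayList<>();
    private final int blockSize;

    BatchManager(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    void addListener(ReopenListener listener) {
        listeners.add(listener);
    }

    // Applies the operations in blocks of blockSize and notifies every listener after each block.
    void run(List<Op> ops) {
        for (int start = 0; start < ops.size(); start += blockSize) {
            List<Op> block = ops.subList(start, Math.min(start + blockSize, ops.size()));
            for (Op op : block) {
                if (op.kind == Kind.DELETE) {
                    index.remove(op.docId);
                } else {
                    index.put(op.docId, op.text);
                }
            }
            for (ReopenListener listener : listeners) {
                listener.indexChanged(block.size());
            }
        }
    }

    boolean contains(String docId) {
        return index.containsKey(docId);
    }
}
```

In this sketch an "update" is simply a DELETE followed by an ADD for the same id; clients only learn about changes at block boundaries, which is where the temporary duplicates described above come from.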
>Every time I run one of the operations listed above, I check whether
>"write.lock" or "commit.lock" exists; if so, I call the unlock() method,
>delete the files (if unlock() doesn't), and then optimize the index.
>So far everything seems to work fine.
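The reopen-and-retry policy from this message (sleep 500 ms between attempts, give up after 5) can be written as a small generic helper. `Retry.withRetries` is a hypothetical name; the `Callable` would wrap whatever opens the searcher, and in a servlet the final failure would be wrapped in a `ServletException` rather than the `IllegalStateException` used here to keep the sketch free of the servlet API.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper: try to (re)open, sleep between failed attempts,
// and give up after a fixed number of tries.
class Retry {
    static <T> T withRetries(Callable<T> open, int maxAttempts, long sleepMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return open.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);  // wait before the next attempt
                }
            }
        }
        // A servlet would throw a ServletException here instead.
        throw new IllegalStateException("could not reopen after " + maxAttempts + " attempts", last);
    }
}
```

For example, `Retry.withRetries(() -> openSearcher(), 5, 500)`, where `openSearcher()` is a hypothetical method that opens a fresh searcher.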
>On Mon, 8 Jul 2002 09:40:10, Nader S. Henein wrote:
>>I'm currently using Lucene to sift through about a million documents. I've
>>written a servlet to do the indexing and the searching; the servlets run
>>under Resin. The crash scenario I'm thinking of is a web server crash (for
>>a million possible reasons) while the index is being updated or optimized.
>>What I've noticed is the creation of write.lock and commit.lock files,
>>which stop further indexing because the application thinks that the
>>previously scheduled indexer is still running (which could very well be the
>>case, depending on the size of the update). This is the recovery I have in
>>mind, but I think it might be somewhat of a hack: on restart of the web
>>server, an Init function I've written checks for write.lock or commit.lock,
>>and if either exists it deletes both of them and optimizes the index. Am I
>>forgetting anything? Is this wrong? Is there a Lucene-specific way of doing
>>this, like running the optimizer with a specific setup?
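The startup check can be sketched with `java.nio.file`, assuming the lock files sit in the index directory under exactly the names `write.lock` and `commit.lock`, as described in this thread. `StaleLockCleaner` is a hypothetical name, and the Lucene-specific follow-up (unlocking and optimizing) is left out so the sketch runs without the library. Deleting the files is only safe when nothing else can still be writing to the index: running the check at startup, before this application starts any indexer of its own, avoids racing against an in-process writer, but not against another process using the same directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: remove leftover lock files before indexing resumes.
// Only safe when no other process or thread can be writing to this index.
class StaleLockCleaner {
    static final String[] LOCK_NAMES = {"write.lock", "commit.lock"};

    // Deletes any lock files found and returns the names that were removed.
    static List<String> clearLocks(Path indexDir) throws IOException {
        List<String> removed = new ArrayList<>();
        for (String name : LOCK_NAMES) {
            if (Files.deleteIfExists(indexDir.resolve(name))) {
                removed.add(name);
            }
        }
        return removed;
    }
}
```

A non-empty result tells the caller that the previous run ended abnormally, which is the point at which the thread's approach then optimizes the index.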
>>Nader S. Henein
>> , Dubai Internet City
>>Tel. +9714 3911900
>>Fax. +9714 3911915
>>GSM. +9715 05659557
>>To unsubscribe, e-mail:
>>For additional commands, e-mail:
>Supercharge your e-mail with a 25MB Inbox, POP3 Access, No Ads
>and NoTaglines --> LYCOS MAIL PLUS.
