nutch-dev mailing list archives

From "Gal Nitzan" <gal.nitza...@gmail.com>
Subject RE: Lock file problems...
Date Thu, 07 Jun 2007 16:52:02 GMT
I index directly to Solr.
It happened to me when two separate indexers accessed the index at the same
time. The Lucene index seemed to stay hung (which is why the lock was still
there) until I killed the process. After that I had to rebuild the index, since
I was afraid it had been corrupted.
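
If the indexers run as threads inside one JVM (an assumption on my part; the
LocalJobRunner frame in the trace below suggests it may be the case), one way
to keep two of them off the same index directory is a per-directory mutex.
This is only a rough sketch: IndexDirGuard and withIndexLock are hypothetical
names, not part of Nutch or Lucene, and it does nothing for indexers running
in separate processes.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper: one in-JVM mutex per index directory. */
final class IndexDirGuard {
    // Maps a normalized index path to the mutex that guards it.
    private static final ConcurrentHashMap<Path, ReentrantLock> LOCKS =
            new ConcurrentHashMap<>();

    private IndexDirGuard() {}

    /**
     * Runs the task while holding the mutex for indexDir. Other callers
     * that pass the same directory block until the task has finished.
     */
    static void withIndexLock(String indexDir, Runnable task) {
        // Normalize so "a/../a/idx" and "a/idx" share one mutex.
        Path key = Paths.get(indexDir).toAbsolutePath().normalize();
        ReentrantLock lock = LOCKS.computeIfAbsent(key, k -> new ReentrantLock());
        lock.lock();
        try {
            task.run();
        } finally {
            lock.unlock();
        }
    }
}
```

Each crawler would then wrap its index, deduplicate and merge steps in
withIndexLock(...) for the directory it is about to touch.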

> -----Original Message-----
> From: Briggs [mailto:acidbriggs@gmail.com]
> Sent: Thursday, June 07, 2007 6:21 PM
> To: nutch-dev@lucene.apache.org
> Subject: Lock file problems...
>
> I am getting these lock file errors all over the place when indexing
> or even creating crawldbs.  It doesn't happen all the time, but
> sometimes it happens continuously.  So, I am not quite sure how these
> locks are getting in there, or why they aren't getting removed.
>
> I am not sure where to go from here.
>
> My current application is designed for crawling individual domains.
> So, I have multiple custom crawlers that work concurrently.  Each one
> basically does:
>
> 1) fetch
> 2) invert links
> 3) segment merge
> 4) index
> 5) deduplicate
> 6) merge indexes
>
>
> Though, I am still not 100% sure of what the "indexes" directory is truly
> for.
>
> java.io.IOException: Lock obtain timed out: Lock@file:/crawloutput/http$~~www.camlawblog.com/indexes/part-00000/write.lock
>         at org.apache.lucene.store.Lock.obtain(Lock.java:69)
>         at org.apache.lucene.index.IndexReader.aquireWriteLock(IndexReader.java:526)
>         at org.apache.lucene.index.IndexReader.deleteDocument(IndexReader.java:551)
>         at org.apache.nutch.indexer.DeleteDuplicates.reduce(DeleteDuplicates.java:414)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:313)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:155)
>
>
> So, has anyone seen this come up on their own implementations?
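
To see whether lock files are being left behind between runs, something like
the following could list every write.lock under a crawl output directory. It
is a sketch using only the JDK; WriteLockScanner and findWriteLocks are
made-up names. I am assuming the lock is a plain file named write.lock inside
each index part, as the path in the trace above suggests. Whether a leftover
file really means the lock is still held depends on the Lucene lock factory in
use, so I would only treat the output as a hint and never delete anything
while a crawler might still be writing.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: reports write.lock files under a directory tree. */
final class WriteLockScanner {
    private WriteLockScanner() {}

    /** Returns every regular file named "write.lock" below root, sorted by path. */
    static List<Path> findWriteLocks(Path root) throws IOException {
        // Files.walk holds directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals("write.lock"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Running it before the deduplicate step, when no crawler should be writing,
would show which index parts still carry a lock from an earlier run.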


