nutch-dev mailing list archives

From "Radim Kolar (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (NUTCH-1194) CrawlDB lock should be released earlier
Date Thu, 03 Nov 2011 22:23:32 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Radim Kolar updated NUTCH-1194:
-------------------------------

    Comment: was deleted

(was: Locking should be done in a setup/cleanup task. Currently, if you kill the process that
submits the generate job to Hadoop, the crawl database stays locked. This needs to be reworked:
instead of running the jobs one by one, submit them all at once and make them depend on each
other. Once the jobs are in the Hadoop queue, you can kill the client without any bad effects.)
    
> CrawlDB lock should be released earlier
> ---------------------------------------
>
>                 Key: NUTCH-1194
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1194
>             Project: Nutch
>          Issue Type: Improvement
>          Components: generator
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.5
>
>
> The lock on the CrawlDB is released only when everything has finished. But when generating
> many segments, the lock stays in place even though it is no longer necessary. If
> GENERATE_UPDATE_DB is false, we can release the lock immediately after the selector has finished.
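
The proposed ordering can be sketched roughly as follows. This is a minimal illustration and not the actual Nutch Generator code: the class `GenerateLockSketch`, the `Lock` stand-in, the `generate` method, and the step names recorded in the log are all hypothetical, and the sequence of steps (select, partition segments, optional CrawlDB update) is an assumption based on the description above. The boolean `updateDb` stands in for the GENERATE_UPDATE_DB setting.

```java
import java.util.ArrayList;
import java.util.List;

public class GenerateLockSketch {

    /** Hypothetical stand-in for the CrawlDB lock; it records events instead of touching a filesystem. */
    static final class Lock {
        private boolean held;
        private final List<String> log;

        Lock(List<String> log) {
            this.log = log;
        }

        void acquire() {
            held = true;
            log.add("lock");
        }

        /** Releasing twice is harmless: only the first call records an "unlock". */
        void release() {
            if (held) {
                held = false;
                log.add("unlock");
            }
        }
    }

    /**
     * Runs the assumed generate steps and returns the order in which they happened.
     * updateDb plays the role of GENERATE_UPDATE_DB.
     */
    static List<String> generate(boolean updateDb, int segments) {
        List<String> log = new ArrayList<>();
        Lock lock = new Lock(log);
        lock.acquire();
        try {
            log.add("select"); // the selector reads the CrawlDB
            if (!updateDb) {
                // Nothing after this point writes to the CrawlDB, so release early.
                lock.release();
            }
            for (int i = 0; i < segments; i++) {
                log.add("partition-" + i); // per-segment work, no CrawlDB access assumed
            }
            if (updateDb) {
                log.add("update-db"); // still needs the lock
            }
        } finally {
            lock.release(); // no-op if the lock was already released
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(generate(false, 2));
        System.out.println(generate(true, 2));
    }
}
```

With `updateDb` set to false, the "unlock" event is recorded right after "select" and before any segment work; with it set to true, the lock is held until the update step has run.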

