nutch-dev mailing list archives

From "Sergey Khilkov (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-664) Possibility to update already stored documents.
Date Wed, 26 Nov 2008 06:30:45 GMT
Possibility to update already stored documents.
-----------------------------------------------

                 Key: NUTCH-664
                 URL: https://issues.apache.org/jira/browse/NUTCH-664
             Project: Nutch
          Issue Type: New Feature
            Reporter: Sergey Khilkov


We have a huge index of stored documents. Re-fetching a page and merging indexes every time some
information about that page changes is an expensive procedure, and this information can change
one to three times per day. At the moment we have to store the changed information in a database,
but that causes lots of problems with sorting, search restrictions and so on. Lucene itself allows
deleting a single document and adding a new one to an existing index. The problem is Hadoop: as I
understand it, the Hadoop filesystem cannot write at random positions in a file. Still, it would be
a great feature if Nutch were able to update an already created index.
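One way to see why delete-and-add need not require random writes: the old copy of a document is never overwritten, only marked as deleted, and a later merge rewrites the live documents into new files. Lucene's write-once segments work broadly along these lines, and its `IndexWriter.updateDocument` is documented as deleting the documents matching a term and then adding the new one. The sketch below is only a toy in-memory model of that idea, assuming the page URL is the unique key; it is not Lucene or Nutch code, and the `Doc` and `AppendOnlyIndex` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Doc:
    """Hypothetical stored document; `url` is assumed to be the unique key."""
    url: str
    content: str


class AppendOnlyIndex:
    """Toy model (not Lucene or Nutch code) of updating without in-place writes.

    Documents are only ever appended, as on a write-once file system. An update
    marks the previous version as deleted and appends the new version.
    """

    def __init__(self) -> None:
        self.log: List[Doc] = []          # only appended to, never modified in place
        self.deleted: Set[int] = set()    # positions of superseded documents
        self.latest: Dict[str, int] = {}  # url -> position of its live version

    def update(self, url: str, content: str) -> None:
        """Delete-then-add: mark any previous version deleted, append the new one."""
        old = self.latest.get(url)
        if old is not None:
            self.deleted.add(old)
        self.log.append(Doc(url, content))
        self.latest[url] = len(self.log) - 1

    def get(self, url: str) -> Optional[str]:
        """Return the content of the live version, or None if the url is unknown."""
        pos = self.latest.get(url)
        return None if pos is None else self.log[pos].content

    def live_count(self) -> int:
        return len(self.log) - len(self.deleted)

    def compact(self) -> "AppendOnlyIndex":
        """Merge step: copy only live documents into a fresh log."""
        fresh = AppendOnlyIndex()
        for pos, doc in enumerate(self.log):
            if pos not in self.deleted:
                fresh.update(doc.url, doc.content)
        return fresh


index = AppendOnlyIndex()
index.update("http://example.com/a", "price: 10")
index.update("http://example.com/b", "price: 20")
index.update("http://example.com/a", "price: 12")  # page information changed
print(index.get("http://example.com/a"))           # price: 12
print(index.live_count(), len(index.log))          # 2 3
compacted = index.compact()
print(compacted.live_count(), len(compacted.log))  # 2 2
```

The trade-off in this design is that deleted copies occupy space until the next merge, so the cost of frequent updates moves from rewriting the whole index each time to a periodic compaction.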

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

