nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-664) Possibility to update already stored documents.
Date Wed, 26 Nov 2008 09:34:44 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-664:
------------------------------------

      Priority: Minor  (was: Major)
    Issue Type: Wish  (was: New Feature)

There is no proposed design, so this is a Wish.

> Possibility to update already stored documents.
> -----------------------------------------------
>
>                 Key: NUTCH-664
>                 URL: https://issues.apache.org/jira/browse/NUTCH-664
>             Project: Nutch
>          Issue Type: Wish
>            Reporter: Sergey Khilkov
>            Priority: Minor
>
> We have a huge index of stored documents. Fetching a page and merging indexes every time
> some information about that page changes is an expensive procedure, and the information
> can change 1-3 times per day. At the moment we have to keep the changed information in a
> database, but that causes many problems with sorting, search restrictions and so on.
> Lucene itself allows deleting a single document and adding a new one to an existing
> index. But there is a problem with Hadoop: as I understand it, the Hadoop filesystem
> cannot write at random positions. Still, it would be a great feature if Nutch were able
> to update an index it has already created.
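
For illustration only, here is a minimal Python sketch of how a delete-then-add update can work on storage that is only ever appended to, which is the constraint the report attributes to the Hadoop filesystem. It is a toy model under stated assumptions, not Lucene, Hadoop, or Nutch code: the class `AppendOnlyIndex` and its methods `add_segment`, `update`, and `search` are hypothetical names, and documents are looked up by exact URL only.

```python
class AppendOnlyIndex:
    """Toy index (hypothetical): segments are never modified once written."""

    def __init__(self):
        self.segments = []    # each segment is an immutable tuple of documents
        self.deleted = set()  # ids of documents marked as deleted
        self.next_id = 0

    def add_segment(self, docs):
        """Append a new segment built from (url, fields) pairs; return the new ids."""
        segment = []
        for url, fields in docs:
            segment.append((self.next_id, url, dict(fields)))
            self.next_id += 1
        self.segments.append(tuple(segment))
        return [doc_id for doc_id, _, _ in segment]

    def update(self, url, fields):
        """Delete-then-add: mark old copies as deleted, then append a replacement."""
        for segment in self.segments:
            for doc_id, doc_url, _ in segment:
                if doc_url == url:
                    self.deleted.add(doc_id)
        # The replacement goes into a brand-new segment; nothing is overwritten.
        return self.add_segment([(url, fields)])[0]

    def search(self, url):
        """Return the fields of every live (not deleted) document for this URL."""
        return [
            fields
            for segment in self.segments
            for doc_id, doc_url, fields in segment
            if doc_url == url and doc_id not in self.deleted
        ]


index = AppendOnlyIndex()
index.add_segment([
    ("http://example.com/a", {"price": "10"}),
    ("http://example.com/b", {"price": "20"}),
])
index.update("http://example.com/a", {"price": "12"})
print(index.search("http://example.com/a"))
```

In this model an update never writes into an existing segment: the stale copy is only recorded in a separate set of deleted ids, and the new version lands in a fresh segment. Whether the same approach is practical for a Nutch index stored on Hadoop is exactly the open question of this issue.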

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

