nutch-dev mailing list archives

From Doğacan Güney (JIRA) <j...@apache.org>
Subject [jira] Commented: (NUTCH-664) Possibility to update already stored documents.
Date Thu, 27 Nov 2008 16:44:46 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651379#action_12651379 ]

Doğacan Güney commented on NUTCH-664:
-------------------------------------

This is possible with an HBase plus Solr/Katta (or similar) combination. I am working on HBase
support, but it will have to wait until after 1.0 to go in.

> Possibility to update already stored documents.
> -----------------------------------------------
>
>                 Key: NUTCH-664
>                 URL: https://issues.apache.org/jira/browse/NUTCH-664
>             Project: Nutch
>          Issue Type: Wish
>            Reporter: Sergey Khilkov
>            Priority: Minor
>
> We have a huge index of stored documents. Fetching a page and merging indexes every time
> we update some information about that page is an expensive procedure. The information can
> change 1-3 times per day. At the moment we have to store the changed info in a database,
> but then we have lots of problems with sorting, search restrictions, and so on. Lucene
> itself allows deleting a single document and adding a new one to an existing index. But
> there is a problem with Hadoop... As I understand it, the Hadoop filesystem cannot write
> at random positions. Still, it would be a great feature if Nutch were able to update an
> already created index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

