nutch-dev mailing list archives

From "Moreno Feltscher (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (NUTCH-2495) Use -deleteGone instead of clean job in crawler script while indexing
Date Tue, 23 Jan 2018 17:56:00 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Moreno Feltscher reassigned NUTCH-2495:
---------------------------------------

    Assignee: Lewis John McGibbney  (was: Moreno Feltscher)

> Use -deleteGone instead of clean job in crawler script while indexing
> ---------------------------------------------------------------------
>
>                 Key: NUTCH-2495
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2495
>             Project: Nutch
>          Issue Type: Improvement
>            Reporter: Moreno Feltscher
>            Assignee: Lewis John McGibbney
>            Priority: Major
>
> Instead of running {{bin/nutch clean}} after indexing the documents, run {{bin/nutch index}}
> with the {{-deleteGone}} flag. Besides deleting gone and duplicate documents (as the clean
> job does), this flag also deletes redirects from the index.
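The difference the issue describes is about coverage: which CrawlDb entries end up removed from the index. The toy model below is a hypothetical illustration, not Nutch code. The status strings mirror Nutch CrawlDatum status names, and the two sets follow the issue text only: the clean job removes gone and duplicate entries, while indexing with `-deleteGone` additionally removes redirects. Treating both temporary and permanent redirects as covered is an assumption, and the helper `deleted_urls` is invented for the example.

```python
# Hypothetical toy model of NUTCH-2495 (not Nutch source code).
# The status names mirror Nutch CrawlDatum statuses, but the sets are
# taken from the issue description, not from the Nutch implementation.

# Entries the clean job removes, per the issue: gone and duplicate documents.
CLEAN_JOB_DELETES = {"db_gone", "db_duplicate"}

# Entries removed when indexing with -deleteGone: the above plus redirects.
# ASSUMPTION: both temporary and permanent redirects are covered.
DELETE_GONE_DELETES = CLEAN_JOB_DELETES | {"db_redir_temp", "db_redir_perm"}


def deleted_urls(crawldb: dict[str, str], deletable: set[str]) -> list[str]:
    """Return the URLs whose status is in `deletable`, sorted for stable output."""
    return sorted(url for url, status in crawldb.items() if status in deletable)


if __name__ == "__main__":
    # Hypothetical CrawlDb snapshot: URL -> status.
    crawldb = {
        "https://example.org/a": "db_fetched",
        "https://example.org/old": "db_gone",
        "https://example.org/copy": "db_duplicate",
        "https://example.org/moved": "db_redir_perm",
    }
    # Only the redirect differs: it stays in the index under the clean job.
    print("clean:      ", deleted_urls(crawldb, CLEAN_JOB_DELETES))
    print("-deleteGone:", deleted_urls(crawldb, DELETE_GONE_DELETES))
```

In this model, `https://example.org/moved` is removed only in the `-deleteGone` case. That is the gap the issue proposes to close by replacing the separate clean step in the crawler script.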



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
