nutch-dev mailing list archives

From "artodeto (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2507) NutchTutorial wiki pages as a lot of outdated command line calls when it starts with the solr interaction
Date Wed, 31 Jan 2018 11:15:00 GMT
artodeto created NUTCH-2507:
-------------------------------

             Summary: NutchTutorial wiki pages as a lot of outdated command line calls when it starts with the solr interaction
                 Key: NUTCH-2507
                 URL: https://issues.apache.org/jira/browse/NUTCH-2507
             Project: Nutch
          Issue Type: Bug
          Components: documentation
    Affects Versions: 1.14
            Reporter: artodeto


h2. Section "Step-by-Step: Indexing into Apache Solr"

replace:
{code:java}
Example: bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/
-filter -normalize -deleteGone{code}
with:
{code:java}
Example: bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ -linkdb ${NUTCH_RUNTIME_HOME}/crawl/linkdb/ ${NUTCH_RUNTIME_HOME}/crawl/segments/20131108063838/ -filter -normalize -deleteGone{code}
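
The segment name in the example above (20131108063838) is a timestamp, so it differs for every crawl. A minimal sketch for picking the newest segment instead of typing it by hand, assuming segments live under a segments/ directory inside the crawl directory and are named by timestamp, so the last name in sort order is the newest. The helper name latest_segment is hypothetical and not part of Nutch:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the newest segment directory.
# Assumption: segment names are timestamps, so the last one in sort order is the newest.
latest_segment() {
    crawl_dir=$1
    for seg in "$crawl_dir"/segments/*/; do
        # Skip the unexpanded glob pattern when no segment exists.
        if [ -d "$seg" ]; then
            # Strip the trailing slash left by the glob.
            printf '%s\n' "${seg%/}"
        fi
    done | sort | tail -n 1
}

# Possible usage with the index call proposed above (not executed here):
#   bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch \
#       "${NUTCH_RUNTIME_HOME}/crawl/crawldb/" -linkdb "${NUTCH_RUNTIME_HOME}/crawl/linkdb/" \
#       "$(latest_segment "${NUTCH_RUNTIME_HOME}/crawl")" -filter -normalize -deleteGone
```

The function prints nothing when the segments directory is empty or missing, so a caller may want to check for an empty result before invoking bin/nutch.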
 
h2. Section "Step-by-Step: Deleting Duplicates"

replace:
{code:java}
     Usage: bin/nutch dedup <solr url>
     Example: /bin/nutch dedup http://localhost:8983/solr
{code}
with:
{code:java}
     Usage: bin/nutch dedup <path to the crawldb> <solr url>
     Example: /bin/nutch dedup ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ http://localhost:8983/sol
{code}

h2. Section "Step-by-Step: Cleaning Solr"

replace:
{code:java}
     Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
     Example: /bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch crawl/crawldb/
{code}
with:
{code}
     Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
     Example: /bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch ${NUTCH_RUNTIME_HOME}/crawl/crawldb/
{code}
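
The replacement examples above depend on NUTCH_RUNTIME_HOME being set; if it is empty, the crawldb argument silently becomes /crawl/crawldb/. A minimal sketch of a guard, assuming the crawl/crawldb/ layout used in the examples. The helper name crawldb_path is hypothetical and not part of Nutch:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print the crawldb path used in the
# examples above, or fail when NUTCH_RUNTIME_HOME is unset or empty.
crawldb_path() {
    if [ -z "${NUTCH_RUNTIME_HOME:-}" ]; then
        echo "NUTCH_RUNTIME_HOME is not set" >&2
        return 1
    fi
    # Drop one trailing slash, if any, to avoid a doubled separator.
    printf '%s\n' "${NUTCH_RUNTIME_HOME%/}/crawl/crawldb/"
}

# Possible usage with the clean call proposed above (not executed here):
#   bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch "$(crawldb_path)"
```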



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
