nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1987) Make bin/crawl indexer agnostic
Date Wed, 15 Apr 2015 18:14:59 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496621#comment-14496621 ]

ASF GitHub Bot commented on NUTCH-1987:
---------------------------------------

GitHub user MJJoyce opened a pull request:

    https://github.com/apache/nutch/pull/18

    NUTCH-1987 - Make bin/crawl indexer agnostic

    - Add solr.server.url property to nutch-default and set to value
      consistent with URL used in the Nutch Tutorial.
    - Change SOLRURL references to INDEXFLAG for consistency.
    - Update all occurrences of crawl "usage" strings to no longer reference
      solrURL and instead mention an optional string "run_indexer".
    - Update indexer section to no longer set Solr URL property and remove
      Solr references from prints.
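The argument handling described in the list above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code from the pull request. Only the INDEXFLAG name comes from the PR description. The function names parse_crawl_args and index_step are made up for the example, as is the assumption that indexing is enabled whenever the optional third argument is present. The point is that the argument only switches indexing on or off, while the indexer's target (for example solr.server.url) is left to the configuration files.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real bin/crawl from Nutch or from PR #18.
# Assumption: the optional third argument ("run_indexer") only toggles
# indexing; any non-empty value turns it on.

# Parse "crawl <seedDir> <crawlDir> [run_indexer] <numberOfRounds>".
# Sets SEEDDIR, CRAWL_PATH, INDEXFLAG and LIMIT; returns 1 on bad usage.
parse_crawl_args() {
  INDEXFLAG=""
  if [ "$#" -eq 3 ]; then
    SEEDDIR="$1"; CRAWL_PATH="$2"; LIMIT="$3"
  elif [ "$#" -eq 4 ]; then
    SEEDDIR="$1"; CRAWL_PATH="$2"; INDEXFLAG="$3"; LIMIT="$4"
  else
    echo "Usage: crawl <seedDir> <crawlDir> [run_indexer] <numberOfRounds>" >&2
    return 1
  fi
}

# Decide whether the indexing step runs. No indexer URL is passed here:
# the active indexing plugin is assumed to read its own settings
# (e.g. solr.server.url) from the conf files.
index_step() {
  if [ -n "$INDEXFLAG" ]; then
    echo "Indexing $CRAWL_PATH/segments"
  else
    echo "Skipping indexing"
  fi
}
```

For example, `parse_crawl_args urls crawl run_indexer 1 && index_step` reports that indexing runs, while `parse_crawl_args urls crawl 1 && index_step` skips it. Neither call needs a real or fake Solr URL.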

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MJJoyce/nutch NUTCH-1987

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/18.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18
    
----
commit a39de23453a6f8ea2a9ab2a94872af3305f16021
Author: Michael Joyce <mltjoyce@gmail.com>
Date:   2015-04-15T17:41:36Z

    NUTCH-1987 - Make bin/crawl indexer agnostic
    
    - Add solr.server.url property to nutch-default and set to value
      consistent with URL used in the Nutch Tutorial.
    - Change SOLRURL references to INDEXFLAG for consistency.
    - Update all occurrences of crawl "usage" strings to no longer reference
      solrURL and instead mention an optional string "run_indexer".
    - Update indexer section to no longer set Solr URL property and remove
      Solr references from prints.

----


> Make bin/crawl indexer agnostic
> -------------------------------
>
>                 Key: NUTCH-1987
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1987
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.9
>            Reporter: Michael Joyce
>             Fix For: 1.10
>
>
> The crawl script makes it a bit challenging to use an indexer that isn't Solr. For instance,
> when I want to use the indexer-elastic plugin, I still need to call the crawl script with
> a fake Solr URL; otherwise it skips the indexing step altogether.
> {code}
> bin/crawl urls/ crawl/ "http://fakeurl.com:9200" 1
> {code}
> It would be nice to keep the configuration for the Solr indexer in the conf files (mirroring
> the Elasticsearch indexer configuration and others) and to make the indexing parameter simply
> toggle whether indexing occurs, instead of also trying to configure the indexer at the
> same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
