nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1832) Make Nutch work without an indexer
Date Thu, 04 Sep 2014 15:38:51 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121467#comment-14121467 ]

Julien Nioche commented on NUTCH-1832:
--------------------------------------

bq. Pointing at Doug's Crawl.java class isn't what I was stating. Here is what I was stating.

I was pointing at the Crawl class to show that indexing has always been part of the behaviour,
in response to your statement that Nutch did not require indexing before.

bq. Old use case: 1. Download Nutch out of the Box. Don't need Solr. I can simply "start Nutch"
and it's fetching, and if there was no Solr URL specified, it would go through a full crawl
(this was when ./bin/nutch crawl did something - which it doesn't anymore - it tells you to
use the crawl script). That is a change. New use case: 1. Download Nutch out of the Box. Can't
run Nutch crawling (with the ./bin/crawl command, the only option since ./bin/nutch crawl
doesn't exist anymore). That is a change.

The change was not to make indexing mandatory (it already was) but to remove the old indexing
mechanism and delegate to Solr. That was more than 3 years ago in Nutch 1.3; see in particular
NUTCH-837 and [https://issues.apache.org/jira/browse/NUTCH-837?focusedCommentId=12884731&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12884731]
where you were part of the discussion and agreed that this was the right thing to do.

Again, the crawl script does *exactly* the same thing as the Crawl command.

bq. BTW, are you saying that if I enable indexer-solr, that as long as the system property
for solrUrl isn't set, that it would not require Solr?

Yes. Activating a plugin does not create any dependency if that plugin is not used, which
would be the case if the indexing command is not called. Hence my point that the plugin should
be reinstated, as it does no harm (and is how it has worked so far).





> Make Nutch work without an indexer
> ----------------------------------
>
>                 Key: NUTCH-1832
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1832
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.10
>
>         Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt, NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As of 1.9,
> that's not the case anymore (and possibly even before that). Thanks to [~markus17] for pointing
> out that this is due to the indexer-solr plugin being enabled by default. We should disable
> it by default, so that the regression is removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
