nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1832) Make Nutch work without an indexer
Date Thu, 04 Sep 2014 15:05:51 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121441#comment-14121441 ]

Chris A. Mattmann commented on NUTCH-1832:
------------------------------------------

{blockquote}
You haven't been left out on any discussion since there has been no change in behavior : indexing
(be it with the old-indexing mechanism or delegating to SOLR) has always been the default
behaviour when using the all-in-one crawl command - which the crawl script replaces. See https://github.com/apache/nutch/blob/5943b9f1d6f17c0c95ca169ac67b7da379d2bef8/src/java/org/apache/nutch/crawl/Crawl.java,
that's Doug's code from 2005. The logic hasn't changed since and the crawl script just replaces
the crawl class but behaves in the same way. (This does not mean that indexing is strictly
necessary as nothing prevents users from using the nutch commands directly in any way they
wanted).
{blockquote}

Pointing at Doug's Crawl.java class isn't what I was stating. Here is what I was stating.

Old use case:

1. Download Nutch out of the box. Don't need Solr. I can simply "start Nutch" and it's fetching,
and if there was no Solr URL specified, it would go through a full crawl (this was when ./bin/nutch
crawl did something - which it doesn't anymore - it tells you to use the crawl script). That
*is* a change.

New use case:

1. Download Nutch out of the box. Can't run Nutch crawling (with the ./bin/crawl command,
the only option since ./bin/nutch crawl doesn't exist anymore). *That is a change*

{blockquote}
Please ask them to reply to the survey then, it will certainly make it more representative.
Or provide your own statistics and tell us what the majority of Nutch users do.
{blockquote}

You stated that the majority of Nutch users crawl *and* index. I simply stated that I don't
think Nutch only has 40 users. In fact, like I said, I know it doesn't :) I started by saying
I applaud you and the work you did on it.

{blockquote}
Now the only change to the existing behaviour is the one you introduced with this commit by
removing 'indexer-solr' from 'plugin.includes'. Can you please fix this? Thanks
{blockquote}

No, not until someone addresses my concern (which you haven't) about the change in behavior.
Thanks.

{blockquote}
PS: I find your tone has been quite aggressive recently (e.g. discussion on versioning). Any
particular reason?
{blockquote}

Not really - in particular you seem to be debating everything I suggest. So, please continue
to do so. I'm happy to debate back.


> Make Nutch work without an indexer
> ----------------------------------
>
>                 Key: NUTCH-1832
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1832
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.9
>            Reporter: Chris A. Mattmann
>            Assignee: Chris A. Mattmann
>             Fix For: 1.10
>
>         Attachments: NUTCH-1832.Mattmann.090314.patch.2.txt, NUTCH-1832.Mattmann.090314.patch.txt
>
>
> Nutch used to work out of the box, without requiring an indexing backend. As of 1.9,
> that's not the case anymore (it's possible even before that). Thanks to [~markus17] for pointing
> out that this is due to the indexer-solr plugin being enabled by default. We should disable
> it by default, so that the regression is removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
