nutch-dev mailing list archives

From Julien Nioche <lists.digitalpeb...@gmail.com>
Subject Re: Renovating "Nutch Hadoop Tutorial" wiki page
Date Wed, 22 Jan 2014 11:53:32 GMT
Thanks Tejas!


On 22 January 2014 11:51, Tejas Patil <tejas.patil.cs@gmail.com> wrote:

> Moved the old nutchhadooptutorial page from Nutch wiki "Front page" to
> "Archive and Legacy".
>
> ~tejas
>
>
> On Wed, Jan 22, 2014 at 5:09 PM, Tejas Patil <tejas.patil.cs@gmail.com> wrote:
>
>> Thanks *Julien* for pointing me to the new "NutchHadoopSingleNodeTutorial"
>> wiki page [0]. I will soon remove the old nutchhadooptutorial page from
>> the wiki.
>>
>> [0] : http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial
>>
>> *@d_k*, there are already tutorials for running Nutch 2.x. See [1] and
>> [2]. Those are not as extensive as the tutorial for 1.x [3] but cover the
>> steps that are different for 2.x. The remaining steps after datastore setup
>> are similar. The only difference is in the command parameters, which can be
>> figured out from the usage output, so they were not duplicated in the 2.x
>> tutorials to avoid maintenance overhead. Do you think that the 2.x
>> tutorials are inadequate in some regards?
>>
>> [1] : http://wiki.apache.org/nutch/Nutch2Tutorial
>> [2] : http://wiki.apache.org/nutch/Nutch2Cassandra
>> [3] : http://wiki.apache.org/nutch/NutchTutorial
>>
>> Thanks,
>> Tejas
>>
>>
>> On Wed, Jan 22, 2014 at 2:47 AM, d_k <mail4dk@gmail.com> wrote:
>>
>>> Actually what I would like to see is a Nutch 2.x tutorial at the same
>>> level of detail as the http://wiki.apache.org/nutch/NutchHadoopTutorial
>>> What is the process of contributing to that wiki page?
>>>
>>>
>>> On Tue, Jan 21, 2014 at 9:33 PM, Julien Nioche <
>>> lists.digitalpebble@gmail.com> wrote:
>>>
>>>> Hi
>>>>
>>>> The whole thing has been replaced with
>>>> http://wiki.apache.org/nutch/NutchHadoopSingleNodeTutorial which
>>>> does exactly what you described. +1 to remove the old
>>>> nutchhadooptutorial page
>>>>
>>>> J.
>>>>
>>>>
>>>> On 21 January 2014 17:44, Tejas Patil <tejas.patil.cs@gmail.com> wrote:
>>>>
>>>>> Hi nutch-dev,
>>>>>
>>>>> I was looking at [0] and realized that with the massive number of
>>>>> Hadoop setup tutorials out there on the internet, we need not repeat the
>>>>> same on the Nutch wiki page and can instead assume that the user has
>>>>> already done the Hadoop setup. For convenience, we could direct users to
>>>>> the Hadoop wiki page, which has Hadoop setup details.
>>>>> Plus, I propose the following:
>>>>>
>>>>> - Section "Downloading Hadoop and Nutch" : Remove the Hadoop portions
>>>>> and let the Nutch stuff stay.
>>>>> - Section "Setting Up The Deployment Architecture" must be removed.
>>>>> - Section "Deploy Nutch to Single Machine" and "Deploy Nutch to
>>>>> Multiple Machines" can be merged together.
>>>>> - Sections "Performing a Nutch Crawl", "Testing the Crawl" and
>>>>> "Performing a Search" must be merged, and their contents must be updated.
>>>>> - Section "Rsyncing Code to Slaves" and "Updates" can be completely
>>>>> removed.
>>>>>
>>>>> Any comments?
>>>>>
>>>>> [0] : http://wiki.apache.org/nutch/NutchHadoopTutorial
>>>>>
>>>>> Thanks,
>>>>> Tejas
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Open Source Solutions for Text Engineering
>>>>
>>>> http://digitalpebble.blogspot.com/
>>>> http://www.digitalpebble.com
>>>> http://twitter.com/digitalpebble
>>>>
>>>
>>>
>>
>


-- 

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
