nutch-dev mailing list archives

From Earl Cahill <cahi...@yahoo.com>
Subject Re: nutch/mapred tutorial
Date Wed, 07 Sep 2005 08:17:01 GMT
My last email was more about documenting the whole
setup process, but it looks like the error I mentioned
was fixed by creating a directory and putting a urls
file in that directory.  It also looks like the name
of the file doesn't matter.  So I made a myurls
directory, put a urls file in there, and then ran

bin/nutch crawl myurls -dir crawl.test -depth 3
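
The steps above can be sketched as a short shell session. This is
only a sketch under stated assumptions: http://example.com/ is a
placeholder seed URL that does not come from the original setup, the
file is assumed to hold one URL per line, and the crawl is only
attempted when run from the root of a branches/mapred checkout.

```shell
# Create the seed directory; the crawl command is given the
# directory, not the file inside it.
mkdir -p myurls

# Write one seed URL per line. The file name ("urls" here) appears
# not to matter. http://example.com/ is a placeholder, not a URL
# from the original message.
printf '%s\n' 'http://example.com/' > myurls/urls

# Run the crawl only if bin/nutch is present, i.e. we are in the
# root of a Nutch checkout.
if [ -x bin/nutch ]; then
    bin/nutch crawl myurls -dir crawl.test -depth 3
fi
```

Passing an empty or missing directory here is what seems to produce
the "No input files" error quoted below.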

But, yeah, I would like to put such steps in a
tutorial.

It looks like only the front page got hit, so there is
more to do.

Earl

--- Earl Cahill <cahille@yahoo.com> wrote:

> howdy,
> 
> I have been looking around for a nutch/mapred tutorial
> and haven't had much luck.  I found this one
> 
> http://lucene.apache.org/nutch/tutorial.html
> 
> which did help me get a crawl going on trunk, but no
> such luck in branches/mapred.  I set the urls file and
> the filter in the same way that I did for trunk and I
> get
> 
> 050907 013817 parsing file:/home/nutch/nutch/branches/mapred/conf/nutch-site.xml
> java.io.IOException: No input files in: [Ljava.io.File;@32b0bad7
>         at org.apache.nutch.mapred.InputFormatBase.listFiles(InputFormatBase.java:74)
>         at org.apache.nutch.mapred.InputFormatBase.getSplits(InputFormatBase.java:84)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:59)
> 
> Guess I am wondering if a detailed tutorial for mapred
> exists.  Seems like Doug was saying that it didn't.  I
> would be up for walking through getting a crawl going
> and documenting my steps, but won't dive in if one
> already exists.  Also wondering if I would/could put
> my doc on the wiki.
> 
> Thanks,
> Earl


