nutch-dev mailing list archives

From feng lu <amuseme...@gmail.com>
Subject Re: Some questions regarding NUTCH-1150
Date Sat, 01 Sep 2012 02:06:16 GMT
Hi Vijith,

It only happens when fetcher.parse is true and
fetcher.follow.outlinks.depth is greater than 0. When two URLs (A and B)
redirect to the same URL (C), C is fetched twice. I think you could
deduplicate C in the handleRedirect function in Fetcher.java.
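
To make the suggestion concrete, here is a minimal sketch of the deduplication idea in isolation. It is not Nutch code: the class RedirectDeduplicator and its markIfNew method are hypothetical names, and how such a check would be wired into handleRedirect is left open. It assumes redirect targets are compared as plain strings and that several fetcher threads may record targets concurrently within one JVM.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Nutch): remembers redirect targets
 * that have already been queued so the same target is not fetched twice.
 */
class RedirectDeduplicator {
    // Concurrent set, since several fetcher threads may follow redirects at once.
    private final Set<String> seenTargets = ConcurrentHashMap.newKeySet();

    /**
     * Records the redirect target and reports whether it is new.
     * Returns true the first time a target is seen, false on repeats.
     */
    boolean markIfNew(String redirectTarget) {
        return seenTargets.add(redirectTarget);
    }

    public static void main(String[] args) {
        RedirectDeduplicator dedup = new RedirectDeduplicator();
        // A redirects to C: C is new, so it would be fetched.
        System.out.println(dedup.markIfNew("http://example.com/C")); // true
        // B redirects to C as well: C was already seen, so it would be skipped.
        System.out.println(dedup.markIfNew("http://example.com/C")); // false
    }
}
```

A set held in memory like this only catches duplicates seen by the same process, so it would not help if A and B are handled by different tasks.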

On Fri, Aug 31, 2012 at 8:39 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> No hassle Vijith
>
> Thank you
>
> Lewis
>
> On Fri, Aug 31, 2012 at 1:37 PM, Vijith <vijithkv.87@gmail.com> wrote:
> > I apologize. I was sending to the mailing list without subscribing to it. I
> > found the reply from Lewis (from the archive). I will comment directly on the
> > issue. Thanks.
> >
> >
> > On Fri, Aug 31, 2012 at 5:59 PM, Vijith <vijithkv.87@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> (Please ignore my previous mail, if any)
> >>
> >> I am new to dev... I am working on
> >> NUTCH-1150...https://issues.apache.org/jira/browse/NUTCH-1150
> >> I would like to get some directions before I can start... Right now I am
> >> going through the Fetcher.java code...
> >>
> >> I have tried running Nutch against a sample site with two different URLs
> >> redirecting to a common resource.
> >> I could not find any clues in hadoop.log showing where the common resource
> >> is parsed multiple times.
> >> Could someone please explain the exact scenario that creates this bug?
> >>
> >> And how does this bug relate to NUTCH-1184?
> >>
> >> --
> >> Vijith V.
> >>
> >>
> >
> >
> >
> > --
> > . . . . . thanks & regards
> >
> > Vijith V.
> >
> >
>
>
>
> --
> Lewis
>



-- 
Don't Grow Old, Grow Up... :-)
