nutch-dev mailing list archives

From Vijith <vijithkv...@gmail.com>
Subject Re: Some questions regarding NUTCH-1150
Date Sat, 01 Sep 2012 06:05:07 GMT
Thanks a lot, Feng. I will try that.

On Sat, Sep 1, 2012 at 7:36 AM, feng lu <amuseme.lu@gmail.com> wrote:

> Hi Vijith,
>
> It only happens when fetcher.parse is true and
> fetcher.follow.outlinks.depth is greater than 0. When two URLs (A, B)
> redirect to the same URL (C), C is fetched twice. I think you could
> deduplicate URL C in the handleRedirect function in Fetcher.java.
>
> On Fri, Aug 31, 2012 at 8:39 PM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> No hassle Vijith
>>
>> Thank you
>>
>> Lewis
>>
>> On Fri, Aug 31, 2012 at 1:37 PM, Vijith <vijithkv.87@gmail.com> wrote:
>> > I apologize. I was sending to the mailing list without subscribing to it. I
>> > found the reply from Lewis (in the archive). I will comment directly on the
>> > issue. Thanks.
>> >
>> >
>> > On Fri, Aug 31, 2012 at 5:59 PM, Vijith <vijithkv.87@gmail.com> wrote:
>> >>
>> >> Hi all,
>> >>
>> >> (Please ignore my previous mail, if any)
>> >>
>> >> I am new to dev. I am working on NUTCH-1150:
>> >> https://issues.apache.org/jira/browse/NUTCH-1150
>> >> I would like to get some direction before I start. Right now I am
>> >> going through the Fetcher.java code.
>> >>
>> >> I have tried running Nutch on a sample site with two different URLs
>> >> redirecting to a common resource.
>> >> I could not find any clue in hadoop.log that the common resource is
>> >> parsed multiple times.
>> >> Could someone please explain the exact scenario that triggers this bug?
>> >>
>> >> And how does this bug relate to NUTCH-1184?
>> >>
>> >> --
>> >> Vijith V.
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > . . . . . thanks & regards
>> >
>> > Vijith V.
>> >
>> >
>>
>>
>>
>> --
>> Lewis
>>
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>



-- 
. . . . . thanks & regards

Vijith V.
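
Feng's suggestion in the quoted reply (deduplicating the shared redirect target C) could be sketched roughly as below. This is only an illustration under assumptions: RedirectDeduplicator and markIfNew are hypothetical names, not part of Nutch, and the sketch does not reproduce the actual handleRedirect code in Fetcher.java. It only shows the idea of remembering redirect targets that have already been queued, so that a second redirect to the same URL can be skipped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a Nutch class): remembers redirect targets
// that have already been accepted for fetching.
class RedirectDeduplicator {

    // Concurrent set, on the assumption that several fetcher threads
    // may report redirects at the same time.
    private final Set<String> seenTargets = ConcurrentHashMap.newKeySet();

    // Returns true only the first time a given redirect target is seen.
    // A caller would follow the redirect when this returns true and
    // skip it when this returns false.
    boolean markIfNew(String redirectTarget) {
        return seenTargets.add(redirectTarget);
    }
}
```

With A and B both redirecting to C, the first call to markIfNew for C would return true and the second would return false, so C would be followed only once. Note that an in-memory set like this can only catch duplicates seen within a single process; whether that is sufficient for NUTCH-1150 is not something this thread settles.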
