nutch-dev mailing list archives

From Alfonso Nishikawa <alfonso.nishik...@gmail.com>
Subject Re: Where happens the inject of Redirects and outlinks?
Date Thu, 20 Nov 2014 14:12:15 GMT
Hi,

I found it in updatedb. Exactly here:
svn.apache.org/viewvc/nutch/branches/2.x/src/java/org/apache/nutch/crawl/DbUpdateReducer.java?view=markup#l207

The problem was that I had db.update.additions.allowed=false, and that setting
also filters out the redirects :\ (the error appeared after upgrading from 2.0
to 2.3-SNAPSHOT).
In my opinion, redirects should not be treated the same as outlinks... :\
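
For anyone who hits the same thing, here is a minimal Java sketch of the behaviour as I understand it. It is a simplified model, not the actual DbUpdateReducer code: the class name UpdateDbSketch, the update method and the example URLs are all hypothetical, and the boolean parameter only stands in for db.update.additions.allowed. The assumption it illustrates is that redirect targets arrive in updatedb as new rows, just like ordinary outlinks, so they are dropped when additions are disabled.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the update step; NOT the real Nutch code.
final class UpdateDbSketch {

    /**
     * Returns the URLs that have a row in the store after the update.
     *
     * @param known            URLs that already have a row
     * @param discovered       URLs found while crawling; assumed to contain
     *                         redirect targets as well as ordinary outlinks
     * @param additionsAllowed stands in for db.update.additions.allowed
     */
    static Set<String> update(Set<String> known, List<String> discovered,
                              boolean additionsAllowed) {
        Set<String> result = new LinkedHashSet<>(known);
        for (String url : discovered) {
            boolean isNewRow = !known.contains(url);
            if (isNewRow && !additionsAllowed) {
                // New rows are skipped, whether they came from an outlink
                // or from a redirect.
                continue;
            }
            result.add(url);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> known =
                new LinkedHashSet<>(Arrays.asList("http://example.com/"));
        List<String> discovered = Arrays.asList(
                "http://example.com/outlink",
                "http://example.com/redirect-target");

        // Additions disabled: only the already known URL remains.
        System.out.println(update(known, discovered, false));
        // Additions enabled: the outlink and the redirect target are added.
        System.out.println(update(known, discovered, true));
    }
}
```

In this model the only way to keep following redirects is to allow additions, which matches what I saw; whether redirects deserve a separate switch is a different question.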

Anyway, solved :)

Thanks!

Alfonso

2014-11-19 16:34 GMT+01:00 Alfonso Nishikawa <alfonso.nishikawa@gmail.com>:

> Hi, Lewis,
>
> For Nutch 2.3-SNAPSHOT (the 2.x branch, if I am not mistaken).
>
> Many thanks! :)
>
> Alfonso
>
>
> > Hi Alfonso,
> >
> > On Tue, Nov 18, 2014 at 9:27 AM, <dev-digest-help@nutch.apache.org> wrote:
> >
> > >
> > > I am going mad searching through the plugins and everywhere else :( Surely
> > > someone here can quickly point me to a class or a folder (that would be
> > > enough).
> > >
> >
> > For which codebase?
> > Thanks
> > Lewis
>
>
> 2014-11-18 18:26 GMT+01:00 Alfonso Nishikawa <alfonso.nishikawa@gmail.com>:
>
>> Hi,
>>
>> After https://issues.apache.org/jira/browse/NUTCH-1448 ("Redirected urls
>> should be handled more cleanly (more like an outlink url)"), redirects
>> are treated as outlinks. Where do those outlinks get injected back into the
>> webpage (and specifically the redirects, although there is no difference)?
>>
>> I am going mad searching through the plugins and everywhere else :( Surely
>> someone here can quickly point me to a class or a folder (that would be
>> enough).
>>
>> Thanks!
>>
>> Alfonso
>>
>
>
