nutch-dev mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: What is Inlinks
Date Thu, 30 Apr 2009 13:51:33 GMT
Inlinks are the inbound links to a given page.  The anchor text is the 
text used to create the inbound link.  For example, say we have two pages, 
A and B:

A -> <a href="http://inbound/link">Anchor Text</a> -> B

Here we have a link from A to B using "Anchor Text" as the inbound link 
(anchor) text and "http://inbound/link" as the inbound link.  Inlinks is 
an aggregation of all inbound links to a given page.  So if pages A, C, D, 
and E all point to B, Inlinks would hold all the links from A, C, D, 
and E to B.
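One way to picture that aggregation is to take every page's outbound links and regroup them by target, so each target ends up with a single list of (linking page, anchor text) pairs. The Java below is a minimal sketch under that assumption: the `Outlink`/`Inlink` records, the `invert` method and the example URLs are hypothetical illustrations, not Nutch's actual `Inlink`/`Inlinks` classes or the job that builds them (records need Java 16+).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of inbound-link aggregation; not Nutch's real classes. */
public class InlinkSketch {

    /** An outbound link as seen on the source page: where it points and its anchor text. */
    record Outlink(String toUrl, String anchor) {}

    /** An inbound link as seen from the target page: who links here and with what anchor text. */
    record Inlink(String fromUrl, String anchor) {}

    /** Regroups page -> outlinks into target -> inlinks, keeping each link's anchor text. */
    static Map<String, List<Inlink>> invert(Map<String, List<Outlink>> outlinksByPage) {
        Map<String, List<Inlink>> inlinksByTarget = new LinkedHashMap<>();
        for (Map.Entry<String, List<Outlink>> page : outlinksByPage.entrySet()) {
            for (Outlink out : page.getValue()) {
                // Append this link to the target's list, creating the list on first use.
                inlinksByTarget
                    .computeIfAbsent(out.toUrl(), k -> new ArrayList<>())
                    .add(new Inlink(page.getKey(), out.anchor()));
            }
        }
        return inlinksByTarget;
    }

    public static void main(String[] args) {
        String b = "http://inbound/link"; // page B, as in the example above
        Map<String, List<Outlink>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example/", List.of(new Outlink(b, "Anchor Text")));
        outlinks.put("http://c.example/", List.of(new Outlink(b, "page B")));
        outlinks.put("http://d.example/", List.of(new Outlink(b, "B")));
        outlinks.put("http://e.example/", List.of(new Outlink(b, "more about B")));

        // All four inbound links to B, with their anchor text, land in one list.
        for (Inlink in : invert(outlinks).get(b)) {
            System.out.println(in.fromUrl() + " -> \"" + in.anchor() + "\"");
        }
    }
}
```

A `LinkedHashMap` is used only so the printed order matches insertion order; any `Map` would give the same grouping.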

Inlinks are parsed out of the HTML during the fetching/parsing process.  
They are then pulled into other jobs such as the WebGraph tools and 
the indexing process.
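On the indexing side, a filter such as AnchorIndexingFilter reads the anchor text from a page's Inlinks and indexes it with that page. The sketch below shows only that idea, using hypothetical stand-in types: the `Inlink` record, the `anchorField` method and its `dedupe` flag are illustrative assumptions, not Nutch's `IndexingFilter` interface or the plugin's actual configuration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for an anchor-indexing step; not Nutch's IndexingFilter API. */
public class AnchorFieldSketch {

    /** An inbound link: the linking page and the anchor text it used. */
    record Inlink(String fromUrl, String anchor) {}

    /**
     * Collects the anchor text of every inbound link as values of one "anchor" field.
     * Empty anchors are skipped; when dedupe is true, a repeated anchor string is
     * kept only the first time it appears.
     */
    static List<String> anchorField(List<Inlink> inlinks, boolean dedupe) {
        List<String> values = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (Inlink link : inlinks) {
            String anchor = link.anchor();
            if (anchor == null || anchor.isBlank()) {
                continue; // nothing useful to index
            }
            if (dedupe && !seen.add(anchor)) {
                continue; // already collected this exact anchor text
            }
            values.add(anchor);
        }
        return values;
    }

    public static void main(String[] args) {
        List<Inlink> inlinks = List.of(
            new Inlink("http://a.example/", "Anchor Text"),
            new Inlink("http://c.example/", "Anchor Text"),
            new Inlink("http://d.example/", "page B"),
            new Inlink("http://e.example/", ""));
        System.out.println(anchorField(inlinks, false)); // [Anchor Text, Anchor Text, page B]
        System.out.println(anchorField(inlinks, true));  // [Anchor Text, page B]
    }
}
```

Keeping duplicates lets frequently used anchor text weigh more in scoring; dropping them keeps the field compact. Which behaviour is preferable depends on the index, so it is exposed here as a flag.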

Dennis

caezar wrote:
> That I understand. But what are these anchors? How is this (Inlinks) object
> filled by the system? I suppose it should be some kind of inbound links to
> the page being indexed, found in the current database, am I right?
> 
> Marko Bauhardt-3 wrote:
>> The inlinks parameter has a method to get the anchors, and the
>> AnchorIndexingFilter indexes this anchor text.
>>
> 
