nutch-dev mailing list archives

From MyD <myd.ro...@googlemail.com>
Subject Injecting URLs and define Inlink?
Date Fri, 08 Jan 2010 03:12:37 GMT
Dear Nutch developers:

Is there a way to inject URLs and define the inlink for those URLs? How
and where can I find the inlink of a given URL?

Example:

Suppose we inject the URL www.example.com/john_doe and start the crawl. At
some point the crawler fetches, say, www.example.com/john_doe4:

*=> www.example.com/john_doe*
==> www.example.com/john_doe1
====> www.example.com/john_doe4
==> www.example.com/john_doe2
====> www.example.com/john_doe5
==> www.example.com/john_doe3
====> www.example.com/john_doe6

Is there any way to find the base (inlink) URL www.example.com/john_doe,
starting from a crawled URL such as www.example.com/john_doe4?
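
To make the question concrete, here is a minimal Python sketch of the lookup
I have in mind. It does not use any Nutch API: the INLINKS table and the
find_seed function are hypothetical names, and the table is filled in by hand
from the example above, assuming each crawled URL has exactly one recorded
inlink.

```python
# Hypothetical inlink table built by hand from the example tree.
# Each crawled URL maps to the URL that linked to it (its inlink).
# This is illustrative data, not output produced by Nutch.
INLINKS = {
    "www.example.com/john_doe1": "www.example.com/john_doe",
    "www.example.com/john_doe2": "www.example.com/john_doe",
    "www.example.com/john_doe3": "www.example.com/john_doe",
    "www.example.com/john_doe4": "www.example.com/john_doe1",
    "www.example.com/john_doe5": "www.example.com/john_doe2",
    "www.example.com/john_doe6": "www.example.com/john_doe3",
}


def find_seed(url, inlinks):
    """Follow inlinks upward until a URL with no recorded inlink is reached.

    That URL is treated as the injected (base) URL. The `seen` set stops
    the walk if the links happen to form a cycle.
    """
    seen = set()
    while url in inlinks and url not in seen:
        seen.add(url)
        url = inlinks[url]
    return url


if __name__ == "__main__":
    print(find_seed("www.example.com/john_doe4", INLINKS))
```

In a real crawl a page can have several inlinks, so the table would need to
map each URL to a list of inlinks, and the walk could end at more than one
injected URL.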

Thanks in advance.

Cheers,
MyD
