incubator-droids-dev mailing list archives

From Tobias Rübner <...@tobr.eu>
Subject Re: accepting urls
Date Fri, 18 May 2012 11:15:34 GMT
Hi,

Take a look at org.apache.droids.robot.crawler.CrawlingWorker.
All new tasks retrieved by your parser are checked by the
getFilteredOutlinks method.
If the filters accept a URL, it is added as a new task.
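
To make the accept/reject idea concrete, here is a minimal, self-contained sketch of that kind of filtering step. It is not the Droids source: the UrlFilter interface and the filterOutlinks helper below are hypothetical stand-ins. The only thing borrowed from the quoted filter is the contract that returning the URL accepts it and returning null rejects it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilteredOutlinksSketch {

    // Hypothetical stand-in for a URL filter:
    // return the URL to accept it, return null to reject it.
    interface UrlFilter {
        String filter(String url);
    }

    // Runs every outlink through every filter in order.
    // A single null result drops the link; survivors would become new tasks.
    static List<String> filterOutlinks(List<String> outlinks, List<UrlFilter> filters) {
        List<String> accepted = new ArrayList<>();
        for (String outlink : outlinks) {
            String current = outlink;
            for (UrlFilter filter : filters) {
                current = filter.filter(current);
                if (current == null) {
                    break; // rejected, no need to ask the remaining filters
                }
            }
            if (current != null) {
                accepted.add(current);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Same prefix rule as the filter in the quoted message.
        UrlFilter artsOnly =
                url -> url.startsWith("http://www.dmoz.org/Arts/") ? url : null;

        List<String> outlinks = Arrays.asList(
                "http://www.dmoz.org/Arts/Music/",
                "http://www.dmoz.org/Business/");

        // Prints [http://www.dmoz.org/Arts/Music/]
        System.out.println(filterOutlinks(outlinks, Arrays.asList(artsOnly)));
    }
}
```

Note that in this sketch the filters only run when the parser actually hands back outlinks; with no outlinks, no filter is ever called.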

If you just want to accept URLs coming from a specified host, you can
also use the HostFilter.
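
For comparison, a host-based check can be sketched with nothing but java.net.URI. This is only an illustration of the idea, not the HostFilter class itself; the class and method names below are made up, and the null-means-reject convention is carried over from the quoted filter.

```java
import java.net.URI;
import java.net.URISyntaxException;

class HostRestrictionSketch {

    // Hypothetical helper: returns the URL if its host equals allowedHost,
    // otherwise null (the same accept/reject convention as the quoted filter).
    static String acceptIfHostMatches(String url, String allowedHost) {
        try {
            String host = new URI(url).getHost();
            return (host != null && host.equalsIgnoreCase(allowedHost)) ? url : null;
        } catch (URISyntaxException e) {
            return null; // unparsable URLs are rejected
        }
    }

    public static void main(String[] args) {
        // Accepted: the host matches.
        System.out.println(acceptIfHostMatches("http://www.dmoz.org/Arts/Music/", "www.dmoz.org"));
        // Also accepted: a host check does not look at the path.
        System.out.println(acceptIfHostMatches("http://www.dmoz.org/Business/", "www.dmoz.org"));
        // Rejected (prints null): different host.
        System.out.println(acceptIfHostMatches("http://example.com/Arts/", "www.dmoz.org"));
    }
}
```

A host check is broader than the prefix check in the quoted code: it keeps the crawl on www.dmoz.org but does not limit it to /Arts/, so the prefix filter is still needed for that.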

Tobias


On 05/16/2012 09:47 PM, Mansour Al Akeel wrote:
> Assuming I need to accept URLs under http://www.dmoz.org/Arts/ and not go
> anywhere else, in DroidsFactory I would do this:
>
> public static URLFiltersFactory createDefaultURLFiltersFactory() {
>          URLFiltersFactory filtersFactory = new URLFiltersFactory();
>          URLFilter defaultURLFilter = new URLFilter() {
>              private final String prefix = "http://www.dmoz.org/Arts/";
>              public String filter(String urlString) {
>                  if (urlString.startsWith(prefix))
>                      return urlString;
>                  return null;
>              }
>          };
>
>          filtersFactory.getMap().put("default", defaultURLFilter);
>          return filtersFactory;
>      }
>
>
> Then I would add it to the droid in the unit test:
>
> private final CrawlingDroid createDroid(final Queue<Link>  queue) {
>          final CrawlingDroid droid = new SysoutCrawlingDroid(queue, null);
>
>          final ProtocolFactory protocolFactory = DroidsFactory
>                  .createDefaultProtocolFactory();
>          droid.setProtocolFactory(protocolFactory);
>
>          URLFiltersFactory filtersFactory = DroidsFactory
>                  .createDefaultURLFiltersFactory();
>          droid.setFiltersFactory(filtersFactory);
>
>          final ParserFactory parserFactory = parserSetup();
>          droid.setParserFactory(parserFactory);
>          return droid;
>      }
>
> @Test
>      public void execute_linkIsParsed() throws DroidsException, IOException,
>              URISyntaxException {
>
>          final Link link = new LinkTask(null, new URI(searchUrl), 1);
>
>          this.instance.execute(link);
>
>          Mockito.verify(htmlParser).parse(Matchers.any(ContentEntity.class),
>                  Matchers.any(Link.class));
>      }
>
>
> However, stepping through the code doesn't show the filter being invoked. Is
> there anything else I need to do to make sure it is invoked
> properly?
>
