incubator-droids-dev mailing list archives

From Thorsten Scherler <thorsten.scherler....@juntadeandalucia.es>
Subject Re: Customizable Solr Handle
Date Wed, 09 Sep 2009 10:10:32 GMT
On Wed, 2009-09-09 at 10:38 +0200, Bertil Chapuis wrote:
> Hello,
> 
> My name is Bertil Chapuis. I am using droids for a personal project and
> I am trying to create a more customizable solr handler. 

Hi Bertil, nice to have you on this list.

> 
> I posted a ticket with my code (DROIDS-62). However, I am looking for a
> way to filter the handler's execution. I'd like to handle the documents
> only if their URI or content matches specific conditions.

I will have a look at your patch. Thanks for your contribution.

> 
> For example, the document is handled only if its uri matches the
> following regex:
> 
> http://www.awebsite.com/document-[0-9]*.htm
> 
> What's the best way to do that? Is it delegated to the handler's
> implementation or is there a standard way?

Mingfai has, in theory, included this filter approach in our next
version. Right now, however, we have no standard approach other than
implementing the validation logic yourself, e.g. in the queue. The
question is whether you want to crawl only the pages that match your
regex, or whether the restriction applies only to the handler.

If it is only for the handler, then it is probably best to implement it
in your worker. Something like:
...
public void execute(Link link) throws DroidsException, IOException {

...
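URI uri = link.getURI();
// PATTERN is your regex; compile it once as a constant if you can.
Pattern pattern = Pattern.compile(PATTERN);
// Pattern.matcher() takes a CharSequence, so pass the URI as a string.
Matcher matcher = pattern.matcher(uri.toString());
// find() matches anywhere in the URI; use matches() for a full match.
if (matcher.find()) {
  droid.getHandlerFactory().handle(uri, entity);
}
...
}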
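
To try the matching step on its own, outside of Droids, here is a
minimal sketch. It assumes the pattern from your example; the class
name UriFilter and the method accepts() are made up for illustration
and are not part of the Droids API. Note that the dots are escaped and
that matches() is used so the whole URI has to fit the pattern.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Droids: decides whether a URI
// should be passed on to the handler.
final class UriFilter {

    // Assumed pattern, taken from the example in this thread.
    // Dots are escaped so they match a literal '.' only.
    private static final Pattern DOCUMENT_URI =
        Pattern.compile("http://www\\.awebsite\\.com/document-[0-9]*\\.htm");

    // Returns true only if the complete URI string matches the pattern.
    static boolean accepts(URI uri) {
        return DOCUMENT_URI.matcher(uri.toString()).matches();
    }

    public static void main(String[] args) {
        System.out.println(
            accepts(URI.create("http://www.awebsite.com/document-42.htm")));
        System.out.println(
            accepts(URI.create("http://www.awebsite.com/index.htm")));
    }
}
```

In the worker you would then wrap the handle() call in
if (UriFilter.accepts(uri)) { ... }. Compiling the pattern once as a
constant avoids recompiling it for every link.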


HTH

salu2

> 
> Best regards,
> 
> Bertil Chapuis
> 
> 
-- 
Thorsten Scherler <thorsten.at.apache.org>
Open Source Java <consulting, training and solutions>

Sociedad Andaluza para el Desarrollo de la Sociedad 
de la Información, S.A.U. (SADESI)




