incubator-droids-dev mailing list archives

From Bertil Chapuis <cont...@bertil.ch>
Subject Re: Customizable Solr Handle
Date Thu, 24 Sep 2009 14:32:28 GMT
Of course, I agree, this should be improved. I did it that way because I wanted
something that uses a SAX approach instead of XPath. I think that makes
sense, since XPath requires building a DOM from the page.
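To make the SAX-versus-XPath trade-off concrete, here is a minimal sketch using only the JDK's JAXP APIs. It is not the DROIDS-62 code; the class name, method names and sample markup are illustrative. Both methods return the text of the first `<title>`: the SAX version reacts to parser events and keeps only that text, while the XPath version must first parse the whole page into a DOM. The XPath query also shows that XPath positions are 1-based (`[1]` is the first match).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ExtractionStyles {

    // SAX: stream through parser events and keep only the text of the
    // first <title> element; no tree is ever built in memory.
    static String firstTitleSax(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inTitle = false;
            private boolean done = false;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (!done && qName.equals("title")) {
                    inTitle = true;
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (inTitle && qName.equals("title")) {
                    inTitle = false;
                    done = true; // ignore any later <title> elements
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
        return text.toString();
    }

    // XPath: the whole document is loaded into a DOM first, then queried.
    // (//title)[1] selects the first <title>; there is no [0] in XPath.
    static String firstTitleXPath(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (String) XPathFactory.newInstance().newXPath()
                    .evaluate("(//title)[1]", dom, XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<page><title>First</title><title>Second</title></page>";
        System.out.println(firstTitleSax(xml));   // First
        System.out.println(firstTitleXPath(xml)); // First
    }
}
```

The SAX handler is more code for the same result, but its memory use does not grow with the size of the page, which is the argument for avoiding a DOM.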

In my case, I finally used another, somewhat different solution.
In the handler, I use an embedded pipeline (Cocoon 3) to apply a
sequence of XSLT transformations to the parse. The goal is to obtain an
XML document of the form:

 <doc>
  <field name="fieldname">value</field>
  ...	
 </doc>
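The same idea can be sketched without Cocoon by using the JDK's JAXP `Templates`/`Transformer` API in its place, so this is not the Cocoon 3 pipeline API. The class name, the sample page and both stylesheets are hypothetical: the first maps a page onto the `<doc>`/`<field>` form, the second collapses whitespace in the field values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltChain {

    private static final String XSL_OPEN =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    // Hypothetical step 1: map a crawled page onto the <doc>/<field> form.
    static final String TO_DOC = XSL_OPEN
            + "<xsl:template match='/'><doc>"
            + "<field name='title'><xsl:value-of select='//title'/></field>"
            + "<xsl:for-each select='//h1'>"
            + "<field name='heading'><xsl:value-of select='.'/></field>"
            + "</xsl:for-each>"
            + "</doc></xsl:template></xsl:stylesheet>";

    // Hypothetical step 2: identity copy that normalizes whitespace in field values.
    static final String NORMALIZE = XSL_OPEN
            + "<xsl:template match='@*|node()'>"
            + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match='field/text()'>"
            + "<xsl:value-of select='normalize-space(.)'/>"
            + "</xsl:template></xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final List<Templates> steps = new ArrayList<>();

    // Compile a stylesheet once and append it to the end of the chain.
    XsltChain add(String stylesheet) {
        try {
            steps.add(factory.newTemplates(new StreamSource(new StringReader(stylesheet))));
            return this;
        } catch (TransformerException e) {
            throw new IllegalArgumentException("invalid stylesheet", e);
        }
    }

    // Run the input through every stylesheet in order; each step's
    // serialized output is re-parsed as the next step's input.
    String run(String xml) {
        String current = xml;
        try {
            for (Templates step : steps) {
                Transformer t = step.newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString();
            }
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
        return current;
    }

    public static void main(String[] args) {
        String page = "<html><head><title> My   page </title></head>"
                + "<body><h1>Intro</h1></body></html>";
        System.out.println(new XsltChain().add(TO_DOC).add(NORMALIZE).run(page));
    }
}
```

Serializing to a string between steps keeps the sketch short; a SAX-based pipeline such as Cocoon's passes events from stage to stage instead (in plain JAXP, `SAXTransformerFactory` and `TransformerHandler` allow that).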

At the end of the pipeline, a custom Cocoon consumer sends the values to
Solr and commits the result.
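A minimal sketch of such a consumer, written as a plain SAX `DefaultHandler` rather than as a Cocoon component (the actual Cocoon consumer interface is not shown in this thread). `DocFieldConsumer` and `FieldSink` are hypothetical names; a Solr-backed sink would presumably turn each map into a SolrJ `SolrInputDocument`, add it, and call `commit()` once at the end.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DocFieldConsumer extends DefaultHandler {

    // Hypothetical target for the extracted fields (e.g. a Solr client wrapper).
    interface FieldSink {
        void add(Map<String, List<String>> fields);
        void commit();
    }

    private final FieldSink sink;
    private final StringBuilder value = new StringBuilder();
    private Map<String, List<String>> fields;
    private String currentField;

    DocFieldConsumer(FieldSink sink) {
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (qName.equals("doc")) {
            fields = new LinkedHashMap<>(); // keep fields in document order
        } else if (qName.equals("field")) {
            currentField = atts.getValue("name"); // null if the attribute is missing
            value.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentField != null) {
            value.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (qName.equals("field") && currentField != null && fields != null) {
            // A field name may repeat (multi-valued field), so keep a list.
            fields.computeIfAbsent(currentField, k -> new ArrayList<>()).add(value.toString());
            currentField = null;
        } else if (qName.equals("doc")) {
            sink.add(fields); // one <doc> element becomes one document
        }
    }

    @Override
    public void endDocument() {
        sink.commit(); // commit once, after every <doc> has been sent
    }

    static void consume(String xml, FieldSink sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DocFieldConsumer(sink));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse <doc> XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><field name='title'>My page</field>"
                + "<field name='heading'>Intro</field><field name='heading'>Usage</field></doc>";
        consume(xml, new FieldSink() {
            public void add(Map<String, List<String>> f) { System.out.println("add " + f); }
            public void commit() { System.out.println("commit"); }
        });
        // add {title=[My page], heading=[Intro, Usage]}
        // commit
    }
}
```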

I didn't send the code because the approach is a bit unconventional. However,
in my case it works well and I can use it to handle nearly everything.

What do you think about such a solution? I have a little time next week,
so I should be able to provide something more polished.

Best regards,

Bertil




On Thu, 2009-09-24 at 11:19 +0200, Thorsten Scherler wrote:
> On Wed, 2009-09-09 at 10:38 +0200, Bertil Chapuis wrote:
> > Hello,
> > 
> > My name is Bertil Chapuis. I am using Droids for a personal project and
> > I am trying to create a more customizable Solr handler.
> > 
> > I posted a ticket with my code (DROIDS-62). However, I am looking for a
> > way to filter the handler's execution. I'd like to handle the documents
> > only if their URI or content matches specific conditions.
> > 
> > For example, the document should be handled only if its URI matches the
> > following regex:
> > 
> > http://www.awebsite.com/document-[0-9]*.htm
> > 
> > What's the best way to do that? 
> 
> I had a chance to test this patch, but in the end I could not use it for
> my use case. The problem I have with it is that it limits access to the
> different elements in the tree too much. It is not generic, since instead
> of using XPath expressions (the standard approach to such a use case) it
> uses "standard" regular expressions.
> 
> Further, having a strong XML background myself, it struck me as odd to
> have element[0] where XPath would have element[1].
> 
> IMO, if you can add XPath support to this component, it would really rock
> for many use cases, since we would have a generic parser solution for
> extracting information. As it is now, it will serve only very few use
> cases.
> 
> salu2
> 
> > Is it delegated to the handler's
> > implementation or is there a standard way?
> > 
> > Best regards,
> > 
> > Bertil Chapuis
> > 
> > 
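One way to address the original question, independent of whatever Droids itself provides: wrap the real handler in a decorator that checks a condition on the URI and/or the content before delegating. `Handler`, `when` and `uriMatches` are hypothetical names, and the actual Droids handler interface may have a different signature. The pattern is the one from the question with the dots escaped, since an unescaped `.` matches any character.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.regex.Pattern;

public class UriFilteringHandler {

    // Hypothetical stand-in for a Droids handler; the real interface may differ.
    interface Handler {
        void handle(URI uri, String content);
    }

    // The regex from the question with '.' escaped. Note that [0-9]* also
    // accepts "document-.htm"; use [0-9]+ to require at least one digit.
    static final Pattern DOCUMENTS =
            Pattern.compile("http://www\\.awebsite\\.com/document-[0-9]*\\.htm");

    // Condition on the URI: the whole URI must match (matches(), not find()).
    static BiPredicate<URI, String> uriMatches(Pattern pattern) {
        return (uri, content) -> pattern.matcher(uri.toString()).matches();
    }

    // Decorator: forward to the wrapped handler only when the condition holds.
    static Handler when(BiPredicate<URI, String> condition, Handler delegate) {
        return (uri, content) -> {
            if (condition.test(uri, content)) {
                delegate.handle(uri, content);
            }
        };
    }

    public static void main(String[] args) {
        List<URI> handled = new ArrayList<>();
        Handler real = (uri, content) -> handled.add(uri); // stands in for the Solr handler
        BiPredicate<URI, String> wanted = uriMatches(DOCUMENTS)
                .and((uri, content) -> !content.isEmpty()); // example content condition
        Handler h = when(wanted, real);
        h.handle(URI.create("http://www.awebsite.com/document-42.htm"), "<html>...</html>");
        h.handle(URI.create("http://www.awebsite.com/index.htm"), "<html>...</html>");
        h.handle(URI.create("http://www.awebsite.com/document-7.htm"), "");
        System.out.println(handled); // [http://www.awebsite.com/document-42.htm]
    }
}
```

Keeping the condition outside the handler means the same Solr handler can be reused with different URI or content rules without changing its code.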

