On Thu, 2009-09-24 at 16:32 +0200, Bertil Chapuis wrote:
> Of course, I agree, this should be improved. I did that because I wanted
> something which uses a SAX approach instead of XPath. I think it makes
> sense, since XPath requires building a DOM from the page.
>
Nupp, that is not true though.
> In my case, I finally used another solution, which is a bit different.
> In the handler, I use an embedded pipeline (Cocoon 3) to apply a
> sequence of XSLT transformations to the parse. The goal is to obtain an
> XML document of the form:
>
> <doc>
> <field name="fieldname">value</field>
> ...
> </doc>
jeje, lol, you are using Cocoon. Nice. That is something I can very much
relate to (I have been using it since 2001). ;)
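
For readers following along, here is a minimal sketch of producing the
<doc>/<field> shape quoted above from plain Java. It is not the Cocoon 3
pipeline described in the mail: the class FieldDocWriter and the method
toDocXml are hypothetical names, and the sketch assumes the field values
are already available as a map.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Droids, Cocoon or Solr.
final class FieldDocWriter {

    // Serializes name/value pairs as <doc><field name="...">value</field>...</doc>.
    static String toDocXml(Map<String, String> fields) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("doc");
            for (Map.Entry<String, String> field : fields.entrySet()) {
                xml.writeStartElement("field");
                xml.writeAttribute("name", field.getKey());
                // writeCharacters escapes '&' and '<' in the value.
                xml.writeCharacters(field.getValue());
                xml.writeEndElement();
            }
            xml.writeEndElement();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write document", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Droids & Solr");
        fields.put("url", "http://www.awebsite.com/document-1.htm");
        System.out.println(toDocXml(fields));
    }
}
```

Sending the result to Solr and committing is left out on purpose, since
that part depends on the consumer mentioned below.
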
>
> Finally, a custom Cocoon consumer sends the values to Solr and commits
> the result.
>
> I didn't send the code because the approach was a bit strange. However,
> in my case it works well and I can use it to handle nearly everything.
No, please, I would love to see this code, since I actually do the same
in my use case connecting to Solr. You may know
http://wiki.apache.org/solr/SolrForrest, which in the end works with
Cocoon 2.1 and 2.2.
>
> What do you think about such a solution? I have a little time next week
> so I should be able to provide something more decent.
This would be an awesome contribution. I would love to see it.
salu2
>
> Best regards,
>
> Bertil
>
>
>
>
> On Thu, 2009-09-24 at 11:19 +0200, Thorsten Scherler wrote:
> > On Wed, 2009-09-09 at 10:38 +0200, Bertil Chapuis wrote:
> > > Hello,
> > >
> > > My name is Bertil Chapuis. I am using droids for a personal project and
> > > I am trying to create a more customizable solr handler.
> > >
> > > I posted a ticket with my code (DROIDS-62). However, I am looking for a
> > > way to filter the handler's execution. I'd like to handle the documents
> > > only if their URI or content matches specific conditions.
> > >
> > > For example, the document is handled only if its URI matches the
> > > following regex:
> > >
> > > http://www.awebsite.com/document-[0-9]*.htm
> > >
> > > What's the best way to do that?
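
A minimal sketch of the URI check asked about above, kept independent of
any Droids interface. Whether it belongs in the handler or in front of it
is not settled in this thread. The class UriFilter and the method accepts
are hypothetical names. The expression is the one from the mail with the
dots escaped, because an unescaped "." matches any character.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Droids API.
final class UriFilter {

    // Escaped variant of the expression from the mail.
    // [0-9]* also accepts zero digits; use [0-9]+ to require at least one.
    private static final Pattern DOCUMENT_URI =
        Pattern.compile("http://www\\.awebsite\\.com/document-[0-9]*\\.htm");

    // True only if the whole URI matches the expression.
    static boolean accepts(String uri) {
        return uri != null && DOCUMENT_URI.matcher(uri).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("http://www.awebsite.com/document-42.htm"));
        System.out.println(accepts("http://www.awebsite.com/index.htm"));
    }
}
```

A handler could call accepts(uri) first and return early when it yields
false. A content-based condition would need a second check of the same
kind.
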
> >
> > I had a chance to test this patch, but in the end I could not use it for
> > my use case. The problem I have with it is that it limits access to the
> > different elements in the tree too much. It is not generic, since
> > instead of XPath expressions (the standard approach to solving such a
> > use case) it uses "standard regexp".
> >
> > Further, having a strong background in XML myself, it struck me as odd
> > to have element[0], which in XPath would have been element[1].
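
To illustrate the indexing point with the JDK's own XPath API: positions
in XPath start at 1, so element[1] selects the first element and
element[0] selects nothing. The sample XML and the class XPathIndexDemo
are made up for this sketch, and the DOM is built here only to keep the
example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, unrelated to the DROIDS-62 patch.
final class XPathIndexDemo {

    static final String XML =
        "<root><element>first</element><element>second</element></root>";

    // Parses the sample XML and returns the string value of the expression.
    static String evaluate(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad expression: " + expression, e);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Position 1 is the first element in XPath.
        System.out.println("[1] -> '" + evaluate("/root/element[1]") + "'");
        // Position 0 never matches, so the result is the empty string.
        System.out.println("[0] -> '" + evaluate("/root/element[0]") + "'");
    }
}
```
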
> >
> > IMO, if you can add XPath support to this component, then it really
> > rocks for many use cases, since we would have a generic parser solution
> > for extracting information. The way it is now, it will serve very few
> > use cases.
> >
> > salu2
> >
> > > Is it delegated to the handler's
> > > implementation or is there a standard way?
> > >
> > > Best regards,
> > >
> > > Bertil Chapuis
> > >
> > >
>
--
Thorsten Scherler <thorsten.at.apache.org>
Open Source Java <consulting, training and solutions>
Sociedad Andaluza para el Desarrollo de la Sociedad
de la Información, S.A.U. (SADESI)