lucene-solr-user mailing list archives

From Tor Henning Ueland <tor.henn...@gmail.com>
Subject Re: Tips on recursive xml-parsing in dataConfig
Date Sat, 19 Jun 2010 20:20:05 GMT
The situation changed and I am not using those XML files at all. I ended
up using some other data files as sources, which had everything flat, so
no recursion was needed after all. But thanks for the input! :)

Best regards

On Tue, Jun 8, 2010 at 11:08 AM, Geert-Jan Brits <gbrits@gmail.com> wrote:
> my bad, it looks like XPathEntityProcessor doesn't support relative xpaths.
>
> However, I quickly looked at the Slashdot example (which is pretty good
> actually) at http://wiki.apache.org/solr/DataImportHandler.
> From that I infer that you use only 1 entity per xml-doc, and within that
> entity use multiple field declarations with xpath attributes to extract
> the values you want.
> So even though your xml-document is nested (like most XML is), your
> field declarations are not.
>
> I think your best bet is to read the slashdot example and go from there.
>
> For now, I'm not entirely sure what you want a solr-document to be in your
> example. i.e:
> - 1 solr-document per 1 xml-document (as supplied)
> - or 1 solr-doc per CHAP, per PARA or per SUB?
>
> Once you know that, perhaps coming up with a decent pointer is easier.
>
> HTH,
> Geert-Jan
>
>
> 2010/6/8 Tor Henning Ueland <tor.henning@gmail.com>
>
>> I have tried both changing the datasource per child node to use the
>> parent node's name and making the XPaths relative. Both attempts cause
>> either exceptions saying that the XPath must start with /, or
>> NullPointerExceptions (nsfgrantsdir document : null).
>>
>> Best regards
>>
>> On Mon, Jun 7, 2010 at 4:12 PM, Geert-Jan Brits <gbrits@gmail.com> wrote:
>> > I'm guessing (I'm not familiar with the xml dataimport handler, but I
>> > am pretty familiar with XPath) that your problem lies in having
>> > absolute xpath queries, instead of xpath queries relative to your
>> > parent node.
>> >
>> > e.g. /DOK/TEKST/KAP is absolute (the leading '/' makes it so). Try
>> > 'KAP' instead.
>> > The same for all xpaths deeper in the tree.
>> >
>> > Geert-Jan
>> >
>> > 2010/6/7 Tor Henning Ueland <tor.henning@gmail.com>
>> >
>> >> Hi,
>> >>
>> >> I am testing dataimport to Solr from XML documents with deeply
>> >> nested child elements. Parsing the children a few levels down using
>> >> XPath works, but it is very slow (~1 minute per document, on a quad
>> >> Xeon server). When I do the same using the format Solr wants, the
>> >> parsing time is 0.02 seconds per document.
>> >>
>> >> I have published a quick example here:
>> >> http://pastebin.com/adhcEvRx
>> >>
>> >> My question is:
>> >>
>> >> I hope that I have done something wrong in the child parsing (as you
>> >> can see, it goes down quite a few levels). Can anybody point me in the
>> >> right direction so I can speed up the process? I have been looking
>> >> around for examples, but I have not found any that index data nested
>> >> this deeply.
>> >>
>> >> PS: I know there are some bugs in the XPath naming etc., but it is
>> >> just a rough example :)
>> >>
>> >> --
>> >> Best regards
>> >> Tor Henning Ueland
>> >>
>> >
>>
>>
>>
>> --
>> Mvh
>> Tor Henning Ueland
>>
>



-- 
Mvh
Tor Henning Ueland
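
For readers who land on this thread with the same problem: the
single-entity layout suggested in the Jun 8 reply (one entity per XML
document, several field declarations, each with an absolute xpath) might
look roughly like the sketch below. This is not the poster's actual
config. The entity name, the file path, the column names and the id
attribute are all placeholders, and only the /DOK/TEKST/KAP path comes
from the thread itself.

```xml
<dataConfig>
  <!-- FileDataSource reads the XML from the local filesystem -->
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- One entity for the whole file. forEach names the element that
         becomes one Solr document. The name and url are placeholders. -->
    <entity name="dok"
            processor="XPathEntityProcessor"
            url="/path/to/dok.xml"
            forEach="/DOK">
      <!-- Every xpath is absolute (starts with /), as the processor
           requires. The id attribute is an assumption. -->
      <field column="id" xpath="/DOK/@id"/>
      <!-- flatten="true" collects the text of all nested tags under KAP
           into this one field. If a document has several KAP elements,
           the target field in the schema would need to be multiValued. -->
      <field column="chapter_text" xpath="/DOK/TEKST/KAP" flatten="true"/>
    </entity>
  </document>
</dataConfig>
```

With this layout there are no nested child entities at all. Pointing
forEach at a deeper element such as /DOK/TEKST/KAP would instead yield
one Solr document per chapter, which is the choice the Jun 8 reply asks
about.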
