lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <>
Subject Re: Problems with DIH XPath flatten
Date Wed, 07 Oct 2009 03:59:06 GMT
Send a small sample XML snippet of what you are trying to index; it may help.

On Tue, Oct 6, 2009 at 9:29 PM, Adam Foltzer <> wrote:
> Hi all,
> I'm trying to set up DataImportHandler to index some XML documents available
> over web services. The XML includes both content and metadata, so for the
> indexable content, I'm trying to just index everything under the content
> tag:
> <entity dataSource="kbws" name="kbxml" pk="title"
>         url="resturl" processor="XPathEntityProcessor"
>         forEach="/document" transformer="HTMLStripTransformer"
>         flatten="true">
>   <field column="content" name="content" xpath="/document/kbml/body"
>          flatten="true" stripHTML="true" />
>   <field column="title" name="title" xpath="/document/kbml/kbq" />
> </entity>
> The result of this is that the title field gets populated and indexed (there
> are no child nodes of /document/kbml/kbq), but content does not get indexed
> at all. Since /document/kbml/body has many children, I expected that
> flatten="true" would store all of the body text in the field. Instead, it
> stores nothing at all. I've tried this with many combinations of
> transformers and flatten options, and the result is the same each time.
> Here are the relevant field declarations from the schema (type="text" is
> just the one from the example schema.xml). I have also tried various
> combinations of stored= and multiValued= here, with the same result each time.
> <field name="title" type="text" indexed="true" stored="true" multiValued="true" />
> <field name="content" type="text" indexed="true" stored="true" multiValued="true" />
> If it would help troubleshooting, I could send along some sample XML. I
> don't want to spam the list with an attachment unless it's necessary, though :)
> Thanks in advance for your help,
> Adam Foltzer
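The question turns on what flatten="true" should produce for a node that has child elements. Below is a minimal sketch of that expected behavior in Python, using only the standard-library xml.etree.ElementTree. It is not how DataImportHandler implements flattening. The sample XML and the helper names flatten_text and extract_fields are hypothetical and invented for illustration; only the element paths (/document/kbml/body and /document/kbml/kbq) come from the config quoted above. The sketch shows why a parent node such as body has no useful text of its own, so its value has to be assembled from all of its descendant text nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Only the element names mirror the XPaths in
# the quoted data-config (/document/kbml/body, /document/kbml/kbq); the text
# content is invented for illustration.
SAMPLE_XML = """<document>
  <kbml>
    <kbq>How do I reset my password?</kbq>
    <body>
      <p>Open the <b>Settings</b> page.</p>
      <p>Click <i>Reset password</i>.</p>
    </body>
  </kbml>
</document>"""


def flatten_text(elem):
    """Concatenate the text of elem and all its descendants in document
    order, collapsing runs of whitespace into single spaces."""
    return " ".join("".join(elem.itertext()).split())


def extract_fields(xml_text):
    """Hypothetical helper: pull the two fields from the sample document."""
    root = ET.fromstring(xml_text)  # root is the <document> element
    title = root.find("kbml/kbq")
    body = root.find("kbml/body")
    return {
        # A leaf node: its own text is the whole value.
        "title": title.text if title is not None else None,
        # A node with children: gather all descendant text ("flattened").
        "content": flatten_text(body) if body is not None else None,
        # Without flattening, body's own text is only the whitespace before
        # the first <p>, so this value comes out empty.
        "content_unflattened": (body.text or "").strip() if body is not None else None,
    }


if __name__ == "__main__":
    for name, value in extract_fields(SAMPLE_XML).items():
        print(f"{name}: {value!r}")
```

Running it prints the title, the flattened content 'Open the Settings page. Click Reset password.', and an empty string for the unflattened value. That empty value resembles the empty content field described in the question.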

Noble Paul | Principal Engineer | AOL |
