lucene-solr-user mailing list archives

Subject DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1
Date Sat, 09 Apr 2011 12:32:01 GMT
Hi Folks,

Has anyone improved the DIH XPathRecordReader to deal with nested XPaths?
data-config.xml with
 <entity .. processor="XPathEntityProcessor" ..
  <field column="title" xpath="//body/h1"/>
  <field column="alltext" xpath="//body" flatten="true"/>
and the XML stream contains
will fill only the field “alltext”, while the field “title” stays empty.
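
To make the expected result concrete, here is a minimal Python sketch (not DIH code) of what the two field definitions are meant to produce together. The sample document, the function name extract_fields, and the use of xml.etree.ElementTree are assumptions for illustration only; the XML stream from the original message is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input, invented for illustration.
SAMPLE = "<html><body><h1>My Title</h1><p>Some <b>bold</b> text.</p></body></html>"


def extract_fields(xml_text):
    """Return the values both xpaths would ideally yield for one document."""
    root = ET.fromstring(xml_text)
    # Assumes <body> is a descendant of the root element, not the root itself.
    body = root.find(".//body")
    if body is None:
        return {"title": None, "alltext": None}

    # Equivalent of xpath="//body/h1": text of the first <h1> directly under <body>.
    h1 = body.find("h1")
    title = "".join(h1.itertext()) if h1 is not None else None

    # Equivalent of xpath="//body" with flatten="true": all text below <body>,
    # concatenated in document order, so the original token order is preserved.
    alltext = "".join(body.itertext())

    return {"title": title, "alltext": alltext}


fields = extract_fields(SAMPLE)
print(fields)
```

With nested xpath support, both values would be filled from the same <body> subtree; the behavior described above corresponds to only "alltext" being set.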

This has been a known issue since 2009.

So three questions: 
1. How can I fill a “search over all” field without nested XPaths?
   (In schema.xml, <copyField source="*" dest="alltext"/> will not help, because we lose the original token order.)
2. Has anyone tried to improve XPathRecordReader to deal with nested XPaths?
3. Does anyone else need this feature?

Best regards
