lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: DIH: Enhance XPathRecordReader to deal with //body(FLATTEN=true) and //body/h1
Date Sun, 10 Apr 2011 20:59:46 GMT
There is an option somewhere to use the full XML DOM implementation
for evaluating xpaths. The XPathEntityProcessor is meant to be as
simple and dumb as possible while handling the common cases: RSS feeds
and other open standards.

Search for "xsl(optional)" on this wiki page:

http://wiki.apache.org/solr/DataImportHandler#Configuration_in_data-config.xml-1
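
As a rough, untested sketch of how the xsl attribute might be used here:
the stylesheet runs as a preprocessor with full XPath support and
rewrites the page into a flat document, so the entity no longer needs
nested xpaths. The file names, paths, and the <doc>/<title>/<alltext>
element names are made up for illustration, and it assumes the input is
well-formed XML with no namespace on the html elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- flatten-html.xsl (hypothetical file name) -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <doc>
      <!-- String value of the first h1 under body -->
      <title><xsl:value-of select="//body/h1"/></title>
      <!-- String value of body: all descendant text in document order -->
      <alltext><xsl:value-of select="//body"/></alltext>
    </doc>
  </xsl:template>
</xsl:stylesheet>
```

The entity would then read the transformed output with simple,
non-overlapping xpaths:

```xml
<dataConfig>
  <dataSource type="FileDataSource"/>
  <document>
    <!-- url and xsl are placeholder paths, not real locations -->
    <entity name="page"
            processor="XPathEntityProcessor"
            url="/path/to/page.html"
            xsl="/path/to/flatten-html.xsl"
            forEach="/doc">
      <field column="title"   xpath="/doc/title"/>
      <field column="alltext" xpath="/doc/alltext"/>
    </entity>
  </document>
</dataConfig>
```

Because alltext is built from the body's text in document order, the
original token order should be preserved, unlike with copyField.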

On Sat, Apr 9, 2011 at 5:32 AM,  <karsten-solr@gmx.de> wrote:
> Hi Folks,
>
> has anyone improved the DIH XPathRecordReader to deal with nested xpaths?
> e.g.
> data-config.xml with
>  <entity .. processor="XPathEntityProcessor" ..
>  <field column="title" xpath="//body/h1"/>
>  <field column="alltext" xpath="//body" flatten="true"/>
> and the XML stream contains
>  /html/body/h1...
> will only fill field “alltext” but field “title” will be empty.
>
> This is a known issue from 2009
> https://issues.apache.org/jira/browse/SOLR-1437#commentauthor_12756469_verbose
>
> So three questions:
> 1. How can I fill a “search over all” field without nested xpaths?
>   (schema.xml  <copyField source="*" dest="alltext"/> will not help, because
>   we lose the original token order)
> 2. Has anyone tried to improve XPathRecordReader to deal with nested xpaths?
> 3. Does anyone else need this feature?
>
>
> Best regards
>  Karsten
>



-- 
Lance Norskog
goksron@gmail.com
