lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@gmail.com>
Subject Re: DIH using values from solrconfig.xml inside data-config.xml
Date Mon, 02 Feb 2009 17:18:16 GMT
RegexTransformer does not replace the placeholders before processing the regex.
It has to be enhanced to do so.
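
To illustrate the failure outside of Solr: the stack trace quoted below comes from java.util.regex.Pattern being handed the raw attribute value, "${" and all. The following is only a minimal sketch of the idea of resolving placeholders before compiling the regex. It is not the DIH code; the class PlaceholderRegexSketch, its helpers resolve and compiles, and the example file path are hypothetical names made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical illustration only; this is NOT how DIH resolves variables.
class PlaceholderRegexSketch {

    // Matches ${name} placeholders; the name may contain dots.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value, quoted with Pattern.quote so the
    // value is matched literally once the result is compiled as a regex.
    static String resolve(String regex, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(regex);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unresolved placeholder: " + m.group(1));
            }
            // quoteReplacement keeps the backslashes from Pattern.quote intact.
            m.appendReplacement(out, Matcher.quoteReplacement(Pattern.quote(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // True if the string is a syntactically valid java.util.regex pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String raw = "${dataimporter.request.finstalldir}(.*)";
        Map<String, String> vars = new HashMap<>();
        vars.put("dataimporter.request.finstalldir", "/Volumes/spare/ts");

        // The raw value is rejected: "{" after "$" is parsed as a repetition.
        System.out.println("raw compiles: " + compiles(raw));

        // After substitution the pattern is valid and strips the install dir.
        String resolved = resolve(raw, vars);
        System.out.println("resolved compiles: " + compiles(resolved));

        // Assumed example path under the baseDir from the quoted config.
        String path = "/Volumes/spare/ts/fords/dtd/fordsxml/data/example.xml";
        System.out.println(path.replaceAll(resolved, "$1"));
    }
}
```

Quoting the substituted value is a design choice in this sketch: it keeps characters in the install directory from being interpreted as regex syntax. Whether an enhanced RegexTransformer should quote the value or insert it verbatim is an open question here.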



On Mon, Feb 2, 2009 at 10:34 PM, Fergus McMenemie <fergus@twig.me.uk> wrote:
> Hello
>
> As per several postings I noted that I can define variables
> inside an invariants list section of the DIH handler of
> solrconfig.xml:-
>
>  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>       <str name="config">data-config.xml</str>
>       </lst>
>    <lst name="invariants">
>       <str name="finstalldir">/Volumes/spare/ts</str>
>       </lst>
>    </requestHandler>
>
>
> I can also reference these variables within data-config.xml. This
> works; the Solr field "test" is nicely populated. However, how do
> I use this variable within my regex transformer? Here is my
> data-config.xml:-
>
>   <dataConfig>
>   <dataSource name="myfilereader" type="FileDataSource"/>
>    <document>
>       <entity name="jc"
>               processor="FileListEntityProcessor"
>               fileName="^.*\.xml$"
>               newerThan="'NOW-1000DAYS'"
>               recursive="true"
>               rootEntity="false"
>               dataSource="null"
>               baseDir="/Volumes/spare/ts/fords/dtd/fordsxml/data">
>          <entity name="x"
>                  dataSource="myfilereader"
>                  processor="XPathEntityProcessor"
>                  url="${jc.fileAbsolutePath}"
>                  stream="false"
>                  forEach="/record"
>                  transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>
>   <field column="fileAbsolutePath" template="${jc.fileAbsolutePath}" />
>   <field column="fileWebPath"      regex="${dataimporter.request.finstalldir}(.*)" replaceWith="$1" sourceColName="fileAbsolutePath"/>
>   <field column="test"             template="${dataimporter.request.finstalldir}" />
>   <field column="title"            xpath="/record/title" />
>   <field column="para"             xpath="/record/sect1/para" stripHTML="true" />
>   <field column="date"             xpath="/record/metadata/date[@qualifier='Date']" dateTimeFormat="yyyyMMdd" />
>             </entity>
>       </entity>
>       </document>
>    </dataConfig>
>
> When indexing my content I get an error as follows:-
>
>
> INFO: SolrDeletionPolicy.onInit: commits:num=2
>        commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_7,version=1233583868834,generation=7,filenames=[_7.frq, _4.fdt, _7.tii, _7.fnm, _4.fdx, _7.tis, segments_7, _7.nrm, _7.prx]
>        commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_8,version=1233583868835,generation=8,filenames=[segments_8]
> Feb 2, 2009 5:00:50 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: last commit = 1233583868835
> Feb 2, 2009 5:00:57 PM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
> WARNING: transformer threw error
> java.util.regex.PatternSyntaxException: Illegal repetition near index 0
> ${dataimporter.request.finstalldir}(.*)
> ^
>        at java.util.regex.Pattern.error(Pattern.java:1650)
>        at java.util.regex.Pattern.closure(Pattern.java:2706)
>        at java.util.regex.Pattern.sequence(Pattern.java:1798)
>        at java.util.regex.Pattern.expr(Pattern.java:1687)
>        at java.util.regex.Pattern.compile(Pattern.java:1397)
>        at java.util.regex.Pattern.<init>(Pattern.java:1124)
>        at java.util.regex.Pattern.compile(Pattern.java:817)
>        at org.apache.solr.handler.dataimport.RegexTransformer.getPattern(RegexTransformer.java:129)
>        at org.apache.solr.handler.dataimport.RegexTransformer.process(RegexTransformer.java:88)
>        at org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:74)
>        at org.apache.solr.handler.dataimport.RegexTransformer.transformRow(RegexTransformer.java:42)
>        at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>        at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:333)
>        at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:359)
>        at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:222)
>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:155)
>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:324)
>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:384)
>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:365)
>
>
> Is there some simple escape or other syntax to be used, or does
> this need an enhancement?
>
> Regards Fergus.
> --
>
> ===============================================================
> Fergus McMenemie               Email:fergus@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
>



-- 
--Noble Paul
