lucene-solr-user mailing list archives
From Felipe Vinturini <felipe.vintur...@gmail.com>
Subject Re: Using DIH FileListEntityProcessor with SolrCloud
Date Mon, 05 Dec 2016 12:31:15 GMT
Hi *Chris*,

I've never used the DIH, but maybe the "*fileName*" pattern is wrong?
     fileName=".*xml"

Should be:
     fileName="*.xml"

Regards,
*Felipe*.
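
For anyone comparing the two patterns, here is a minimal sketch, assuming (as the Solr Reference Guide describes for FileListEntityProcessor) that fileName is interpreted as a Java regular expression rather than a shell glob. The class and method names are hypothetical, and the use of find() is an assumption about how DIH applies the pattern. Under that assumption ".*xml" is a usable pattern, while "*.xml" is not a valid regex at all.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of Solr: checks file names against a
// fileName-style pattern using plain java.util.regex.
class FileNamePatternCheck {

    // True if the regex is found anywhere in the file name
    // (find() semantics are an assumption about DIH's behaviour).
    static boolean matches(String regex, String fileName) {
        return Pattern.compile(regex).matcher(fileName).find();
    }

    // True if the string compiles as a Java regular expression.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(".*xml", "ms-001.xml"));      // regex from the original config
        System.out.println(matches(".*xml", "notes.txt"));       // no "xml" in the name
        System.out.println(isValidRegex("*.xml"));               // glob-style pattern
        System.out.println(matches(".*\\.xml$", "ms-001.xml"));  // stricter variant
    }
}
```

The stricter variant anchors on the ".xml" extension; in the DIH config it would be written as fileName=".*\.xml$".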


On Mon, Dec 5, 2016 at 9:43 AM, Chris Rogers <chris.rogers@bodleian.ox.ac.uk
> wrote:

> Hi all,
>
> Just bumping my question again, as it doesn’t seem to have been picked up by
> anyone. Any help would be much appreciated.
>
> Chris
>
> On 02/12/2016, 16:36, "Chris Rogers" <chris.rogers@bodleian.ox.ac.uk>
> wrote:
>
>     Hi all,
>
>     A question regarding using the DIH FileListEntityProcessor with
> SolrCloud (solr 6.3.0, zookeeper 3.4.8).
>
>     I get that the config in SolrCloud lives on the Zookeeper node (a
> different server from the solr nodes in my setup).
>
>     With this in mind, where is the baseDir attribute in the
> FileListEntityProcessor config relative to? I’m seeing the config in the
> Solr GUI, and I’ve tried setting it as an absolute path on my Zookeeper
> server, but this doesn’t seem to work… any ideas how this should be set up?
>
>     My DIH config is below:
>
>     <dataConfig>
>       <dataSource type="FileDataSource"/>
>       <document>
>         <!-- this outer processor generates a list of files satisfying the
>              conditions specified in the attributes -->
>         <entity name="f" processor="FileListEntityProcessor"
>                 fileName=".*xml"
>                 newerThan="'NOW-5YEARS'"
>                 recursive="true"
>                 rootEntity="false"
>                 dataSource="null"
>                 baseDir="/home/bodl-zoo-svc/files/">
>
>           <!-- this processor extracts content using XPath from each file found -->
>
>           <entity name="tei" processor="XPathEntityProcessor"
>                   forEach="/TEI" url="${f.fileAbsolutePath}"
>                   transformer="RegexTransformer">
>             <field column="manuscript_title" name="manuscript_title"
>                    xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
>             <field column="repository" name="repository"
>                    xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
>             <field column="id" name="id"
>                    xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
>           </entity>
>
>         </entity>
>
>       </document>
>     </dataConfig>
>
>
>     This same script worked as expected on a single solr node (i.e. not in
> SolrCloud mode).
>
>     Thanks,
>     Chris
>
>     --
>     Chris Rogers
>     Digital Projects Manager
>     Bodleian Digital Library Systems and Services
>     chris.rogers@bodleian.ox.ac.uk
>
>
>
