lucene-solr-user mailing list archives

From Jay Hill <jayallenh...@gmail.com>
Subject DIH: Setting rows= on full-import has no effect
Date Fri, 09 Oct 2009 00:40:33 GMT
In the past, setting rows=n on the full-import command stopped the DIH
import at the number I passed in, but that no longer seems to work. Here is
the command I'm using:

curl 'http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'

But when 100 docs are imported the process keeps running. Here's the log
output:

Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 100
Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 200
Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 300
Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 400
Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 500

and so on.
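
For anyone reproducing this, the runaway import can be watched or stopped with DIH's status and abort commands. Below is a minimal sketch, assuming the same host, core and handler path as the command above; the helper names dih_url and dih are made up for illustration and are not part of Solr.

```shell
# Assumption: same host, core and handler path as the full-import command above.
DIH_BASE='http://localhost:8983/solr/indexer/mediawiki'

# Hypothetical helper: build a DIH request URL from a command name
# and an optional extra query-string fragment (e.g. rows=100).
dih_url() {
  if [ -n "$2" ]; then
    printf '%s?command=%s&%s\n' "$DIH_BASE" "$1" "$2"
  else
    printf '%s?command=%s\n' "$DIH_BASE" "$1"
  fi
}

# Hypothetical wrapper: send the request with curl.
# Nothing is sent until dih is actually called.
dih() {
  curl -s "$(dih_url "$@")"
}
```

With these defined, `dih full-import rows=100` issues the same request as above, `dih status` reports how many documents have been processed so far, and `dih abort` stops the import that is still running.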

Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08
17:31:22

I've used that exact URL in the past and indexing stopped at the rows
number as expected, but the last time I ran it was about two months ago, on
a build from early July.

Here's the dih config:

<dataConfig>
  <dataSource
      name="dsFiles"
      type="FileDataSource"
      encoding="UTF-8"/>
  <document>
    <entity
        name="f"
        processor="FileListEntityProcessor"
        baseDir="/path/to/files"
        fileName=".*xml"
        recursive="true"
        rootEntity="false"
        dataSource="null">

      <entity
          name="wikixml"
          processor="XPathEntityProcessor"
          forEach="/mediawiki/page"
          url="${f.fileAbsolutePath}"
          dataSource="dsFiles"
          onError="skip">
        <field column="id" xpath="/mediawiki/page/id"/>
        <field column="title" xpath="/mediawiki/page/title"/>
        <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
        <field column="comment" xpath="/mediawiki/page/revision/comment"/>
        <field column="text" xpath="/mediawiki/page/revision/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>


-Jay
