lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@corp.aol.com>
Subject Re: DIH: Setting rows= on full-import has no effect
Date Fri, 09 Oct 2009 04:12:12 GMT
I have raised an issue for this: http://issues.apache.org/jira/browse/SOLR-1501

On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill <jayallenhill@gmail.com> wrote:
> In the past, setting rows=n with the full-import command stopped the DIH
> import at the number I passed in, but now this doesn't seem to work. Here is
> the command I'm using:
> curl 'http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'
>
> But after 100 docs are imported, the process keeps running. Here's the log
> output:
>
> Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> INFO: Indexing stopped at docCount = 100
> Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> INFO: Indexing stopped at docCount = 200
> Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> INFO: Indexing stopped at docCount = 300
> Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> INFO: Indexing stopped at docCount = 400
> Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
> INFO: Indexing stopped at docCount = 500
>
> and so on.
>
> Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08 17:31:22
>
> I've used that exact URL in the past, and indexing stopped at the rows
> number as expected, but the last time I ran the command was about two months
> ago, on a build from early July.
>
> Here's the dih config:
>
>  <dataConfig>
>    <dataSource
>       name="dsFiles"
>       type="FileDataSource"
>       encoding="UTF-8"/>
>    <document>
>      <entity
>          name="f"
>          processor="FileListEntityProcessor"
>          baseDir="/path/to/files"
>          fileName=".*xml"
>          recursive="true"
>          rootEntity="false"
>          dataSource="null">
>
>        <entity
>            name="wikixml"
>            processor="XPathEntityProcessor"
>            forEach="/mediawiki/page"
>            url="${f.fileAbsolutePath}"
>            dataSource="dsFiles"
>            onError="skip">
>          <field column="id" xpath="/mediawiki/page/id"/>
>          <field column="title" xpath="/mediawiki/page/title"/>
>          <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
>          <field column="comment" xpath="/mediawiki/page/revision/comment"/>
>          <field column="text" xpath="/mediawiki/page/revision/text"/>
>        </entity>
>      </entity>
>    </document>
> </dataConfig>
>
>
> -Jay
>
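The symptom in the quoted log can be checked mechanically: with rows=100, no "Indexing stopped at docCount" line should report more than 100. Below is a minimal Python sketch, assuming the log lines look exactly like the DocBuilder INFO lines quoted above. `doc_counts` and `overran_rows` are hypothetical helper names for illustration, not part of Solr.

```python
import re
from typing import List

# Matches the DocBuilder INFO line quoted in the report, e.g.
# "INFO: Indexing stopped at docCount = 100" (format assumed from the log above).
STOP_LINE = re.compile(r"Indexing stopped at docCount = (\d+)")


def doc_counts(log_text: str) -> List[int]:
    """Return every docCount value reported in the log, in order of appearance."""
    return [int(m.group(1)) for m in STOP_LINE.finditer(log_text)]


def overran_rows(log_text: str, rows: int) -> bool:
    """True if any reported docCount exceeds the requested rows limit."""
    return any(count > rows for count in doc_counts(log_text))


if __name__ == "__main__":
    sample = (
        "Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument\n"
        "INFO: Indexing stopped at docCount = 100\n"
        "Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument\n"
        "INFO: Indexing stopped at docCount = 200\n"
    )
    print(doc_counts(sample))         # [100, 200]
    print(overran_rows(sample, 100))  # True: the import ran past rows=100
```

Run against the full log from the report, `overran_rows(log, 100)` would be true as soon as the docCount = 200 line appears.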
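In the config above, the outer `f` entity has rootEntity="false", so DIH treats the nested `wikixml` entity as the root. Each `/mediawiki/page` row then becomes one Solr document, and those documents are presumably what the docCount figures count. Below is a minimal Python sketch of that rule. It assumes DIH's documented rootEntity behaviour: entities directly under `<document>` are roots unless marked rootEntity="false", in which case their child entities are considered instead. `effective_root_entities` is a hypothetical helper, not a Solr API.

```python
import xml.etree.ElementTree as ET
from typing import List


def effective_root_entities(config_xml: str) -> List[str]:
    """Return the names of entities whose rows become Solr documents.

    Assumed rule: an entity is a root unless rootEntity="false" (default
    "true"); for a non-root entity, its direct child entities are examined
    instead, recursively.
    """
    document = ET.fromstring(config_xml).find("document")
    roots: List[str] = []

    def visit(entity: ET.Element) -> None:
        if entity.get("rootEntity", "true") == "false":
            # Not a root: descend into the nested entities.
            for child in entity.findall("entity"):
                visit(child)
        else:
            roots.append(entity.get("name"))

    for top in document.findall("entity"):
        visit(top)
    return roots


if __name__ == "__main__":
    config = """<dataConfig>
  <document>
    <entity name="f" processor="FileListEntityProcessor" rootEntity="false">
      <entity name="wikixml" processor="XPathEntityProcessor"
              forEach="/mediawiki/page" url="${f.fileAbsolutePath}"/>
    </entity>
  </document>
</dataConfig>"""
    print(effective_root_entities(config))  # ['wikixml']
```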



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
