lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: solr-dih does multiple queries for sub-entities
Date Mon, 04 Mar 2013 15:15:09 GMT
You can cache the sub-entity; DIH will then retrieve all the data for that
entity in one query.

See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
for more information. That section focuses on caching data from
SqlEntityProcessor. However, it is now possible to cache data from other
entity types as well. It is also possible to plug in other cache
implementations if the default in-memory cache does not scale for you.
See https://issues.apache.org/jira/browse/SOLR-2382.
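
As a rough, untested sketch of what caching could look like for the
db-data sub-entity in the config quoted below: the cacheImpl, cacheKey and
cacheLookup attributes are taken from the SOLR-2382 work and assume a Solr
version that includes it. The WHERE clause is dropped so that the table is
read once, and each file is then matched against the cached rows by its
id. This also assumes a_table fits in memory and that the type of the id
column matches the type of ${f.file}.

```xml
<!-- Sketch only: replaces the db-data sub-entity from the original config. -->
<entity
    name="db-data"
    dataSource="cr-db"
    processor="SqlEntityProcessor"
    cacheImpl="SortedMapBackedCache"
    cacheKey="id"
    cacheLookup="f.file"
    query="SELECT id, b FROM a_table">
    <!-- cacheKey: column of this entity used as the cache key.
         cacheLookup: value from the parent entity to look up in the cache. -->
    <field column="B" name="b" />
</entity>
```

On older versions, the equivalent described on the wiki page is
processor="CachedSqlEntityProcessor" with where="id=f.file" in place of
the three cache attributes.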

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: harpax [mailto:a.psczolla@pan-sonic.de] 
Sent: Monday, March 04, 2013 8:49 AM
To: solr-user@lucene.apache.org
Subject: solr-dih does multiple queries for sub-entities

Hi,

I am trying to use the DIH to crawl over some XML files, XPath them, and
then access a DB with the filename as the key. That works, but reading
~30,000 docs would take almost 3 hours. When I looked at the DIH debug
console, it showed me that far too many DB calls were made: 1 for the 1st
doc, then 2, 3, 4, ...

I tried different attribute combinations (e.g. stripped it down to the
minimum), but the result was still the same.

This problem has been asked about before:
http://lucene.472066.n3.nabble.com/DIH-multiple-queries-per-sub-entity-tt701038.html

thanks a lot!

regards
Arne

--
<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
    <dataSource 
        name="cr-db"
        jndiName="xyz"
        type="JdbcDataSource" />
    <dataSource 
        name="cr-xml" 
        type="FileDataSource" 
        encoding="utf-8" />


    <document name="doc">
        <entity 
            dataSource="cr-xml" 
            name="f" 
            processor="FileListEntityProcessor" 
            baseDir="/path/to/xml" 
            filename="*.xml" 
            recursive="true" 
            rootEntity="true" 
            onError="skip">
            <entity
                name="xml-data" 
                dataSource="cr-xml" 
                processor="XPathEntityProcessor" 
                forEach="/root" 
                url="${f.fileAbsolutePath}" 
                transformer="DateFormatTransformer" 
                onError="skip">
                <field column="id" xpath="/root/id" /> 

                <field column="A" xpath="/root/a" />
            </entity>

            <entity 
                name="db-data" 
                dataSource="cr-db"
                query="
                    SELECT  
                        id, b
                    FROM 
                        a_table
                    WHERE 
                        id = '${f.file}'">
                <field column="B" name="b" />
            </entity>
        </entity>
    </document>
</dataConfig>
--





--
View this message in context: http://lucene.472066.n3.nabble.com/solr-dih-does-multiple-queries-for-sub-entities-tp4044522.html