lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@gmail.com>
Subject Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Date Thu, 20 Jun 2013 13:42:22 GMT
It is possible to create two separate root entities: one for full-import
and another for delta-import. That way you can skip the cache for the
delta-import.
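
A minimal, untested sketch of that layout, based on the config quoted below. The entity names article_delta, supplier_delta and attributes_delta are made up for illustration, and the uncached sub-queries assume the key columns can be compared without quoting, as in the original deltaImportQuery:

```xml
<document>
  <!-- Root entity for full-import: sub-entities keep the cache. -->
  <entity name="article" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article">
    <field column="myownid" name="id"/>
    <entity name="supplier" dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1"
            processor="CachedSqlEntityProcessor"
            cacheKey="SUPPLIER_ID"
            cacheLookup="article.ARTICLE_SUPPLIER_ID"/>
    <entity name="attributes" dataSource="ds1"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
            processor="CachedSqlEntityProcessor"
            cacheKey="ARTICLE_ID"
            cacheLookup="article.myownid"/>
  </entity>

  <!-- Root entity for delta-import (hypothetical name): no cache,
       each sub-entity runs one small query per changed article.
       Only meant to be run with command=delta-import. -->
  <entity name="article_delta" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article"
          deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
          deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
    <field column="myownid" name="id"/>
    <entity name="supplier_delta" dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID=${article_delta.ARTICLE_SUPPLIER_ID}"/>
    <entity name="attributes_delta" dataSource="ds1"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes WHERE ARTICLE_ID=${article_delta.myownid}"/>
  </entity>
</document>
```

Each command would then be restricted to its own root entity with the entity request parameter, e.g. command=full-import&entity=article and command=delta-import&entity=article_delta. Without that parameter a full-import would run both root entities.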



On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber <
constantin.wolber@medicalcolumbus.de> wrote:

> Hi,
>
> I searched for a solution for quite some time but did not manage to find
> any real hints on how to fix it.
>
>
> I'm using Solr 4.3.0 (1477023 - simonw - 2013-04-29 15:10:12) running in a
> Tomcat 6 container.
>
> My data import setup is basically the following:
>
> Data-config.xml:
>
> <entity
>         name="article"
>         dataSource="ds1"
>         query="SELECT * FROM article"
>         deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
>         deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
>         pk="myownid">
>         <field column="myownid" name="id"/>
>
>         <entity
>                 name="supplier"
>                 dataSource="ds2"
>                 query="SELECT * FROM supplier WHERE status=1"
>                 processor="CachedSqlEntityProcessor"
>                 cacheKey="SUPPLIER_ID"
>                 cacheLookup="article.ARTICLE_SUPPLIER_ID">
>         </entity>
>
>         <entity
>                 name="attributes"
>                 dataSource="ds1"
>                 query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
>                 cacheKey="ARTICLE_ID"
>                 cacheLookup="article.myownid"
>                 processor="CachedSqlEntityProcessor">
>         </entity>
> </entity>
>
>
> Ok now for the problem:
>
> At first I tried everything without the cache, but the full-import took a
> very long time because the attributes query is pretty slow compared to the
> rest. As a result I got a processing speed of around 150 documents/s.
> After switching everything to the CachedSqlEntityProcessor, the full-import
> ran at 4000 documents/s.
>
> So the full-import is running fine. Next I wanted to use the delta-import.
> I expected its ramp-up time to be about the same as for the full-import,
> since the whole supplier and attributes tables have to be loaded into the
> cache in the first step. But according to the log file, Solr seems to
> refresh the cache for every single document that is processed, so currently
> my delta-import is a lot slower than the full-import. I even tried adding
> the deltaImportQuery parameter to the entity, but it doesn't change the
> behavior at all (I know it is not supposed to change anything in the setup
> I run).
>
> The following solutions would be possible in my opinion:
>
> 1. Is there any way to tell the config to ignore the cache when running a
> delta-import? That would already help, because we are talking about a
> maximum of 500 documents changed in 15 minutes, compared to over 5 million
> documents in total.
> 2. Get Solr to not refresh the cache for every document.
>
> Best Regards
>
> Constantin Wolber
>
>


-- 
-----------------------------------------------------
Noble Paul
