lucene-solr-user mailing list archives

From Noble Paul നോബിള്‍ नोब्ळ् <noble.p...@gmail.com>
Subject Re: DataImportHandler: Problems with delta-import and CachedSqlEntityProcessor
Date Thu, 20 Jun 2013 14:55:10 GMT
Yes, that's right.


On Thu, Jun 20, 2013 at 8:16 PM, Constantin Wolber <
constantin.wolber@medicalcolumbus.de> wrote:

> Hi,
>
> I may have been a little too fast with my response.
>
> After reading a bit more, I think you meant running the full-import with
> the entity param set to the root entity for full import, and running the
> delta-import with the entity param set to the delta entity. Is that correct?
>
> Regards
>
> Constantin
>
>
> -----Original Message-----
> From: Constantin Wolber [mailto:constantin.wolber@medicalcolumbus.de]
> Sent: Thursday, 20 June 2013 16:42
> To: solr-user@lucene.apache.org
> Subject: RE: DataImportHandler: Problems with delta-import and
> CachedSqlEntityProcessor
>
> Hi,
>
> and thanks for the answer. But I'm a little bit confused about what you
> are suggesting.
> I have not really used the rootEntity attribute before. As far as I can
> tell from the documentation, that would result in two documents, one for
> each root entity (maybe with the same id, which would probably result in
> only one document being stored).
>
> It would be great if you could just sketch the setup with the entities I
> provided, because currently I have no idea how to do it.
>
> Regards
>
> Constantin
>
>
> -----Original Message-----
> From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.paul@gmail.com]
> Sent: Thursday, 20 June 2013 15:42
> To: solr-user@lucene.apache.org
> Subject: Re: DataImportHandler: Problems with delta-import and
> CachedSqlEntityProcessor
>
> It is possible to create two separate root entities: one for full-import
> and another for delta-import. That way you can skip the cache for the
> delta-import.
>
>
>
> On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber <
> constantin.wolber@medicalcolumbus.de> wrote:
>
> > Hi,
> >
> > I searched for a solution for quite some time but did not manage to
> > find any real hints on how to fix it.
> >
> >
> > I'm using Solr 4.3.0 (1477023 - simonw - 2013-04-29 15:10:12) running in
> > a Tomcat 6 container.
> >
> > My data import setup is basically the following:
> >
> > Data-config.xml:
> >
> > <entity
> >         name="article"
> >         dataSource="ds1"
> >         query="SELECT * FROM article"
> >         deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
> >         deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
> >         pk="myownid">
> >         <field column="myownid" name="id"/>
> >
> >         <entity
> >                 name="supplier"
> >                 dataSource="ds2"
> >                 query="SELECT * FROM supplier WHERE status=1"
> >                 processor="CachedSqlEntityProcessor"
> >                 cacheKey="SUPPLIER_ID"
> >                 cacheLookup="article.ARTICLE_SUPPLIER_ID">
> >         </entity>
> >
> >         <entity
> >                 name="attributes"
> >                 dataSource="ds1"
> >                 query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
> >                 cacheKey="ARTICLE_ID"
> >                 cacheLookup="article.myownid"
> >                 processor="CachedSqlEntityProcessor">
> >         </entity>
> > </entity>
> >
> >
> > Ok now for the problem:
> >
> > At first I tried everything without the cache, but the full-import took
> > a very long time because the attributes query is pretty slow compared to
> > the rest. As a result I got a processing speed of around 150 documents/s.
> > After switching everything to the CachedSqlEntityProcessor, the
> > full-import ran at 4000 documents/s.
> >
> > So the full-import is running quite fine. Now I wanted to use the
> > delta-import. I expected its ramp-up time to be about the same as for
> > the full-import, since the whole supplier and attributes tables need to
> > be loaded into the cache in the first step. But according to the log
> > file, Solr seems to refresh the cache for every single document that is
> > processed. So currently my delta-import is a lot slower than the
> > full-import. I even tried adding the deltaImportQuery parameter to the
> > entity, but it doesn't change the behavior at all (of course I know it
> > is not supposed to change anything in the setup I run).
> >
> > The following solutions would be possible in my opinion:
> >
> > 1. Is there any way to tell the config to ignore the cache when
> > running a delta-import? That would help already, because we are talking
> > about a maximum of 500 documents changed in 15 minutes compared to
> > over 5 million documents in total.
> > 2. Get Solr to not refresh the cache for every document.
> >
> > Best Regards
> >
> > Constantin Wolber
> >
> >
>
>
> --
> -----------------------------------------------------
> Noble Paul
>



-- 
-----------------------------------------------------
Noble Paul
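
Editor's sketch of the setup confirmed at the top of this thread, offered as an illustration under stated assumptions rather than as a configuration posted by either participant. It reuses the entities from the original question. The names article_delta, supplier_delta and attributes_delta are hypothetical, and the uncached sub-entity queries assume numeric keys and that the column names resolve exactly as in the existing cacheLookup attributes.

```xml
<dataConfig>
  <!-- dataSource definitions for ds1 and ds2 are omitted; they stay as they are today -->
  <document>

    <!-- Root entity used only for full-import: sub-entities stay cached -->
    <entity name="article" dataSource="ds1" pk="myownid"
            query="SELECT * FROM article">
      <field column="myownid" name="id"/>
      <entity name="supplier" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1"
              processor="CachedSqlEntityProcessor"
              cacheKey="SUPPLIER_ID"
              cacheLookup="article.ARTICLE_SUPPLIER_ID"/>
      <entity name="attributes" dataSource="ds1"
              query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
              processor="CachedSqlEntityProcessor"
              cacheKey="ARTICLE_ID"
              cacheLookup="article.myownid"/>
    </entity>

    <!-- Hypothetical second root entity used only for delta-import: no cache,
         each sub-entity runs one small query per changed article -->
    <entity name="article_delta" dataSource="ds1" pk="myownid"
            deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
            deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
      <field column="myownid" name="id"/>
      <!-- Assumption: numeric keys, so the resolved values are not quoted -->
      <entity name="supplier_delta" dataSource="ds2"
              query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID=${article_delta.ARTICLE_SUPPLIER_ID}"/>
      <entity name="attributes_delta" dataSource="ds1"
              query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes WHERE ARTICLE_ID=${article_delta.myownid}"/>
    </entity>

  </document>
</dataConfig>
```

Each import would then be restricted to one root entity with the entity request parameter. The host, port and handler path below are placeholders, not values from this thread:

    http://localhost:8080/solr/dataimport?command=full-import&entity=article
    http://localhost:8080/solr/dataimport?command=delta-import&entity=article_delta

Because both root entities map myownid to the id field, a document written by the delta-import should replace the one created by the full-import instead of adding a second copy, provided id is the uniqueKey in the schema.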
