lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gora Mohanty <g...@mimirtech.com>
Subject Re: Is there a way to speed up my import
Date Thu, 27 Jun 2013 07:15:56 GMT
On 27 June 2013 12:32, Mysurf Mail <stammailbox@gmail.com> wrote:
>
> I have a relational database model
> This is the basics of my data-config.xml
>
> <entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA]
> inner join TableB on ...">
>  <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2]
> where ResourceId = '${MyMainEntity.pId}'"></entity>
>                 <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag
> from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
> <entity name="LibraryItem" pk="ResourceId"
> query="select SKU
>  FROM [TableB]
> INNER JOIN ...
> ON ...
>  INNER JOIN ...
> ON ...
> WHERE ... AND ...'">
>  </entity>
> </entity>
>
> Now, this takes a lot of time.
> 10000 rows in the first query and then each other inner entities are
> fetched later (around 10 rows each).
>
> If I use a db profiler I see a the three inner entities query running over
> and over (3 select sentences than again 3 select sentences over and over)
> This is really not efficient.
> And the import can run over 40 hrs ()
> Now,
> What are my options to run it faster .
> 1. Obviously there is an option to flat the tables to one big table - but
> that will create a lot of other side effects.
>     I would really like to avoid that extra effort and run solr on my
> production relational tables.
>     So far it works great out of the box and I am searching here if there
> is a configuration tweak.
> 2. If I will flat the rows that - does the schema.xml need to be change
> too? or the same fields that are multivalued will keep being multivalued.

You have not shared your actual queries, so it is difficult
to tell, but my guess would be that it is the JOINs that
are the bottle-neck rather than the SELECTs. You should
start by:
1. Profile queries from the database back-end to see
    which are taking the most time, and try to simplify
    them.
2. Make sure that relevant database columns are indexed.
    This can make a huge difference, though going overboard
     in indexing all columns might be counter-productive.
3. Use Solr DIH's CachedSqlEntityProcessor:
    http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
4. Measure the time that Solr indexing takes: From your
    description, you seem to be guessing at it.

In general, you should not flatten the records in the
database as that is supposed to be relational data.

Regards,
Gora

Mime
View raw message