lucene-solr-user mailing list archives

From Julian Perry
Subject Build index from Oracle, adding fields
Date Thu, 26 Mar 2015 23:19:39 GMT

I have looked and cannot see any clear answers to this on
the Interwebs.

I have an index with, say, 10 fields.

I load that index directly from Oracle - data-config.xml using
JDBC.  I can load 10 million rows very quickly.  This direct
way of loading from Oracle straight into SOLR is fantastic -
really efficient and saves writing loads of import/export code
(e.g. via a CSV file).

Of those 10 fields - two of them (set to multiValued) come from
a separate table and there are anything from 1 to 10 rows per
row from the main table.

I can use a nested entity to extract the child rows for each of
the 10m rows in the main table - but then SOLR generates 10m
separate SQL calls - and the load time goes from a few minutes
to several days.
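
For illustration, a nested-entity data-config.xml of the kind described
above might look roughly like the sketch below. The connection details,
table names (MAIN_TABLE, CHILD_TABLE) and column names are hypothetical
placeholders, not taken from the actual setup. The point is that the
inner query is parameterised on the parent row, so it is executed once
for every row the outer query returns.

```xml
<dataConfig>
  <!-- Hypothetical Oracle connection; host, service, user and password are placeholders -->
  <dataSource driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/ORCL"
              user="solr" password="secret"/>
  <document>
    <!-- Outer entity: one SQL call returning all rows of the main table -->
    <entity name="main" query="SELECT ID, TITLE FROM MAIN_TABLE">
      <field column="ID" name="id"/>
      <field column="TITLE" name="title"/>
      <!-- Inner entity: this query runs once per parent row,
           so 10m parent rows means 10m separate SQL calls -->
      <entity name="child"
              query="SELECT TAG FROM CHILD_TABLE WHERE MAIN_ID = '${main.ID}'">
        <!-- "tag" is assumed to be a multiValued field in schema.xml -->
        <field column="TAG" name="tag"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```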

On smaller tables - just a few thousand rows - I can use a
second nested entity with a JDBC call - but not for very large
tables.

Could I load the data in two steps:
1)  load the main 10m rows
2)  load into the existing index by adding the data from a
     second SQL call into fields for each existing row (i.e.
     an UPDATE instead of an INSERT).
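
As a sketch of what step 2 could look like, Solr's XML update format
allows a per-field "update" attribute (atomic updates), which modifies
the named field of an existing document instead of replacing the whole
document. This assumes a Solr version with atomic update support (4.0 or
later), the update log enabled in solrconfig.xml, and the other fields
stored so the document can be rebuilt. The field names and values here
are hypothetical, and this is an update message posted to the /update
handler rather than a data-config.xml option.

```xml
<add>
  <doc>
    <!-- uniqueKey of a document already loaded in step 1 (hypothetical value) -->
    <field name="id">12345</field>
    <!-- update="add" appends to the multiValued field; the other fields are left alone -->
    <field name="tag" update="add">red</field>
    <field name="tag" update="add">blue</field>
  </doc>
</add>
```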

I don't know what syntax/option might achieve that.  There
is incremental loading - but I think that replaces whole rows
rather than updating individual fields.  Or maybe it does
do both?

Any other techniques that would be fast/efficient?
