lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: data import
Date Fri, 20 Mar 2015 16:04:50 GMT
On 3/19/2015 10:36 PM, Midas A wrote:
> Thanks for replying .. I need clarity on following points
> a) Making store false in schema for few fields will improve indexing time ?

Maybe, maybe not.  If Solr is I/O bound, then it probably would help ...
but usually I/O on the Solr index directory is not the bottleneck.
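
For reference, a field that is indexed but not stored might look roughly
like this in schema.xml. The field name and type here are hypothetical,
chosen only for illustration, and whether this speeds anything up depends
on whether disk I/O is really the limiting factor:

```xml
<!-- Hypothetical field: searchable, but its original value is not kept
     in the index, so it cannot be returned in search results. -->
<field name="description" type="text_general" indexed="true" stored="false"/>
```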

> b) Does soft commit and hard commit configuration depends on hard ware ?

You need to make your autoCommit and autoSoftCommit intervals as long as
you can stand.  I use autoCommit with a five minute / 25000 document
config, and I don't use autoSoftCommit.  My indexing application sends
explicit soft commits, and those are at least a full minute apart,
sometimes longer.
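
As a rough sketch, a five minute / 25000 document autoCommit in
solrconfig.xml could look like the following. The openSearcher=false
setting is an assumption on my part and is not stated above; it makes the
hard commit flush to disk without opening a new searcher, leaving
visibility to the explicit soft commits:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>25000</maxDocs>               <!-- commit after 25000 added docs -->
    <maxTime>300000</maxTime>              <!-- or after 5 minutes (in ms) -->
    <openSearcher>false</openSearcher>     <!-- assumption, see note above -->
  </autoCommit>
  <!-- no autoSoftCommit section: soft commits come from the indexing client -->
</updateHandler>
```

An explicit soft commit from the indexing application can be sent to the
update handler as an XML update message along these lines:

```xml
<commit softCommit="true"/>
```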

> c) Should i do merge factor , Rambuffersize configuration ? and how should
> i decide these values ?

The default mergeFactor is 10.  A higher mergeFactor will result in
faster indexing, but queries on the resulting index will be a little bit
slower, unless you optimize after your indexing is complete.  The
default ramBufferSizeMB setting in recent versions is 100, and community
experience has shown that increasing this value doesn't normally make
much difference unless you have enormous documents where each one is a
few megabytes.
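
For illustration, these two settings live in the indexConfig section of
solrconfig.xml. The values below are simply the defaults mentioned above,
shown so you can see where they go, not a tuning recommendation. This
assumes a 4.x-era config; other versions may configure merging through
the merge policy settings instead:

```xml
<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>  <!-- default mentioned above -->
  <mergeFactor>10</mergeFactor>           <!-- higher: faster indexing, more segments -->
</indexConfig>
```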

> We are doing full indexing and it takes around 4.5 hrs ..(20 M documents )

I would call that a pretty good rate.  One of my single dataimporter
configs will index about 17 million docs into a Solr core in 4.5 to 5
hours from MySQL.  By doing several of these in parallel (into separate
shards) on two machines at once, I can re-index my entire 100 million
document database in about 4.5 to 5 hours.

Thanks,
Shawn