lucene-solr-user mailing list archives

From Gora Mohanty <g...@mimirtech.com>
Subject Re: Faster loading to solr...
Date Fri, 01 Oct 2010 03:57:47 GMT
On Thu, Sep 30, 2010 at 10:49 PM, Sharma, Raghvendra
<sraghvendra@corelogic.com> wrote:
> I have been able to load around a million rows/docs in around 5+ minutes.  The schema
> contains around 250+ fields.  For the moment, I have kept everything as string.
> I am sure there are ways to get better loading speeds than this.

A million documents with 250 fields in 5 minutes sounds fast to
me. As a comparison, we do a million documents with about 60 fields
in an hour, using multiple Solr cores. However, this is very likely an
apples to oranges comparison, as we are pulling large amounts of
data from a database over a network. What indexing times are you
aiming for?

If you can shard your data, indexing into multiple cores on a single
Solr instance and/or into multiple Solr instances will speed up your
indexing. However, if you want a complete, non-sharded index, you will
need to merge the sharded ones afterwards.
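
As a rough illustration of the sharding idea, here is a minimal Python
sketch that splits documents across cores by hashing the document id,
and builds a CoreAdmin merge request for afterwards. The helper names,
the "id" field, the core names and the base URL are all made up for the
example. The srcCore parameter of the mergeindexes action is only
available in more recent Solr versions (older ones take indexDir
instead), so please check the CoreAdmin documentation for your version.

```python
import zlib
from urllib.parse import urlencode


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard deterministically from the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def partition(docs, num_shards, id_field="id"):
    """Split documents into num_shards lists, one per core."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(str(doc[id_field]), num_shards)].append(doc)
    return shards


def merge_url(base_url, target_core, source_cores):
    """Build a CoreAdmin mergeindexes URL (assumes srcCore is supported)."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("srcCore", name) for name in source_cores]
    return f"{base_url}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    docs = [{"id": str(i)} for i in range(1000)]
    for n, shard in enumerate(partition(docs, 4)):
        print(f"core{n}: {len(shard)} docs")
    print(merge_url("http://localhost:8983/solr", "merged", ["core0", "core1"]))
```

Each shard can then be posted to its own core in parallel. Hashing on
the id keeps a given document on the same core across re-indexing runs.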

> Will the data type matter in loading speeds ?? or anything else ?

Data type might matter if there is a lot of processing involved for
that data type. E.g., a text type runs each value through an analyzer
(a tokenizer plus several filters), whereas a string field is indexed
as-is.

> Can someone help me with any tips ? perhaps any best practices  kind of document/article..
> Anything ..
[...]

The Solr Wiki has many suggestions, e.g., look at the documentation
on the DataImportHandler. In our experience, XML import has been
very fast. A generic best-practices document is hard to write, as the
speed depends on many things, such as the data source, number and type
of fields, size of data, etc. Your best bet is to try out several
approaches.
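
For the XML import route, a minimal Python sketch of building a Solr
XML update message (<add><doc><field .../></doc></add>) from a batch of
rows might look like the following. The function name and the sample
field names are made up for illustration. Posting the result to your
update handler and issuing a commit are left out here.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialise a list of dicts as one Solr <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    batch = [{"id": "1", "title": "a & b"}, {"id": "2", "title": "c"}]
    print(build_add_xml(batch))
```

Sending many documents per <add> message, rather than one request per
document, is usually worth trying first, but the batch size that works
best will depend on your data.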

Regards,
Gora
