lucene-solr-user mailing list archives

From Geert-Jan Brits <gbr...@gmail.com>
Subject Re: indexing best practices
Date Sun, 18 Jul 2010 14:06:36 GMT
Have you read:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr

In short, there are only guidelines (see the links above), no definitive answers.
If you have followed the guidelines for improving indexing speed on a single box
and, after testing various settings, indexing is still too slow, you may
want to test this scenario:
1. Index to several boxes/shards (distributing documents round robin, for example).
2. Copy all the resulting indexes to one box.
3. Use IndexWriter.addIndexes to merge the indexes.
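
To make step 1 concrete, here is a minimal sketch of round-robin distribution in plain Java. The class and method names (RoundRobinSharder, partition) are made up for illustration and are not part of Lucene or Solr. It only decides which shard each document goes to; sending each batch to its box and merging afterwards with IndexWriter.addIndexes is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene/Solr class: splits a batch of documents
// across a fixed number of shards in rotation.
class RoundRobinSharder {

    // Document i goes to shard (i % shardCount), so shard sizes differ by at most one.
    static <T> List<List<T>> partition(List<T> docs, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        List<List<T>> shards = new ArrayList<>();
        for (int i = 0; i < shardCount; i++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < docs.size(); i++) {
            shards.get(i % shardCount).add(docs.get(i));
        }
        return shards;
    }
}
```

Each inner list would then be indexed on its own box. Round robin keeps the shards roughly the same size, which matters here because the slowest shard determines when the merge in step 3 can start.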

Doing steps 1-3 on SSDs will of course boost performance a lot as well
(on large indexes, that is; small ones may fit in the disk cache entirely).
Hope that helps a bit,
Geert-Jan

2010/7/18 kenf_nc <ken.foster@realestate.com>

>
> No one has done performance analysis? Or has a link to anywhere where it's
> been done?
>
> basically fastest way to get documents into Solr. So many options available,
> what's the fastest:
> 1) file import (xml, csv)  vs  DIH  vs POSTing
> 2) number of concurrent clients   1   vs 10 vs 100 ...is there a diminishing
> returns number?
>
> I have 16 million small (8 to 10 fields, no large text fields) docs that get
> updated monthly and 2.5 million largish (20 to 30 fields, a couple html text
> fields) that get updated monthly. It currently takes about 20 hours to do a
> full import. I would like to cut that down as much as possible.
> Thanks,
> Ken
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/indexing-best-practices-tp973274p976313.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
