lucene-solr-user mailing list archives

From "Kelly, Frank" <frank.ke...@here.com>
Subject Re: Largest number of indexed documents used by Solr
Date Thu, 05 Apr 2018 19:35:12 GMT
We have ~350M documents stored on r3.xlarge nodes with an 8 GB heap
and about 31 GB of RAM.

We are using Solr 5.3.1 in a SolrCloud setup (3 collections, each with 3
shards and 3 replicas).

For us, lots of RAM is not as important as CPU (the EBS disk we run on
is quite fast, and our memory hit rate is quite low).

Some things that helped
1) Turned off the filter cache (it required too much heap)
2) Set a limit on replication bandwidth (when nodes are recovering they
can tie up a lot of CPU) in particular maxWriteMBPerSec=100
3) Set query timeout to 2 seconds to help kill "heavy" queries
4) Set preferLocalShards=true to help mitigate when any EC2 nodes are
having a "noisy neighbor"
5) We implemented our own CloudWatch based monitoring so that when Solr VM
CPU is high (> 90%) we queue up indexing traffic rather than send it to be
indexed.
We found that if you peg Solr CPU for too long, replicas can't keep up:
they go into recovery, which drives CPU even higher, and eventually the
cluster thinks the nodes are "down" when they repeatedly fail at recovery.
So we really try to manage Solr CPU load (we'll probably look at switching
to compute-optimized nodes in the future).
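
The CPU-based throttling in item 5 can be sketched roughly as follows. This is only an illustration of the idea, not the monitoring code described above: the class name `CpuGatedIndexer` and the `read_cpu_percent` and `send_batch` callables are hypothetical stand-ins for a CloudWatch poll and a Solr update call, and the batch size is an arbitrary assumption. Only the 90% threshold comes from the message.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Doc = Dict[str, object]


class CpuGatedIndexer:
    """Queue documents while CPU is above a threshold; send them once it drops."""

    def __init__(
        self,
        read_cpu_percent: Callable[[], float],    # hypothetical: e.g. a CloudWatch poll
        send_batch: Callable[[List[Doc]], None],  # hypothetical: e.g. a Solr update call
        threshold: float = 90.0,                  # "CPU is high (> 90%)" from the message
        batch_size: int = 500,                    # assumed value, not from the message
    ) -> None:
        self._read_cpu = read_cpu_percent
        self._send = send_batch
        self._threshold = threshold
        self._batch_size = batch_size
        self._pending: Deque[Doc] = deque()

    @property
    def queued(self) -> int:
        """Number of documents currently held back."""
        return len(self._pending)

    def submit(self, doc: Doc) -> None:
        """Accept a document, then try to send whatever the CPU budget allows."""
        self._pending.append(doc)
        self.drain()

    def drain(self) -> int:
        """Send queued documents in batches while CPU is at or below the threshold.

        Returns the number of documents sent during this call.
        """
        sent = 0
        while self._pending and self._read_cpu() <= self._threshold:
            count = min(self._batch_size, len(self._pending))
            batch = [self._pending.popleft() for _ in range(count)]
            self._send(batch)
            sent += len(batch)
        return sent
```

In a real deployment `drain()` would also need to be called on a timer, so that queued documents go out after CPU recovers even if no new documents arrive.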

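Items 3 and 4 in the list above can be expressed as per-request query parameters. The sketch below assumes the 2-second timeout corresponds to Solr's `timeAllowed` parameter (in milliseconds); the message does not say which mechanism was used. The host, collection name, and the helper `build_select_url` are hypothetical.

```python
from urllib.parse import urlencode


def build_select_url(
    base_url: str,
    collection: str,
    query: str,
    time_allowed_ms: int = 2000,       # assumed mapping of the "2 seconds" timeout
    prefer_local_shards: bool = True,
) -> str:
    """Build a /select URL carrying the timeout and shard-preference parameters."""
    params = {
        "q": query,
        "timeAllowed": time_allowed_ms,
        "preferLocalShards": "true" if prefer_local_shards else "false",
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host and collection name, for illustration only.
print(build_select_url("http://localhost:8983/solr", "products", "name:solr"))
```

Note that `timeAllowed` bounds the time spent searching and may return partial results rather than an error, so clients should check for that in the response.
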
Best

-Frank


On 4/3/18, 9:12 PM, "Steven White" <swhite4141@gmail.com> wrote:

>Hi everyone,
>
>I'm about to start a project that requires indexing 36 million records
>using Solr 7.2.1.  Each record ranges from 500 KB to 0.25 MB, where the
>average is 0.1 MB.
>
>Has anyone indexed this number of records?  What are the things I should
>worry about?  And out of curiosity, what is the largest number of records
>that Solr has indexed which is published out there?
>
>Thanks
>
>Steven

