lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Frequent deletions
Date Tue, 13 Jan 2015 08:17:35 GMT
On 1/13/2015 12:10 AM, ig01 wrote:
> Unfortunately this is the case, we do have hundreds of millions of
> documents on one Solr instance/server. All our configs and schema are
> with default configurations. Our index size is 180G, does that mean
> that we need at least 180G heap size?

If you have hundreds of millions of documents and the index is only
180GB, they must be REALLY tiny documents.

The number of documents has a lot more impact on the heap requirements
than the index size on disk.  As described in my previous email, I have
about 130GB of total index on my dev Solr server, and the heap is only
7GB.  Everything I ask that machine to do, which includes optimizing
shards that are up to 20GB each, works flawlessly.

When a Solr index has 500 million documents, the amount of memory
required to construct a single entry in the filterCache is over 60MB --
each entry is a bitset with one bit per document in the index.
The size of the filterCache in the default example config is 512 ...
which means that if that cache ends up fully utilized, that's in the
neighborhood of 30GB of RAM required for just one Solr cache.  The
amount of memory required for the Lucene FieldCache could be insane with
500 million documents, depending on the exact nature of the queries that
you are doing.
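The filterCache arithmetic above can be sketched out like this (a rough
back-of-the-envelope, assuming one bit per document per cache entry and
decimal MB/GB):

```python
# Each filterCache entry is a bitset with one bit per document,
# so its size in bytes is roughly maxDoc / 8.
num_docs = 500_000_000               # documents in the index
entry_bytes = num_docs // 8          # 62,500,000 bytes per entry
entry_mb = entry_bytes / 1_000_000   # ~62.5 MB -- "over 60MB"

cache_entries = 512                  # default filterCache size
cache_gb = cache_entries * entry_bytes / 1_000_000_000

print(f"one entry:  {entry_mb:.1f} MB")   # 62.5 MB
print(f"full cache: {cache_gb:.1f} GB")   # 32.0 GB -- "neighborhood of 30GB"
```

That is why a cache size that looks harmless at small document counts
becomes a multi-gigabyte commitment at 500 million documents.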

The index size on disk has a different tie to memory -- the RAM that is
not allocated to programs is automatically used by the operating system
for caching data on the disk.  If you have plenty of RAM so the OS disk
cache can effectively keep relevant parts of the index in memory,
performance will not suffer.  Anytime Solr must actually ask the disk
for index data, it will be slow.

With 120GB of your 140GB total RAM allocated to the Solr heap, that
leaves only 20GB for the OS to cache 180GB of index data.  That's
almost certainly not enough.
Although the OS disk cache requirements have no direct correlation with
OOME exceptions, slow performance due to insufficient caching might lead
*indirectly* to OOME, because the slow performance means that it's more
likely you'll have many queries happening at the same time, which will
lead to larger heap requirements.
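The RAM shortfall described above is simple to quantify (a sketch using
the 140GB/120GB/180GB figures reported earlier in this thread):

```python
# OS disk cache arithmetic: whatever RAM the heap doesn't take is what
# the operating system has available to cache the index files.
total_ram_gb = 140      # physical RAM on the server
solr_heap_gb = 120      # allocated to the Solr JVM heap
index_gb = 180          # on-disk index size

free_for_cache_gb = total_ram_gb - solr_heap_gb   # 20 GB left over
coverage = free_for_cache_gb / index_gb           # fraction cacheable
print(f"{free_for_cache_gb} GB can cache {coverage:.0%} of the index")
# -> "20 GB can cache 11% of the index"
```

At roughly 11% coverage, most index reads will go to disk, and that is
where the slow queries (and the indirect OOME pressure) come from.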

Thanks,
Shawn

