lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: 20180913 - Clarification about Limitation
Date Thu, 13 Sep 2018 12:59:23 GMT
On 9/13/2018 2:07 AM, Rekha wrote:
> Hi Solr Team,
> I am new to SOLR. I need following clarification from you.
> 1. How many documents can be stored in one core?
> 2. Is there any limit on the number of fields per document?
> 3. How many Cores can be created in one SOLR?
> 4. Is there any other limitation based on the disk storage size? I
> mean some of the databases have a 10 GB limit, I have asked like that.
> 5. Can we use SOLR as a database?

You *can* use Solr as a database, but I wouldn't.  It's not designed for 
that role.  Actual database software is better for that.  If all you 
need is simple data storage, Solr can handle that, but as soon as you 
start talking about complex operations like JOIN, a real database is FAR 
better.  Solr is a search engine, and in my opinion, that's what it 
should be used for.

The only HARD limit that Solr has is actually a Lucene limit.  Lucene 
uses the Java "int" type for its internal document ID, which means that 
the absolute maximum number of documents in one Solr core is 
2147483647 (a little over two billion).  You're likely to have
scalability problems long before you reach this number, though.  Also, 
this number includes deleted documents, so it's not a good idea to 
actually get close to the limit.  One rough rule of thumb that sometimes 
gets used:  If you have more than one hundred million documents in a 
single core, you PROBABLY need to think about re-designing your setup.
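
If you want to see how close a core is to that ceiling, here is a 
minimal SolrJ sketch.  The URL and the core name ("mycore") are 
placeholders for your own setup, not anything Solr defines.  Keep in 
mind that numFound only counts live documents, while the hard limit 
applies to maxDoc, which includes deleted documents (the Luke handler 
at /admin/luke reports both numbers).

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class CoreSizeCheck {
        // Lucene document IDs are Java ints, so one core can never hold
        // more than Integer.MAX_VALUE (2147483647) documents.
        private static final long HARD_LIMIT = Integer.MAX_VALUE;

        public static void main(String[] args) throws Exception {
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/mycore").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);  // we only need the count, not the documents
                long numFound = client.query(q).getResults().getNumFound();
                // numFound excludes deleted documents; the real ceiling
                // applies to maxDoc, which includes them.
                System.out.printf("Docs: %d (%.2f%% of the hard limit)%n",
                        numFound, 100.0 * numFound / HARD_LIMIT);
            }
        }
    }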

Using a sharded index (which SolrCloud handles far more easily than 
standalone Solr) removes the two billion document limitation for an 
index -- by spreading the index across multiple Solr cores.
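
As a rough illustration, here is a SolrJ sketch that creates a sharded 
collection through the Collections API.  The node URL, collection name, 
shard and replica counts, and the "_default" configset are placeholder 
assumptions (the _default configset ships with recent Solr versions); 
bin/solr create can do the same thing from the command line.

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateShardedCollection {
        public static void main(String[] args) throws Exception {
            // Any node in the SolrCloud cluster can take this request.
            try (SolrClient client = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr").build()) {
                // 4 shards x 2 replicas.  Each shard is its own Lucene
                // index, so the two billion ceiling now applies per
                // shard rather than to the collection as a whole.
                CollectionAdminRequest.createCollection(
                        "bigindex", "_default", 4, 2).process(client);
            }
        }
    }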

As for storage, you should have enough disk space available so that your 
index data can triple in size temporarily.  This is not a joke -- that's 
really the recommendation.  Lucene merges segments by writing the merged 
data to new files before the old ones are deleted, so normal operation 
requires at least *double* capacity, and there are real-world situations 
(such as merges overlapping or running back to back) in which the index 
can temporarily triple in size.  As a concrete example, plan on roughly 
300GB of capacity for an index that currently occupies 100GB.

Running with really big indexes means that you also need a lot of 
memory.  Good performance with Solr requires that the operating system 
has enough memory to effectively cache the often-used parts of the index.

https://wiki.apache.org/solr/SolrPerformanceProblems#RAM
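
A quick way to see what you are asking the operating system to cache 
is to total up the on-disk size of the index.  Here is a small sketch 
that does that; the index path is a placeholder, since the data 
directory location varies by install.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;

    public class IndexSizeOnDisk {
        public static void main(String[] args) throws IOException {
            // Placeholder -- point this at the core's index directory.
            Path indexDir = Paths.get("/var/solr/data/mycore/data/index");
            try (Stream<Path> files = Files.walk(indexDir)) {
                long bytes = files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
                // Ideally the OS page cache can hold the hot portion of
                // this total, on top of whatever the Java heap needs.
                System.out.printf("Index size on disk: %.1f GB%n",
                        bytes / 1e9);
            }
        }
    }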

Thanks,
Shawn

