lucene-solr-user mailing list archives

From Jan Høydahl
Subject Re: 20180917-Need Apache SOLR support
Date Mon, 17 Sep 2018 14:01:41 GMT
> We are beginners to Apache SOLR, We need following clarifications from you.
> 1.      In SOLRCloud, How can we install more than one Shared on Single PC? 

You typically have one installation of Solr on each server. Then you can add a collection
with multiple shards, specifying how many shards you wish when creating the collection, e.g.

bin/solr create -c mycoll -shards 4

Although possible, it is normally not advised to install multiple instances of Solr on the
same server.

> 2.      How many maximum number of shared can be added under on SOLRCloud?

There is no limit. You should find a good number based on the number of documents, the size
of your data, the number of servers in your cluster, available RAM and disk size and the required

In practice you will guess the initial number of shards and then benchmark a few different
settings before you decide.
Note that you can also adjust the number of shards as you go, using the CREATESHARD / SPLITSHARD
APIs, so even if you start out with few shards you can grow later.
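
As a rough sketch of growing later, the Collections API accepts a SPLITSHARD action over HTTP.
The helper below only builds the request URL and sends nothing. The host, port, collection name
`mycoll` and shard name `shard1` are assumptions for illustration, not values from your cluster.

```python
from urllib.parse import urlencode

def split_shard_url(base_url, collection, shard):
    """Build a Collections API URL that asks Solr to split one shard in two."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"

# Assumed: a local Solr node on the default port; adjust to your cluster.
url = split_shard_url("http://localhost:8983/solr", "mycoll", "shard1")
print(url)
```

You would then request that URL with any HTTP client. Splitting is an expensive operation, so
try it on a test collection first.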

> 3.      In my application there is no need of ACID properties, other than
> this can I use SOLR as a Complete Database?

You COULD, but Solr is not intended to be your primary data store. You should always design
your system so that you can re-index all content from some source (does not need to be a database)
when needed. There are several use cases for a complete re-index that you should consider.

> 4.      In Which OS we can feel the better performance, Windows Server OS /
> Linux?

I'd say Linux if you can. If you HAVE to, then you could also run on Windows :-)

> 5.      If a SOLR Core contains 2 Billion indexes, what is the recommended
> RAM size and Java heap space for better performance? 

It depends. It is not likely that you will ever put 2bn docs in one single core. Normally
you would have sharded long before that number.
The amount of physical RAM and the amount of Java heap to allocate to Solr must be calculated
and decided on a per case basis.
You could also benchmark this - test if a larger RAM size improves performance due to caching.
Depending on your bottlenecks, adding more RAM may be a way to scale further before needing
to add more servers.

It sounds like you should consult a Solr expert to dive deep into your exact use case and
architect the optimal setup, if you have these amounts of data.

> 6.      I have 20 fields per document, how many maximum number of documents
> can be inserted / retrieved in a single request?

No limit. But there are practical limits.
For indexing (update), try various batch sizes and find which gives the best performance
for you. It is just as important to do inserts (updates) over many parallel connections as
in large batches.
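
A minimal sketch of that idea, assuming a JSON update handler at a URL such as
http://localhost:8983/solr/mycoll/update (the host and collection name are made up here).
The function names are hypothetical, and the batch size and worker count are the knobs you
would benchmark. Commits are not handled; this assumes autoCommit or a separate commit call.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

def batches(docs, batch_size):
    """Yield consecutive slices of docs, each with at most batch_size items."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

def post_batch(update_url, batch):
    """POST one batch as a JSON array to a Solr update handler; return the HTTP status."""
    req = Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return resp.status

def index_parallel(docs, batch_size, workers, send):
    """Pass every batch to `send` using `workers` threads; results keep batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, batches(docs, batch_size)))
```

For example, `index_parallel(docs, 1000, 8, lambda b: post_batch(update_url, b))` would send
batches of 1000 documents over 8 connections. Try a few combinations and measure throughput.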

For searching, why would you want to know a maximum? Normally the use case for search is to
get the TOP N docs, not a maximum number.
If you need to retrieve thousands of results, you should have a look at the /export handler
and/or streaming expressions.

> 7.       If I have Billions of indexes, If the "start" parameter is 10th
> Million index and "end" parameter is  start+100th index, for this case any
> performance issue will be raised ?

Don't do it!
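This is a warning sign that you are using Solr the wrong way. With a large "start" value, Solr
has to collect and sort start + rows documents (on every shard in SolrCloud) just to return
the last few.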

If you need to scroll through all docs in the index, have a look at streaming expressions
or cursorMark instead!
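
A minimal sketch of the cursorMark loop: start with cursorMark=*, pass back the nextCursorMark
from each response, and stop when it no longer changes. The `fetch` argument is a hypothetical
caller-supplied function that sends the parameters to /select and returns the parsed JSON
response. The sort must include the uniqueKey field (assumed to be `id` here) as a tie-breaker.

```python
def iterate_with_cursor(fetch, query, sort, rows=100):
    """Yield every matching doc by following Solr's cursorMark protocol.

    `fetch` takes a dict of request parameters and returns the parsed JSON
    response of a /select request (hypothetical helper, supplied by the caller).
    """
    cursor = "*"  # "*" asks Solr to start from the beginning of the result set
    while True:
        params = {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        response = fetch(params)
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged cursor means no more results
            break
        cursor = next_cursor
```

Usage would look like `for doc in iterate_with_cursor(fetch, "*:*", "id asc"): ...`. Each request
costs about the same regardless of how deep you are, unlike a large "start" value.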

> 8.      Which .net client is best for SOLR?

The only one I'm aware of is SolrNet. There may be others. None of them are supported by the Solr

> 9.      Is there any limitation for single field, I mean about the size for
> blob data?

I think there is some default cutoff for very large values.

Why would you want to put very large blobs into documents?
This is a warning flag that you may be using the search index the wrong way. Consider storing
large blobs outside of the search index and referencing them from the docs.

In general, it would help a lot if you started by telling us WHAT you intend to use Solr for,
what you are trying to achieve, what performance goals/requirements you have etc., instead of
asking a lot of very specific max/min questions. There are seldom hard limits, and if there
are, it is usually not a good idea to approach them :)

