lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Terminology question: Core vs. Collection vs...
Date Thu, 03 Jan 2013 13:07:55 GMT
Collection is the more modern term, and it incorporates the fact that the 
collection may be sharded: each shard lives on one or more cores, and each 
core is a replica of the other cores within that shard of that collection.
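To make the collection / shard / replica / core hierarchy concrete, here is a minimal Python sketch. The class names, the make_collection helper and the "<collection>_shard<N>_replica<M>" core names are all hypothetical, chosen for illustration; this is not Solr's own API. It only models the idea that a collection is split into shards and each shard is held by one or more cores that replicate each other.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    # Hypothetical: one replica of one shard, stored in exactly one Solr core.
    core_name: str


@dataclass
class Shard:
    name: str
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def cores(self) -> List[str]:
        """Every core backing this collection, shard by shard."""
        return [r.core_name for s in self.shards for r in s.replicas]


def make_collection(name: str, num_shards: int, replication_factor: int) -> Collection:
    """Build num_shards shards, each held by replication_factor replica cores."""
    shards = []
    for s in range(1, num_shards + 1):
        replicas = [
            Replica(core_name=f"{name}_shard{s}_replica{r}")
            for r in range(1, replication_factor + 1)
        ]
        shards.append(Shard(name=f"shard{s}", replicas=replicas))
    return Collection(name=name, shards=shards)


books = make_collection("books", num_shards=2, replication_factor=2)
print(books.cores())
# ['books_shard1_replica1', 'books_shard1_replica2',
#  'books_shard2_replica1', 'books_shard2_replica2']
```

Two shards with two replicas each means four cores back the one logical collection.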

Instance is a general term, but is commonly used to refer to a running Solr 
server, each of which can service any number of cores. A sharded collection 
would typically require multiple instances of Solr, each with a shard of the 
collection.

Multiple collections can be supported on a single instance of Solr. They 
don't have to be sharded or replicated. But if they are, each Solr instance 
will have a copy or replica of the data (index) of one shard of each sharded 
collection - to the degree that each collection needs that many shards.
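As a rough illustration of several collections sharing Solr instances, the sketch below spreads replica cores across instances round-robin. The place_cores function, the host names and the round-robin rule are hypothetical simplifications. In real SolrCloud, placement is handled by Solr itself (e.g. when a collection is created), not by this logic.

```python
from typing import Dict, List, Tuple


def place_cores(collections: Dict[str, Tuple[int, int]],
                nodes: List[str]) -> Dict[str, List[str]]:
    """Hypothetical round-robin placement of replica cores onto Solr instances.

    collections maps a collection name to (num_shards, replication_factor);
    nodes lists the running Solr instances as host:port strings.
    Returns, for each instance, the cores it hosts.
    """
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    i = 0
    for name, (num_shards, replication_factor) in collections.items():
        for s in range(1, num_shards + 1):
            for r in range(1, replication_factor + 1):
                # Consecutive replicas go to consecutive instances, so replicas of
                # one shard land on distinct instances when replication_factor <= len(nodes).
                node = nodes[i % len(nodes)]
                layout[node].append(f"{name}_shard{s}_replica{r}")
                i += 1
    return layout


layout = place_cores({"books": (2, 1), "logs": (1, 1)},
                     ["host1:8983", "host2:8983"])
print(layout)
# {'host1:8983': ['books_shard1_replica1', 'logs_shard1_replica1'],
#  'host2:8983': ['books_shard2_replica1']}
```

Here the sharded "books" collection is spread over both instances, while the unsharded "logs" collection lives entirely on one of them.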

At the API level, you talk to a Solr instance, using a host and port, and 
giving the collection name. Some operations will refer only to the portion 
of a multi-shard collection on that Solr instance, but typically Solr will 
"distribute" the operation, whether it be an update or a query, to all of 
the shards of the named collection. In the case of update, the update will 
be distributed to all replicas as well, but in the case of query only one 
replica of each shard of the collection is needed.
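The difference between how updates and queries fan out can be sketched as follows. This is a toy model under stated assumptions: the layout dictionary and the owning_shard, route_update and route_query names are hypothetical, and the CRC32-modulo routing is only an illustrative stand-in (Solr's own document router uses a different hash scheme). The point is that an update reaches every replica of one shard, while a query needs one replica from each shard.

```python
import random
import zlib
from typing import Dict, List

# Hypothetical layout of one collection: shard name -> replica core names.
Layout = Dict[str, List[str]]


def owning_shard(doc_id: str, layout: Layout) -> str:
    # Illustrative hash routing (CRC32 modulo shard count); Solr's own
    # document router uses a different hash scheme.
    shard_names = sorted(layout)
    return shard_names[zlib.crc32(doc_id.encode("utf-8")) % len(shard_names)]


def route_update(doc_id: str, layout: Layout) -> List[str]:
    """An update is sent to every replica of the shard that owns the document."""
    return list(layout[owning_shard(doc_id, layout)])


def route_query(layout: Layout, rng: random.Random) -> List[str]:
    """A distributed query needs just one replica from each shard."""
    return [rng.choice(layout[shard]) for shard in sorted(layout)]


layout = {
    "shard1": ["books_shard1_replica1", "books_shard1_replica2"],
    "shard2": ["books_shard2_replica1", "books_shard2_replica2"],
}
print(route_update("doc-42", layout))         # both replicas of one shard
print(route_query(layout, random.Random(0)))  # one replica per shard
```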

Before SolrCloud, Solr had master and slave, and the slaves were replicas 
of the master. With SolrCloud there is no master, and all the replicas of a 
shard are peers. At any moment one of them will be considered the "leader" 
for coordination purposes, but not in the sense that it is a master of the 
other replicas in that shard. A SolrCloud replica is a replica of the data, 
in an abstract sense, for a single shard of a collection; it is more of an 
instance of the data/index.

An index exists at two levels: the portion of a collection on a single Solr 
core will have a Lucene index, but collectively the Lucene indexes for the 
shards of a collection can be referred to as the index of the collection. Each
replica is a copy or instance of a portion of the collection's index.
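One practical consequence of the two levels of "index" is how you count documents in a collection. The sketch below uses hypothetical per-core document counts and assumes the replicas of each shard are in sync. The collection total is one replica's count per shard, summed over shards; adding up every core's Lucene index would double-count, since replicas are copies of the same portion.

```python
from typing import Dict

# Hypothetical numDocs per core: shard -> {core name: documents in that core's Lucene index}.
CoreCounts = Dict[str, Dict[str, int]]


def collection_num_docs(counts: CoreCounts) -> int:
    """Documents in the whole collection: one replica per shard, summed over shards.

    Assumes the replicas of a shard are in sync, so any one of them stands in for the shard.
    """
    return sum(next(iter(cores.values())) for cores in counts.values())


def naive_core_total(counts: CoreCounts) -> int:
    """Summing every core double-counts, because replicas are copies of the same portion."""
    return sum(n for cores in counts.values() for n in cores.values())


counts = {
    "shard1": {"books_shard1_replica1": 500, "books_shard1_replica2": 500},
    "shard2": {"books_shard2_replica1": 480, "books_shard2_replica2": 480},
}
print(collection_num_docs(counts))  # 980
print(naive_core_total(counts))     # 1960
```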

The term slice is sometimes used to refer collectively to all of the 
cores/replicas of a single shard, or sometimes to a single replica as it 
contains only a "slice" of the full collection data.

-- Jack Krupansky

-----Original Message----- 
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 4:42 AM
To: solr-user@lucene.apache.org
Subject: Terminology question: Core vs. Collection vs...

Hello,

I am trying to understand the core Solr terminology. I am looking for
the correct rather than the loose meaning, as I am trying to teach with an
example that starts from an easy scenario and may scale to a multi-core,
multi-machine situation.

Here are the terms that all seem to be overlapping and/or crossing over in
my mind at the moment.

1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?

I tried looking through documentation, but either there is a terminology
drift or I am having trouble understanding the distinctions.

If anybody has a clear picture in their mind, I would appreciate a
clarification.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book) 

