lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Terminology question: Core vs. Collection vs...
Date Thu, 03 Jan 2013 13:28:57 GMT
A node is a machine in a cluster or cloud (graph). It could be a real 
machine or a virtualized machine. Technically, you could have multiple 
virtual nodes on the same physical "box". Each Solr replica would be on a 
different node.

Technically, you could have multiple Solr instances running on a single 
hardware node, each on a different port. They are simply instances of 
Solr, although you could consider each Solr instance a node in a Solr cloud 
as well, a "virtual" node. So, technically, you could have multiple replicas 
of the same shard on the same node, but that defeats most of the purpose of 
having replicas in the first place: to distribute the data for performance 
and fault tolerance. But you could have replicas of different shards on the 
same node/box for a partial improvement in performance and fault tolerance.
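
As a rough illustration of that placement point, here is a minimal Python sketch that flags shards with more than one replica on the same node. The function name, the (shard, node) pair format, and the box names are all made up for the example; nothing here is a Solr API.

```python
from collections import defaultdict

def shards_with_colocated_replicas(placement):
    """Return the shards that have more than one replica on the same node.

    `placement` is a list of (shard, node) pairs, one pair per replica.
    This layout format is an assumption made for the example.
    """
    counts = defaultdict(int)
    for shard, node in placement:
        counts[(shard, node)] += 1
    return sorted({shard for (shard, _node), n in counts.items() if n > 1})

# Hypothetical layouts: two boxes, two shards, two replicas per shard.
good = [("shard1", "box1"), ("shard2", "box1"),
        ("shard1", "box2"), ("shard2", "box2")]
bad = [("shard1", "box1"), ("shard1", "box1"),
       ("shard2", "box2"), ("shard2", "box2")]

print(shards_with_colocated_replicas(good))  # []
print(shards_with_colocated_replicas(bad))   # ['shard1', 'shard2']
```

In the "good" layout each box holds replicas of different shards, so losing one box still leaves a copy of every shard.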

A Solr "cloud" is really a cluster.

-- Jack Krupansky

-----Original Message----- 
From: Darren Govoni
Sent: Thursday, January 03, 2013 8:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Re: Terminology question: Core vs. Collection vs...

Good write up.

And what about "node"?

I think there needs to be an official glossary of terms that is sanctioned 
by the Solr team, and some terms still in use may need to be labeled 
"deprecated". After so many years, it's still confusing.

------- Original Message -------
On 1/3/2013 08:07 AM Jack Krupansky wrote:

Collection is the more modern term and incorporates the fact that the
collection may be sharded, with each shard on one or more cores, with each
core being a replica of the other cores within that shard of that
collection.

Instance is a general term, but is commonly used to refer to a running Solr
server, each of which can service any number of cores. A sharded collection
would typically require multiple instances of Solr, each with a shard of the
collection.

Multiple collections can be supported on a single instance of Solr. They
don't have to be sharded or replicated. But if they are, each Solr instance
will have a copy or replica of the data (index) of one shard of each sharded
collection - to the degree that each collection needs that many shards.

At the API level, you talk to a Solr instance, using a host and port, and
giving the collection name. Some operations will refer only to the portion
of a multi-shard collection on that Solr instance, but typically Solr will
"distribute" the operation, whether it be an update or a query, to all of
the shards of the named collection. In the case of update, the update will
be distributed to all replicas as well, but in the case of query only one
replica of each shard of the collection is needed.
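
To make the "host, port, and collection name" addressing concrete, here is a small Python sketch that only builds a query URL; it does not contact a server. The helper name, the collection name "mycollection", and the default port 8983 are assumptions for illustration. The `/solr/<collection>/select` path and the `distrib=false` parameter (restrict the request to the local portion instead of distributing it) are given as I understand them, so check them against your Solr version.

```python
from urllib.parse import urlencode

def select_url(host, port, collection, query, local_only=False):
    """Build a Solr select URL for a collection on one Solr instance.

    Hypothetical helper for illustration only; it just formats a string.
    """
    params = {"q": query, "wt": "json"}
    if local_only:
        # Ask this instance to answer from its own portion of the
        # collection rather than distributing the query to all shards.
        params["distrib"] = "false"
    return f"http://{host}:{port}/solr/{collection}/select?{urlencode(params)}"

# Distributed query: any instance hosting the collection can be addressed.
print(select_url("localhost", 8983, "mycollection", "title:solr"))

# Same instance, but limited to the data it holds locally.
print(select_url("localhost", 8983, "mycollection", "title:solr", local_only=True))
```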

Before SolrCloud, Solr had master and slave, and the slaves were replicas
of the master. With SolrCloud there is no master and all the replicas of
the shard are peers, although at any moment in time one of them will be
considered the "leader" for coordination purposes, but not in the sense that
it is a master of the other replicas in that shard. A SolrCloud replica is a
replica of the data, in an abstract sense, for a single shard of a
collection. A SolrCloud replica is more of an instance of the data/index.

An index exists at two levels: the portion of a collection on a single Solr
core will have a Lucene index, but collectively the Lucene indexes for the
shards of a collection can be referred to as the index of the collection.
Each replica is a copy or instance of a portion of the collection's index.

The term slice is sometimes used to refer collectively to all of the
cores/replicas of a single shard, or sometimes to a single replica as it
contains only a "slice" of the full collection data.
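
The relationships between these terms can be sketched as a tiny Python data model. This is only a way to picture the vocabulary, not how Solr represents anything internally; all class, field, and core names are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    core: str   # Solr core holding this copy; it has its own Lucene index
    node: str   # host:port of the Solr instance serving that core

@dataclass
class Shard:
    name: str
    replicas: list = field(default_factory=list)  # peers; no master

@dataclass
class Collection:
    name: str
    shards: list = field(default_factory=list)

    def lucene_index_count(self):
        """One Lucene index per core, i.e. per replica of every shard."""
        return sum(len(shard.replicas) for shard in self.shards)

    def query_targets(self):
        """A query needs only one replica of each shard."""
        return [shard.replicas[0] for shard in self.shards]

# Hypothetical collection: two shards, each with two replicas on two instances.
books = Collection("books", [
    Shard("shard1", [Replica("books_shard1_r1", "host1:8983"),
                     Replica("books_shard1_r2", "host2:8983")]),
    Shard("shard2", [Replica("books_shard2_r1", "host1:8983"),
                     Replica("books_shard2_r2", "host2:8983")]),
])

print(books.lucene_index_count())                # 4
print([r.core for r in books.query_targets()])   # ['books_shard1_r1', 'books_shard2_r1']
```

Here the "index of the collection" is the two shard indexes taken together, while four Lucene indexes exist on disk because each shard has two copies.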

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 4:42 AM
To: solr-user@lucene.apache.org
Subject: Terminology question: Core vs. Collection vs...

Hello,

I am trying to understand the core Solr terminology. I am looking for the
correct rather than the loose meaning, as I am trying to teach an example
that starts from an easy scenario and may scale to a multi-core,
multi-machine situation.

Here are the terms that all seem to be overlapping and/or crossing over in
my mind at the moment.

1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?

I tried looking through the documentation, but either there is terminology
drift or I am having trouble understanding the distinctions.

If anybody has a clear picture in their mind, I would appreciate a
clarification.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

