lucene-solr-user mailing list archives

From Joe Calderon <calderon....@gmail.com>
Subject Re: questions about Solr shards
Date Mon, 28 Jun 2010 15:50:36 GMT
There is a first-pass query that retrieves all matching document IDs from
every shard, along with the relevant sorting information. The document IDs
are then sorted and limited to the number needed, and a second query
is sent for the rest of the documents' metadata.
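
To make the two passes concrete, here is a rough sketch in Python. It is a toy
in-memory model, not Solr's actual code: the shard layout (a dict of id to
stored fields), the term-count "score", and the names first_pass, second_pass,
distributed_search and between_passes are all made up for illustration. It
also shows why a doc changed between the passes can still come back even
though it no longer matches.

```python
def first_pass(shard, term):
    """Pass 1: return only (sort key, id) pairs for the docs that match."""
    hits = []
    for doc_id, doc in shard.items():
        count = doc["text"].split().count(term)  # stand-in for a real score
        if count > 0:
            hits.append((count, doc_id))
    return hits


def second_pass(shard, doc_ids):
    """Pass 2: fetch stored fields by id; the query is not checked again."""
    return {doc_id: shard[doc_id] for doc_id in doc_ids if doc_id in shard}


def distributed_search(shards, term, rows, between_passes=None):
    merged = []
    seen = set()
    for shard_no, shard in enumerate(shards):
        for score, doc_id in first_pass(shard, term):
            if doc_id in seen:
                continue  # duplicate id: the first one received wins
            seen.add(doc_id)
            merged.append((score, doc_id, shard_no))

    # Sort the merged ids (best score first) and keep only what is needed.
    merged.sort(key=lambda hit: (-hit[0], hit[1]))
    top = merged[:rows]

    if between_passes is not None:
        between_passes()  # hypothetical hook: simulates an index update here

    results = []
    for _score, doc_id, shard_no in top:
        fetched = second_pass(shards[shard_no], [doc_id])
        if doc_id in fetched:
            results.append((doc_id, fetched[doc_id]))
    return results


shards = [
    {10: {"text": "solr shards solr"}, 11: {"text": "lucene"}},
    {20: {"text": "solr"}},
]


def update_doc_10():
    # Doc 10 is rewritten after pass 1, so it no longer contains "solr".
    shards[0][10] = {"text": "nutch"}


results = distributed_search(shards, "solr", rows=2, between_passes=update_doc_10)
```

In this sketch doc 10 is still in results, carrying its new contents, because
the second pass looks documents up by ID only.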

On Sun, Jun 27, 2010 at 7:32 PM, Babak Farhang <farhang@gmail.com> wrote:
> Otis,
>
> Belated thanks for your reply.
>
>>> 2. "The index could change between stages, e.g. a document that matched a
>>> query and was subsequently changed may no longer match but will still be
>>> retrieved."
>
>> 2. This describes the situation where, for instance, a
>> document with ID=10 is updated between the 2 calls
>> to the Solr instance/shard where that doc ID=10 lives.
>
> Can you explain why this happens? (I.e. does each query to the sharded
> index somehow involve 2 calls to each shard instance from the base
> instance?)
>
> -Babak
>
> On Thu, Jun 24, 2010 at 10:14 PM, Otis Gospodnetic
> <otis_gospodnetic@yahoo.com> wrote:
>> Hi Babak,
>>
>> 1. Yes, you are reading that correctly.
>>
>> 2. This describes the situation where, for instance, a document with ID=10 is updated
>> between the 2 calls to the Solr instance/shard where that doc ID=10 lives.
>>
>> 3. Yup, orthogonal.  You can have a master with multiple cores for sharded and non-sharded
>> indices and you can have a slave with cores that hold complete indices or just their shards.
>>  Otis
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> ----- Original Message ----
>>> From: Babak Farhang <farhang@gmail.com>
>>> To: solr-user@lucene.apache.org
>>> Sent: Thu, June 24, 2010 6:32:54 PM
>>> Subject: questions about Solr shards
>>>
>>> Hi everyone,
>>>
>>> There are a couple of notes on the limitations of this approach at
>>> http://wiki.apache.org/solr/DistributedSearch which I'm having trouble
>>> understanding.
>>>
>>> 1. "When duplicate doc IDs are received, Solr chooses the first doc
>>>    and discards subsequent ones"
>>>
>>> "Received" here is from the perspective of the base Solr instance at
>>> query time, right?  I.e. if you inadvertently indexed 2 versions of
>>> the document with the same unique ID but different contents to 2
>>> shards, then at query time, the "first" document (putting aside for
>>> the moment what exactly "first" means) would win.  Am I reading this
>>> right?
>>>
>>> 2. "The index could change between stages, e.g. a document that matched a
>>>    query and was subsequently changed may no longer match but will still be
>>>    retrieved."
>>>
>>> I have no idea what this second statement means.
>>>
>>> And one other question about shards:
>>>
>>> 3. The examples I've seen documented do not illustrate sharded,
>>> multicore setups; only sharded monolithic cores.  I assume sharding
>>> works with multicore as well (i.e. the two issues are orthogonal).  Is
>>> this right?
>>>
>>> Any help on interpreting the above would be much appreciated.
>>>
>>> Thank you,
>>> -Babak
>>
>
