lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Separating Search and Indexing in SolrCloud
Date Sun, 18 Dec 2016 15:53:39 GMT
The buffer holds analyzed documents. The transaction log stores the raw input.
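
To make the distinction concrete, here is a toy Python sketch. It is not Lucene or Solr code, and every name in it (`ToyIndexer`, `ram_buffer_limit_bytes`, the whitespace "analyzer") is hypothetical. It assumes a simplified model in which raw input is appended to a log, analyzed tokens accumulate in an in-memory buffer, and the buffer is flushed once it reaches a size limit. In solrconfig.xml the real setting is spelled `ramBufferSizeMB` and sits under `<indexConfig>`.

```python
# Toy model only: NOT Lucene/Solr code. All names here are hypothetical.

class ToyIndexer:
    def __init__(self, ram_buffer_limit_bytes):
        self.ram_buffer_limit_bytes = ram_buffer_limit_bytes
        self.tlog = []              # raw input, exactly as received
        self.ram_buffer = []        # analyzed documents (lists of tokens)
        self.ram_buffer_bytes = 0   # rough size of the analyzed data held in memory
        self.peak_buffer_bytes = 0  # largest buffer size ever observed
        self.flushed_segments = 0   # how many times the buffer was flushed

    @staticmethod
    def analyze(raw_text):
        # Stand-in for an analysis chain: lowercase plus whitespace tokenization.
        return raw_text.lower().split()

    def add(self, raw_text):
        # The log keeps the raw input.
        self.tlog.append(raw_text)
        # The buffer keeps the analyzed form.
        tokens = self.analyze(raw_text)
        self.ram_buffer.append(tokens)
        self.ram_buffer_bytes += sum(len(t.encode("utf-8")) for t in tokens)
        self.peak_buffer_bytes = max(self.peak_buffer_bytes, self.ram_buffer_bytes)
        # Flush once the limit is reached, so memory use stays bounded.
        if self.ram_buffer_bytes >= self.ram_buffer_limit_bytes:
            self.flush()

    def flush(self):
        # Stand-in for writing a segment to disk.
        self.ram_buffer.clear()
        self.ram_buffer_bytes = 0
        self.flushed_segments += 1


indexer = ToyIndexer(ram_buffer_limit_bytes=64)
for i in range(1000):
    indexer.add(f"Document number {i} about SolrCloud indexing")

print("docs logged:", len(indexer.tlog))
print("peak buffer bytes:", indexer.peak_buffer_bytes)
print("flushes:", indexer.flushed_segments)
```

In this model the peak buffer size is bounded by the limit plus the size of one analyzed document, however many documents are added. That is the sense in which indexing pressure has a fixed upper limit.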

On Sun, Dec 18, 2016 at 5:32 AM, Jaroslaw Rozanski <me@jarekrozanski.com> wrote:
> Hi Erick,
>
>
> I'm not talking about separation any more. I merely summarized the message
> from Pushkar. As I said, it was clear that it was not possible.
>
>
> About the RAMBufferSizeMB, getting back to my original question: is this
> buffer for storing update requests or ready-to-index, analyzed documents?
>
> The documentation suggests the former; your first mention, however,
> suggests the latter.
>
>
> Thanks,
> Jaroslaw
>
>
> On 18/12/16 02:16, Erick Erickson wrote:
>> Yes indexing is adding stress. No you can't separate
>> the two in SolrCloud. End of story, why beat it to death?
>> You'll have to figure out the sharding strategy that
>> meets your indexing and querying needs and live
>> within that framework. I'd advise setting up a small
>> cluster and driving it to its tipping point and extrapolating
>> from there. Here's the long version of "the sizing exercise".
>>
>> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>>
>> My point is that while indexing to Solr/Lucene there is
>> additional pressure. That pressure has a fixed upper
>> limit that doesn't grow with the number of docs. That's not
>> true for searching: as you add more docs per node, the
>> pressure (especially memory) increases. Concentrate
>> your efforts there IMO.
>>
>> Best
>> Erick
>>
>>
>>
>> On Sat, Dec 17, 2016 at 12:54 PM, Jaroslaw Rozanski
>> <me@jarekrozanski.com> wrote:
>>> Hi Erick,
>>>
>>> So what does this buffer represent? What does it actually store? Raw
>>> update request or analyzed document?
>>>
>>> The documentation suggests that it stores actual update requests.
>>>
>>> Obviously an analyzed document can and will occupy much more space than
>>> the raw one. Also, analysis will create a lot of new allocations and
>>> subsequent GC work.
>>>
>>> Yes, you are probably right that search puts on more stress and is the
>>> main memory user, but the combination of:
>>> - non-trivial analysis,
>>> - high volume of updates and
>>> - search on the same node
>>>
>>> seems to be adding fuel to the fire.
>>>
>>> From the previous response by Pushkar, it is clear that separation is not
>>> achievable with the existing SolrCloud mechanism.
>>>
>>> Thanks
>>>
>>>
>>> On 17/12/16 20:24, Erick Erickson wrote:
>>>> bq: I am more concerned with indexing memory requirements at volume
>>>>
>>>> By and large this isn't much of a problem. RAMBufferSizeMB in
>>>> solrconfig.xml governs how much memory is consumed in Solr for
>>>> indexing. When that limit is exceeded, the buffer is flushed to disk.
>>>> I've rarely heard of indexing being a memory issue. Anecdotally I
>>>> haven't seen a throughput benefit with buffer sizes over 128M.
>>>>
>>>> You're correct in that master/slave style replication would use less
>>>> memory on the slave, although there are other costs. I.e. rather than
>>>> the data for document X being sent to the replicas once as in
>>>> SolrCloud, that data is re-sent to the slave every time it's merged
>>>> into a new segment.
>>>>
>>>> That said, memory issues are _far_ more prevalent on the search side
>>>> of things so unless this is a proven issue in your environment I would
>>>> fight other fires.....
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Fri, Dec 16, 2016 at 1:06 PM, Jaroslaw Rozanski <me@jarekrozanski.com> wrote:
>>>>> Thanks, that issue looks interesting!
>>>>>
>>>>> On 16/12/16 16:38, Pushkar Raste wrote:
>>>>>> This kind of separation is not supported yet. There is, however, some
>>>>>> work going on; you can read about it at
>>>>>> https://issues.apache.org/jira/browse/SOLR-9835
>>>>>>
>>>>>> This unfortunately would not support soft commits and hence would not
>>>>>> be a good solution for near real time indexing.
>>>>>>
>>>>>> On Dec 16, 2016 7:44 AM, "Jaroslaw Rozanski" <me@jarekrozanski.com> wrote:
>>>>>>
>>>>>>> Sorry, not what I meant.
>>>>>>>
>>>>>>> The leader is responsible for distributing update requests to the
>>>>>>> replicas, so eventually all replicas have the same state as the
>>>>>>> leader. Not a problem.
>>>>>>>
>>>>>>> It is more about the performance of this. If I understand correctly,
>>>>>>> normal replication happens via standard update requests, not by, say,
>>>>>>> segment copy.
>>>>>>>
>>>>>>> Which means an update on the leader is as "expensive" as on a replica.
>>>>>>>
>>>>>>> Hence, if my understanding is correct, sending search requests to
>>>>>>> replicas only, in an index-heavy environment, would bring no benefit.
>>>>>>>
>>>>>>> So the question is: is there a mechanism in SolrCloud (not the legacy
>>>>>>> master/slave set-up) to make one node take the load of indexing while
>>>>>>> other nodes focus on searching?
>>>>>>>
>>>>>>> This is not a question about SolrClient, because it is clear how to
>>>>>>> direct search requests to specific nodes. This is more about index
>>>>>>> optimization, so that certain nodes (i.e. replicas) suffer less from
>>>>>>> high-volume indexing while serving search requests.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 16/12/16 12:35, Dorian Hoxha wrote:
>>>>>>>> The leader is the source of truth. You expect to make the replica the
>>>>>>>> source of truth or something? That doesn't make sense.
>>>>>>>> What people do is send writes to the leader/master and reads to
>>>>>>>> replicas/slaves, in Solr as in other DBs.
>>>>>>>>
>>>>>>>> On Fri, Dec 16, 2016 at 1:31 PM, Jaroslaw Rozanski <me@jarekrozanski.com> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> According to the documentation, in normal operation (not recovery) in
>>>>>>>>> a SolrCloud configuration the leader sends the updates it receives to
>>>>>>>>> all the replicas.
>>>>>>>>>
>>>>>>>>> This means that all nodes in the shard perform the same effort to
>>>>>>>>> index a single document. Correct?
>>>>>>>>>
>>>>>>>>> Is there then a benefit to *not* sending search requests to the
>>>>>>>>> leader, but only to replicas?
>>>>>>>>>
>>>>>>>>> Given an index- and search-heavy SolrCloud system, is it possible to
>>>>>>>>> separate search from indexing nodes?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> RE: Solr 5.5.0
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jaroslaw Rozanski | e: me@jarekrozanski.com
>>>>>>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jaroslaw Rozanski | e: me@jarekrozanski.com
>>>>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Jaroslaw Rozanski | e: me@jarekrozanski.com
>>>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>>>
>>>
>>> --
>>> Jaroslaw Rozanski | e: me@jarekrozanski.com
>>> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>>>
>
> --
> Jaroslaw Rozanski | e: me@jarekrozanski.com
> 695E 436F A176 4961 7793  5C70 AFDF FB5E 682C 4D3D
>
