lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Shared Directory for two Solr Clouds(Writer and Reader)
Date Tue, 21 Oct 2014 14:30:01 GMT
Hmmm, I sure hope you have _lots_ of shards. At that rate, a single
shard is probably going to run up against internal limits in a _very_
short time (the most docs I've seen successfully served on a single
shard run around 300M).

It seems, to handle any reasonable retention period, you need lots and
lots and lots of physical machines out there. Which hints at using
regular SolrCloud since each machine would then be handling much less
of the load.

This is what I mean by "the XY problem". Your setup, at least from
what you've told us so far, has so many unknowns that it's impossible
to say much. If you go with your original e-mail and get it all set up
and running on, say, 3 shards, it would work fine for about an hour.
At that point you would have 300M docs on each shard and your query
performance would start having... problems. You'd be hitting the hard
limit of 2B docs/shard in less than 10 hours. And all the work you've
put into this complex coordination setup would be totally wasted.
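
As a rough sketch of that arithmetic: the ingest rate, the 3-shard example, and the 300M and 2B figures come from this thread, while even routing across shards and the one-day retention period are assumptions made purely for illustration.

```python
# Back-of-the-envelope check of the numbers discussed above.
INGEST_RATE = 300_000        # docs/sec, from the original message
NUM_SHARDS = 3               # example shard count used above
SOFT_LIMIT = 300_000_000     # largest shard size mentioned as served successfully
HARD_LIMIT = 2_000_000_000   # the ~2B docs/shard ceiling quoted above


def hours_to_reach(docs_per_shard, ingest_rate=INGEST_RATE, shards=NUM_SHARDS):
    """Hours until every shard holds docs_per_shard, assuming even routing."""
    per_shard_rate = ingest_rate / shards
    return docs_per_shard / per_shard_rate / 3600


def shards_needed(retention_days, ingest_rate=INGEST_RATE, per_shard=SOFT_LIMIT):
    """Shards required to hold retention_days of data at per_shard docs each."""
    total_docs = ingest_rate * 86_400 * retention_days
    return -(-total_docs // per_shard)  # ceiling division


print(f"hours to 300M docs/shard: {hours_to_reach(SOFT_LIMIT):.2f}")
print(f"hours to 2B docs/shard:   {hours_to_reach(HARD_LIMIT):.2f}")
print(f"shards for 1 day of data: {shards_needed(1)}")  # retention is hypothetical
```

Changing `NUM_SHARDS` or the retention period shows how quickly the shard count grows with any realistic retention requirement.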

So, you _really_ have to explain a lot more about the problem before
we talk about writing code. You might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

On Tue, Oct 21, 2014 at 12:34 AM, Jaeyoung Yoon <jaeyoungyoon@gmail.com> wrote:
> In my case, the ingest rate is very high (above 300K docs/sec) and data is
> inserted continuously. So CPU is already the bottleneck because of indexing.
>
> Older-style master/slave replication with http or scp takes a long time to
> copy big files from master to slave.
>
> That's why I set up two separate Solr Clouds. One for indexing and the other
> for query.
>
> Thanks,
> Jae
>
> On Mon, Oct 20, 2014 at 6:22 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> I guess I'm not quite sure what the point is. So can you back up a bit
>> and explain what problem this is trying to solve? Because all it
>> really appears to be doing that's not already done with stock Solr
>> is saving some disk space, and perhaps leaving your "reader" SolrCloud
>> more cycles to devote to serving queries rather than indexing.
>>
>> So I'm curious why
>> 1> standard SolrCloud with selective hard and soft commits doesn't
>> satisfy the need
>> and
>> 2> If <1> is not reasonable, why older-style master/slave replication
>> doesn't work.
>>
>> Unless there's a compelling use-case for this, it seems like there's
>> a lot of complexity here for questionable value.
>>
>> Please note I'm not saying this is a bad idea. It would just be good
>> to understand what problem it's trying to solve. I'm reluctant to
>> introduce complexity without discussing the use-case. Perhaps
>> the existing code could provide a "good enough" solution.
>>
>> Best,
>> Erick
>>
>> On Mon, Oct 20, 2014 at 7:35 PM, Jaeyoung Yoon <jaeyoungyoon@gmail.com>
>> wrote:
>> > Hi Folks,
>> >
>> > Here are some of my ideas for using a shared file system with two
>> > separate Solr Clouds (a Writer Solr Cloud and a Reader Solr Cloud).
>> >
>> > I would like to get your feedback.
>> >
>> > For a prototype, I set up two separate Solr Clouds (one for the Writer and
>> > the other for the Reader).
>> >
>> > The big picture of my prototype is as follows.
>> >
>> > 1. The Reader and Writer Solr Clouds share the same directory.
>> > 2. The Writer Solr Cloud sends an "openSearcher" command to the Reader
>> > Solr Cloud inside a postCommit event handler. That is, when new data is
>> > added to the Writer Solr Cloud, it sends its own openSearcher command to
>> > the Reader Solr Cloud.
>> > 3. The Reader opens a searcher only when it receives an "openSearcher"
>> > command from the Writer Solr Cloud.
>> > 4. The Writer has its own deletionPolicy to keep old commit points that
>> > might still be used by queries running on the Reader Solr Cloud when a
>> > new searcher is opened there.
>> > 5. The Reader performs no updates and no commits. Everything on the
>> > Reader Solr Cloud is read-only. It also creates its searcher from the
>> > directory, not from the indexer (nrtMode=false).
>> >
>> > That is,
>> > in the Writer Solr Cloud, I added a postCommit eventListener. Inside the
>> > postCommit eventListener, it sends its own "openSearcher" command to the
>> > Reader Solr Cloud's own handler. The Reader Solr Cloud then opens a
>> > searcher directly, without a commit, and responds to the writer's request.
>> >
>> > With this approach, the Writer and Reader can use the same commit points
>> > in the shared file system in a synchronous way.
>> > When a Reader Solr Cloud starts, it doesn't open a searcher. Instead, the
>> > Writer Solr Cloud listens to the Reader Solr Cloud's ZooKeeper. On any
>> > change in the Reader Solr Cloud, the Writer sends an "openSearcher"
>> > command to the Reader Solr Cloud.
>> >
>> > Does it make sense? Or am I missing some important stuff?
>> >
>> > Any feedback would be very helpful to me.
>> >
>> > Thanks,
>> > Jae
>>
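
The five-point design in the original message can be sketched as a toy in-memory model. This is only an illustration under stated assumptions, not Solr code: the class and method names (`SharedDirectory`, `Reader`, `Writer`, `generations_in_use`) are hypothetical, a direct method call stands in for the "openSearcher" request, and the thread does not say how the writer learns which commit points the reader still needs, so here the reader simply reports them.

```python
class SharedDirectory:
    """Stand-in for the shared file system: a set of commit generations."""

    def __init__(self):
        self.commit_points = set()


class Reader:
    """Read-only side: never commits, opens searchers only when told to."""

    def __init__(self, directory):
        self.directory = directory
        self.current = None   # generation the current searcher is open on
        self.in_flight = {}   # generation -> number of queries still running

    def open_searcher(self, generation):
        # Point 3: a searcher is opened only on the writer's command.
        assert generation in self.directory.commit_points
        self.current = generation

    def start_query(self):
        gen = self.current
        self.in_flight[gen] = self.in_flight.get(gen, 0) + 1
        return gen

    def finish_query(self, gen):
        self.in_flight[gen] -= 1
        if self.in_flight[gen] == 0:
            del self.in_flight[gen]

    def generations_in_use(self):
        # Assumption: the reader can report what it still needs.
        used = set(self.in_flight)
        if self.current is not None:
            used.add(self.current)
        return used


class Writer:
    def __init__(self, directory, readers):
        self.directory = directory
        self.readers = readers
        self.generation = 0

    def commit(self):
        self.generation += 1
        self.directory.commit_points.add(self.generation)
        # Point 2: the postCommit hook tells every reader to open a searcher.
        for reader in self.readers:
            reader.open_searcher(self.generation)
        self.prune()

    def prune(self):
        # Point 4: keep the latest commit plus anything a reader still uses.
        keep = {self.generation}
        for reader in self.readers:
            keep |= reader.generations_in_use()
        self.directory.commit_points &= keep


shared = SharedDirectory()
reader = Reader(shared)
writer = Writer(shared, [reader])

writer.commit()                      # generation 1
query = reader.start_query()         # a query pinned to generation 1
writer.commit()                      # generation 2; generation 1 must survive
print(sorted(shared.commit_points))  # [1, 2]

reader.finish_query(query)
writer.prune()
print(sorted(shared.commit_points))  # [2]
```

The point of the model is the coupling it exposes: the writer cannot safely delete a commit point without knowing the reader's state, which is part of the coordination complexity questioned earlier in the thread.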
