lucene-solr-user mailing list archives

From Modassar Ather <modather1...@gmail.com>
Subject Re: Index optimize runs in background.
Date Thu, 11 Jun 2015 05:14:59 GMT
Hi,

There are 5 cores and a separate server for indexing on this SolrCloud.
Can you please share your suggestions on the following:
  How can the indexer know that the optimize has completed, even if the
commit/optimize runs in the background, without going to the Solr
servers, maybe by using SolrJ or some other API?

I tried but could not find any API/handler to check whether the
optimization has completed. Kindly share your inputs.

Thanks,
Modassar
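
One possible approach, offered only as a rough sketch and not as a
confirmed answer from this thread: poll each core's Luke request handler
(/admin/luke) from the indexing server and read the segmentCount it
reports, treating the optimize as finished once every core is at or
below the requested maximum (1 for optimize(true, true, 1)). The core
URLs, the polling interval, the timeout, and the function names below
are all hypothetical, and the check assumes the index had more than one
segment before the optimize started.

```python
import json
import time
import urllib.request


def parse_segment_count(luke_json_text):
    """Extract index.segmentCount from a Luke handler JSON response."""
    return int(json.loads(luke_json_text)["index"]["segmentCount"])


def fetch_segment_count(core_url, timeout_seconds=30):
    """Ask one core's Luke handler for its current segment count.

    core_url is a hypothetical per-core base URL such as
    http://host:8983/solr/collection1_shard1_replica1
    """
    url = core_url.rstrip("/") + "/admin/luke?numTerms=0&show=index&wt=json"
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return parse_segment_count(response.read().decode("utf-8"))


def wait_for_optimize(core_urls, max_segments=1, poll_seconds=60,
                      timeout_seconds=4 * 3600, fetch=fetch_segment_count,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until every core reports at most max_segments segments.

    Returns True when all cores are done, False if the timeout expires
    first. fetch, sleep and clock are injectable so the loop can be
    exercised without a running Solr.
    """
    deadline = clock() + timeout_seconds
    pending = list(core_urls)
    while True:
        # Keep only the cores that still have too many segments.
        pending = [url for url in pending if fetch(url) > max_segments]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

The indexer would call wait_for_optimize() with the URL of one core per
shard right after issuing the optimize, and exit only when it returns.
The segment count comes from the currently open searcher, so it should
only drop once the optimize has finished and a new searcher is opened.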

On Thu, Jun 4, 2015 at 9:36 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> Can't get any failures to happen on my end so I really haven't a clue.
>
> Best,
> Erick
>
> On Thu, Jun 4, 2015 at 3:17 AM, Modassar Ather <modather1981@gmail.com>
> wrote:
> > Hi,
> >
> > Please provide your inputs on optimize and commit running as background.
> > Your suggestion will be really helpful.
> >
> > Thanks,
> > Modassar
> >
> > On Tue, Jun 2, 2015 at 6:05 PM, Modassar Ather <modather1981@gmail.com>
> > wrote:
> >
> >> Erick! I could not find any underlying setting of 10 minutes.
> >> It is not only optimize but commit is also behaving in the same fashion
> >> and is taking lesser time than usually had taken.
> >> As per my observation both are running in background.
> >>
> >> On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <erickerickson@gmail.com>
> >> wrote:
> >>
> >>> I'm not talking about you setting a timeout, but the underlying
> >>> connection timing out...
> >>>
> >>> The "10 minutes then the indexer exits" comment points in that
> >>> direction.
> >>>
> >>> Best,
> >>> Erick
> >>>
> >>> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <modather1981@gmail.com>
> >>> wrote:
> >>> > I have not added any timeout in the indexer except zk client time out which
> >>> > is 30 seconds. I am simply calling client.close() at the end of indexing.
> >>> > The same code was not running in background for optimize with solr-4.10.3
> >>> > and org.apache.solr.client.solrj.impl.CloudSolrServer.
> >>> >
> >>> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerickson@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Are you timing out on the client request? The theory here is that it's
> >>> >> still a synchronous call, but you're just timing out at the client
> >>> >> level. At that point, the optimize is still running it's just the
> >>> >> connection has been dropped....
> >>> >>
> >>> >> Shot in the dark.
> >>> >> Erick
> >>> >>
> >>> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1981@gmail.com>
> >>> >> wrote:
> >>> >> > I could not notice it but with my past experience of commit which used to
> >>> >> > take around 2 minutes is now taking around 8 seconds. I think this is also
> >>> >> > running as background.
> >>> >> >
> >>> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1981@gmail.com>
> >>> >> > wrote:
> >>> >> >
> >>> >> >> The indexer takes almost 2 hours to optimize. It has a multi-threaded add
> >>> >> >> of batches of documents to
> >>> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
> >>> >> >> Once all the documents are indexed it invokes commit and optimize. I have
> >>> >> >> seen that the optimize goes into background after 10 minutes and indexer
> >>> >> >> exits.
> >>> >> >> I am not sure why this 10 minutes it hangs on indexer. This behavior I
> >>> >> >> have seen in multiple iteration of the indexing of same data.
> >>> >> >>
> >>> >> >> There is nothing significant I found in log which I can share. I can see
> >>> >> >> following in log.
> >>> >> >> org.apache.solr.update.DirectUpdateHandler2; start
> >>> >> >> commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >>> >> >>
> >>> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerickson@gmail.com>
> >>> >> >> wrote:
> >>> >> >>
> >>> >> >>> All strange of course. What do your Solr logs show when this happens?
> >>> >> >>> And how reproducible is this?
> >>> >> >>>
> >>> >> >>> Best,
> >>> >> >>> Erick
> >>> >> >>>
> >>> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira <uv@odoko.co.uk> wrote:
> >>> >> >>> > In this case, optimising makes sense, once the index is generated, you
> >>> >> >>> > are not updating it.
> >>> >> >>> >
> >>> >> >>> > Upayavira
> >>> >> >>> >
> >>> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >>> >> >>> >> Our index has almost 100M documents running on SolrCloud of 5 shards and
> >>> >> >>> >> each shard has an index size of about 170+GB (for the record, we are not
> >>> >> >>> >> using stored fields - our documents are pretty large). We perform a full
> >>> >> >>> >> indexing every weekend and during the week there are no updates made to
> >>> >> >>> >> the index. Most of the queries that we run are pretty complex with hundreds
> >>> >> >>> >> of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc.
> >>> >> >>> >> and take many minutes to execute. A difference of 10-20% is also a big
> >>> >> >>> >> advantage for us.
> >>> >> >>> >>
> >>> >> >>> >> We have been optimizing the index after indexing for years and it has
> >>> >> >>> >> worked well for us. Every once in a while, we upgrade Solr to the latest
> >>> >> >>> >> version and try without optimizing so that we can save the many hours it
> >>> >> >>> >> take to optimize such a huge index, but find optimized index work well
> >>> >> >>> >> for us.
> >>> >> >>> >>
> >>> >> >>> >> Erick I was indexing today the documents and saw the optimize happening
> >>> >> >>> >> in background.
> >>> >> >>> >>
> >>> >> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerickson@gmail.com>
> >>> >> >>> >> wrote:
> >>> >> >>> >>
> >>> >> >>> >> > No results yet. I finished the test harness last night (not really a
> >>> >> >>> >> > unit test, a stand-alone program that endlessly adds stuff and tests
> >>> >> >>> >> > that every commit returns the correct number of docs).
> >>> >> >>> >> >
> >>> >> >>> >> > 8,000 cycles later there aren't any problems reported.
> >>> >> >>> >> >
> >>> >> >>> >> > Siiigggggh.
> >>> >> >>> >> >
> >>> >> >>> >> >
> >>> >> >>> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1981@gmail.com>
> >>> >> >>> >> > wrote:
> >>> >> >>> >> > > Hi,
> >>> >> >>> >> > >
> >>> >> >>> >> > > Erick you mentioned about a unit test to test the optimize running in
> >>> >> >>> >> > > background. Kindly share your findings if any.
> >>> >> >>> >> > >
> >>> >> >>> >> > > Thanks,
> >>> >> >>> >> > > Modassar
> >>> >> >>> >> > >
> >>> >> >>> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1981@gmail.com>
> >>> >> >>> >> > > wrote:
> >>> >> >>> >> > >
> >>> >> >>> >> > >> Thanks everybody for your replies.
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> I have noticed the optimization running in background every time I
> >>> >> >>> >> > >> indexed. This is 5 node cluster with solr-5.1.0 and uses the
> >>> >> >>> >> > >> CloudSolrClient. Kindly share your findings on this issue.
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> Our index has almost 100M documents running on SolrCloud. We have been
> >>> >> >>> >> > >> optimizing the index after indexing for years and it has worked well for
> >>> >> >>> >> > >> us.
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> Thanks,
> >>> >> >>> >> > >> Modassar
> >>> >> >>> >> > >>
> >>> >> >>> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerickson@gmail.com>
> >>> >> >>> >> > >> wrote:
> >>> >> >>> >> > >>
> >>> >> >>> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3, but
> >>> >> >>> >> > >>> involving hard commits openSearcher=true, see:
> >>> >> >>> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >>> >> >>> >> > >>> reproduce this at will, siigggghhhh.
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>> A unit test should be very simple to write though, maybe I can get to it
> >>> >> >>> >> > >>> today.
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>> Erick
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <uv@odoko.co.uk> wrote:
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >>> >> >>> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >>> >> >>> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >>> >> >>> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits after the
> >>> >> >>> >> > >>> >> > invocation of optimize and the optimization keeps on running in the
> >>> >> >>> >> > >>> >> > background.
> >>> >> >>> >> > >>> >> > Kindly let me know if it is per design and how can I make my indexer to
> >>> >> >>> >> > >>> >> > wait until the optimization is over. Is there a configuration/parameter I
> >>> >> >>> >> > >>> >> > need to set for the same.
> >>> >> >>> >> > >>> >> >
> >>> >> >>> >> > >>> >> > Please note that the same indexer with cloudSolrServer.optimize(true, true,
> >>> >> >>> >> > >>> >> > 1) on Solr-4.10 used to wait till the optimize was over before exiting.
> >>> >> >>> >> > >>> >>
> >>> >> >>> >> > >>> >> This is very odd, because I could not get HttpSolrServer to optimize in
> >>> >> >>> >> > >>> >> the background, even when that was what I wanted.
> >>> >> >>> >> > >>> >>
> >>> >> >>> >> > >>> >> I wondered if maybe the Cloud object behaves differently with regard to
> >>> >> >>> >> > >>> >> blocking until an optimize is finished ... except that there is no code
> >>> >> >>> >> > >>> >> for optimizing in CloudSolrClient at all ... so I don't know where the
> >>> >> >>> >> > >>> >> different behavior would actually be happening.
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> > A more important question is, why are you optimising? Generally it isn't
> >>> >> >>> >> > >>> > recommended anymore as it reduces the natural distribution of documents
> >>> >> >>> >> > >>> > amongst segments and makes future merges more costly.
> >>> >> >>> >> > >>> >
> >>> >> >>> >> > >>> > Upayavira
> >>> >> >>> >> > >>>
> >>> >> >>> >> > >>
> >>> >> >>> >> > >>
> >>> >> >>> >> >
> >>> >> >>>
> >>> >> >>
> >>> >> >>
> >>> >>
> >>>
> >>
> >>
>
