lucene-solr-user mailing list archives

From Modassar Ather <modather1...@gmail.com>
Subject Re: Index optimize runs in background.
Date Tue, 02 Jun 2015 12:35:37 GMT
Erick, I could not find any underlying setting of 10 minutes.
It is not only optimize: commit is also behaving the same way and is taking
less time than it usually took.
As far as I can observe, both are running in the background.

On Fri, May 29, 2015 at 7:21 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> I'm not talking about you setting a timeout, but the underlying
> connection timing out...
>
> The "10 minutes then the indexer exits" comment points in that direction.
>
> Best,
> Erick
>
> On Thu, May 28, 2015 at 11:43 PM, Modassar Ather <modather1981@gmail.com> wrote:
> > I have not added any timeout in the indexer except the ZK client timeout,
> > which is 30 seconds. I am simply calling client.close() at the end of
> > indexing. The same code did not run optimize in the background with
> > solr-4.10.3 and org.apache.solr.client.solrj.impl.CloudSolrServer.
> >
> > On Fri, May 29, 2015 at 11:13 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> Are you timing out on the client request? The theory here is that it's
> >> still a synchronous call, but you're just timing out at the client
> >> level. At that point, the optimize is still running; it's just that the
> >> connection has been dropped....
> >>
> >> Shot in the dark.
> >> Erick
> >>
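Erick's theory (the call is still synchronous, only the client stops waiting) can be illustrated without Solr. The sketch below is a stand-in model, not SolrJ code: a background task plays the hypothetical server-side optimize, and `Future.get` with a timeout plays the client's socket timeout. Class and method names are made up for illustration.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Stand-in model only: no Solr or SolrJ classes are used here.
final class ClientTimeoutDemo {

    /**
     * Simulates a blocking call whose server-side work takes workMillis while the
     * client waits at most clientTimeoutMillis. Returns true when the client gave
     * up early AND the simulated server work still ran to completion.
     */
    static boolean clientTimesOutButWorkFinishes(long workMillis, long clientTimeoutMillis) {
        ExecutorService server = Executors.newSingleThreadExecutor();
        AtomicBoolean workDone = new AtomicBoolean(false);

        // Plays the role of the long-running optimize on the Solr node.
        Future<?> optimize = server.submit(() -> {
            try {
                Thread.sleep(workMillis);
                workDone.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        boolean clientTimedOut = false;
        try {
            // Plays the role of the client's socket timeout on a synchronous request.
            optimize.get(clientTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            clientTimedOut = true; // the client stops waiting and moves on (e.g. exits)
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }

        // The task is deliberately not cancelled: the "server" keeps working.
        server.shutdown();
        try {
            server.awaitTermination(workMillis * 10, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return clientTimedOut && workDone.get();
    }

    public static void main(String[] args) {
        // Work takes 500 ms but the client only waits 100 ms.
        System.out.println(clientTimesOutButWorkFinishes(500, 100));
    }
}
```

From the client's side, "the call returned early" and "the work moved to the background" look identical; only the server knows the work continued.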
> >> On Thu, May 28, 2015 at 10:31 PM, Modassar Ather <modather1981@gmail.com> wrote:
> >> > I could not observe it directly, but from past experience, a commit that
> >> > used to take around 2 minutes is now taking around 8 seconds. I think
> >> > this is also running in the background.
> >> >
> >> > On Fri, May 29, 2015 at 10:52 AM, Modassar Ather <modather1981@gmail.com> wrote:
> >> >
> >> >> The indexer takes almost 2 hours to optimize. It does a multi-threaded
> >> >> add of batches of documents to
> >> >> org.apache.solr.client.solrj.impl.CloudSolrClient.
> >> >> Once all the documents are indexed, it invokes commit and optimize. I
> >> >> have seen that the optimize goes into the background after 10 minutes
> >> >> and the indexer exits.
> >> >> I am not sure why the indexer hangs for these 10 minutes. I have seen
> >> >> this behavior in multiple iterations of indexing the same data.
> >> >>
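The indexer described above (parallel batch adds, then one commit and one optimize) can be sketched as follows. `IndexClient` is a hypothetical interface standing in for CloudSolrClient; its method names are not the SolrJ API. The point the sketch makes is ordering: commit and optimize are issued only after every add has returned.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for CloudSolrClient; this is NOT the SolrJ API.
interface IndexClient {
    void add(List<String> batch);
    void commit();
    void optimize(int maxSegments);
}

final class BatchIndexer {
    /** Adds all batches in parallel, waits for every add to finish, then commits and optimizes once. */
    static void indexAll(IndexClient client, List<List<String>> batches, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (List<String> batch : batches) {
                tasks.add(() -> {
                    client.add(batch);
                    return null;
                });
            }
            // invokeAll blocks until every task is done; get() surfaces any add failure.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while adding batches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch add failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        // Reached only after all adds succeeded.
        client.commit();
        client.optimize(1);
    }
}

// Test double that records call order; synchronized because adds run concurrently.
final class RecordingClient implements IndexClient {
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    public void add(List<String> batch) { events.add("add:" + batch.size()); }
    public void commit() { events.add("commit"); }
    public void optimize(int maxSegments) { events.add("optimize:" + maxSegments); }
}
```

With `RecordingClient`, the last two recorded events are always `commit` followed by `optimize:1`, regardless of how the adds interleave.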
> >> >> I found nothing significant in the log that I can share. I can see the
> >> >> following in the log:
> >> >> org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
> >> >>
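The log line records the flags the node actually received, including `waitSearcher=true`, so the request itself asked to block. When comparing runs or nodes, a small helper that pulls those flags out of such lines can help. `CommitLogFlags` is hypothetical, and the line format is taken from the message above; it may differ between Solr versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the format is assumed from the DirectUpdateHandler2 line quoted above.
final class CommitLogFlags {
    /** Parses "... start commit{,key=value,...}" into an ordered map of flags. */
    static Map<String, String> parse(String logLine) {
        int open = logLine.indexOf('{');
        int close = logLine.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no {...} block in: " + logLine);
        }
        Map<String, String> flags = new LinkedHashMap<>();
        for (String entry : logLine.substring(open + 1, close).split(",")) {
            if (entry.isEmpty()) {
                continue; // the block starts with "{," so the first entry is empty
            }
            int eq = entry.indexOf('=');
            if (eq < 0) {
                continue; // ignore anything that is not key=value
            }
            flags.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return flags;
    }

    public static void main(String[] args) {
        String line = "org.apache.solr.update.DirectUpdateHandler2; start "
                + "commit{,optimize=true,openSearcher=true,waitSearcher=true,"
                + "expungeDeletes=false,softCommit=false,prepareCommit=false}";
        Map<String, String> flags = parse(line);
        System.out.println(flags.get("optimize") + " " + flags.get("waitSearcher"));
    }
}
```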
> >> >> On Wed, May 27, 2015 at 10:59 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >>
> >> >>> All strange, of course. What do your Solr logs show when this happens?
> >> >>> And how reproducible is this?
> >> >>>
> >> >>> Best,
> >> >>> Erick
> >> >>>
> >> >>> On Wed, May 27, 2015 at 4:00 AM, Upayavira <uv@odoko.co.uk> wrote:
> >> >>> > In this case, optimising makes sense: once the index is generated,
> >> >>> > you are not updating it.
> >> >>> >
> >> >>> > Upayavira
> >> >>> >
> >> >>> > On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote:
> >> >>> >> Our index has almost 100M documents on a 5-shard SolrCloud cluster,
> >> >>> >> and each shard has an index size of about 170+ GB (for the record, we
> >> >>> >> are not using stored fields; our documents are pretty large). We
> >> >>> >> perform a full indexing every weekend, and no updates are made to the
> >> >>> >> index during the week. Most of the queries that we run are pretty
> >> >>> >> complex, with hundreds of terms using PhraseQuery, BooleanQuery,
> >> >>> >> SpanQuery, wildcards, boosts, etc., and take many minutes to execute.
> >> >>> >> Even a 10-20% difference is a big advantage for us.
> >> >>> >>
> >> >>> >> We have been optimizing the index after indexing for years, and it
> >> >>> >> has worked well for us. Every once in a while, we upgrade Solr to the
> >> >>> >> latest version and try without optimizing so that we can save the
> >> >>> >> many hours it takes to optimize such a huge index, but we find that
> >> >>> >> the optimized index works well for us.
> >> >>> >>
> >> >>> >> Erick, I was indexing the documents today and saw the optimize
> >> >>> >> happening in the background.
> >> >>> >>
> >> >>> >> On Tue, May 26, 2015 at 9:12 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >>> >>
> >> >>> >> > No results yet. I finished the test harness last night (not really a
> >> >>> >> > unit test, a stand-alone program that endlessly adds stuff and tests
> >> >>> >> > that every commit returns the correct number of docs).
> >> >>> >> >
> >> >>> >> > 8,000 cycles later there aren't any problems reported.
> >> >>> >> >
> >> >>> >> > Siiigggggh.
> >> >>> >> >
> >> >>> >> >
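Erick only outlines his harness, so the following is a guess at its shape, not his program: add a batch, commit, and check that the visible count equals everything added so far. To keep it runnable standalone, it targets a hypothetical in-memory `FakeIndex` instead of Solr and runs a bounded number of cycles; a real harness would swap in SolrJ add/commit/query calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical in-memory stand-in for a collection: added docs become visible
// only after commit(), loosely like a hard commit with openSearcher=true.
class FakeIndex {
    private final List<String> pending = new ArrayList<>();
    private final List<String> visible = new ArrayList<>();

    void add(String doc) { pending.add(doc); }

    void commit() {
        visible.addAll(pending);
        pending.clear();
    }

    long numFound() { return visible.size(); }
}

final class CommitCountHarness {
    /** Runs add/commit cycles; returns how many cycles saw a wrong visible count. */
    static int run(FakeIndex index, int cycles, long seed) {
        Random rnd = new Random(seed);
        long expected = 0;
        int failures = 0;
        for (int cycle = 0; cycle < cycles; cycle++) {
            int batch = 1 + rnd.nextInt(100); // 1..100 docs per cycle
            for (int i = 0; i < batch; i++) {
                index.add("doc-" + cycle + "-" + i);
            }
            expected += batch;
            index.commit();
            // Every commit must expose exactly the docs added so far.
            if (index.numFound() != expected) {
                failures++;
            }
        }
        return failures;
    }
}
```

A commit that silently returns before the new searcher is visible would show up here as a non-zero failure count.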
> >> >>> >> > On Tue, May 26, 2015 at 1:51 AM, Modassar Ather <modather1981@gmail.com> wrote:
> >> >>> >> > > Hi,
> >> >>> >> > >
> >> >>> >> > > Erick, you mentioned a unit test to test the optimize running in
> >> >>> >> > > the background. Kindly share your findings, if any.
> >> >>> >> > >
> >> >>> >> > > Thanks,
> >> >>> >> > > Modassar
> >> >>> >> > >
> >> >>> >> > > On Mon, May 25, 2015 at 11:47 AM, Modassar Ather <modather1981@gmail.com> wrote:
> >> >>> >> > >
> >> >>> >> > >> Thanks everybody for your replies.
> >> >>> >> > >>
> >> >>> >> > >> I have noticed the optimization running in the background every
> >> >>> >> > >> time I indexed. This is a 5-node cluster with solr-5.1.0, and the
> >> >>> >> > >> indexer uses CloudSolrClient. Kindly share your findings on this
> >> >>> >> > >> issue.
> >> >>> >> > >>
> >> >>> >> > >> Our index has almost 100M documents on SolrCloud. We have been
> >> >>> >> > >> optimizing the index after indexing for years, and it has worked
> >> >>> >> > >> well for us.
> >> >>> >> > >>
> >> >>> >> > >> Thanks,
> >> >>> >> > >> Modassar
> >> >>> >> > >>
> >> >>> >> > >> On Fri, May 22, 2015 at 11:55 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >>> >> > >>
> >> >>> >> > >>> Actually, I've recently seen very similar behavior in Solr 4.10.3,
> >> >>> >> > >>> but involving hard commits with openSearcher=true, see:
> >> >>> >> > >>> https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't
> >> >>> >> > >>> reproduce this at will, siigggghhhh.
> >> >>> >> > >>>
> >> >>> >> > >>> A unit test should be very simple to write, though; maybe I can get
> >> >>> >> > >>> to it today.
> >> >>> >> > >>>
> >> >>> >> > >>> Erick
> >> >>> >> > >>>
> >> >>> >> > >>>
> >> >>> >> > >>>
> >> >>> >> > >>> On Fri, May 22, 2015 at 8:27 AM, Upayavira <uv@odoko.co.uk> wrote:
> >> >>> >> > >>> >
> >> >>> >> > >>> >
> >> >>> >> > >>> > On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote:
> >> >>> >> > >>> >> On 5/21/2015 6:21 AM, Modassar Ather wrote:
> >> >>> >> > >>> >> > I am using Solr-5.1.0. I have an indexer class which invokes
> >> >>> >> > >>> >> > cloudSolrClient.optimize(true, true, 1). My indexer exits
> >> >>> >> > >>> >> > after the invocation of optimize, and the optimization keeps
> >> >>> >> > >>> >> > on running in the background.
> >> >>> >> > >>> >> > Kindly let me know if this is by design and how I can make my
> >> >>> >> > >>> >> > indexer wait until the optimization is over. Is there a
> >> >>> >> > >>> >> > configuration/parameter I need to set for this?
> >> >>> >> > >>> >> >
> >> >>> >> > >>> >> > Please note that the same indexer with
> >> >>> >> > >>> >> > cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to
> >> >>> >> > >>> >> > wait until the optimize was over before exiting.
> >> >>> >> > >>> >>
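One client-side workaround for "make my indexer wait", independent of whatever changed between 4.10 and 5.1, is to stop relying on the blocking call and poll until the index reports the target segment count. In this sketch the segment-count source is left abstract as an `IntSupplier`. How you obtain it (for example, from the Luke request handler's index statistics) is an assumption to verify against your Solr version. In SolrCloud every shard replica has its own segments, so a real check would need to cover each core.

```java
import java.util.function.IntSupplier;

// Sketch only: segmentCount is a hypothetical source of the current segment count.
final class OptimizeWaiter {
    /**
     * Polls segmentCount until it reports at most maxSegments or timeoutMillis elapses.
     * Returns true if the target was reached in time, false on timeout or interrupt.
     */
    static boolean waitForSegments(IntSupplier segmentCount, int maxSegments,
                                   long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (segmentCount.getAsInt() <= maxSegments) {
                return true; // optimize has reached the requested segment count
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up waiting
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

For `optimize(true, true, 1)` the target would be `maxSegments = 1` on every core, and the timeout should comfortably exceed the roughly 2 hours the optimize is reported to take.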
> >> >>> >> > >>> >> This is very odd, because I could not get HttpSolrServer to
> >> >>> >> > >>> >> optimize in the background, even when that was what I wanted.
> >> >>> >> > >>> >>
> >> >>> >> > >>> >> I wondered if maybe the Cloud object behaves differently with
> >> >>> >> > >>> >> regard to blocking until an optimize is finished ... except
> >> >>> >> > >>> >> that there is no code for optimizing in CloudSolrClient at
> >> >>> >> > >>> >> all ... so I don't know where the different behavior would
> >> >>> >> > >>> >> actually be happening.
> >> >>> >> > >>> >
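Shawn's observation fits how SolrJ is layered: the three arguments of `optimize(true, true, 1)` are waitFlush, waitSearcher and maxSegments, and the call ends up as an ordinary update request rather than anything Cloud-specific. To take SolrJ out of the picture when reproducing, roughly the same optimize can be sent as plain `/update` parameters (`optimize`, `maxSegments`, `waitSearcher`) and timed with any HTTP client. This sketch only builds the URL. The base URL and collection name are placeholders, and waitFlush is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: baseUrl and collection are placeholders for your own cluster.
final class OptimizeRequestUrl {
    /** Builds an /update URL asking Solr to optimize down to maxSegments segments. */
    static String build(String baseUrl, String collection, int maxSegments, boolean waitSearcher) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("optimize", "true");
        params.put("maxSegments", Integer.toString(maxSegments));
        params.put("waitSearcher", Boolean.toString(waitSearcher));
        StringJoiner query = new StringJoiner("&");
        // Values here are plain tokens, so no URL encoding is needed.
        params.forEach((k, v) -> query.add(k + "=" + v));
        return baseUrl + "/" + collection + "/update?" + query;
    }
}
```

For example, `build("http://localhost:8983/solr", "collection1", 1, true)` yields `http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1&waitSearcher=true`.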
> >> >>> >> > >>> > A more important question is, why are you optimising?
> >> >>> >> > >>> > Generally, it isn't recommended anymore, as it reduces the
> >> >>> >> > >>> > natural distribution of documents amongst segments and makes
> >> >>> >> > >>> > future merges more costly.
> >> >>> >> > >>> >
> >> >>> >> > >>> > Upayavira
> >> >>> >> > >>>
> >> >>> >> > >>
> >> >>> >> > >>
> >> >>> >> >
> >> >>>
> >> >>
> >> >>
> >>
>
