lucene-solr-user mailing list archives

From Nitin Solanki <nitinml...@gmail.com>
Subject Re: Whole RAM consumed while Indexing.
Date Fri, 20 Mar 2015 06:12:39 GMT
On Fri, Mar 20, 2015 at 1:35 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> That or even hard commit to 60 seconds. It's strictly a matter of how often
> you want to close old segments and open new ones.
>
> On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> > Hi Erick,
> >               I read your article. Really nice.
> > In it you said that for bulk indexing you should set soft commit = 10 mins
> > and hard commit = 15 sec. Is that also okay for my scenario?
> >
> > On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> >> bq: As you said, do commits after 60000 seconds
> >>
> >> No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
> >> as Shawn said. So setting it to 60000 is every minute.
> >>
> >> From solrconfig.xml, conveniently located immediately above the
> >> <autoCommit> tag:
> >>
> >> maxTime - Maximum amount of time in ms that is allowed to pass since a
> >> document was added before automatically triggering a new commit.
> >>
> >> Also, a lot of answers about soft and hard commits are here, as I pointed
> >> out before. Did you read it?
> >>
> >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Best
> >> Erick
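Because `maxTime` is in milliseconds, it helps to convert intervals explicitly before putting them in solrconfig.xml. The Python sketch below is illustrative only: the helper names `to_millis` and `commit_config` are hypothetical, and the rendered XML simply mirrors the `${property:default}` form used in the config quoted further down this thread. It shows that 60 seconds is 60000 ms and that the bulk-indexing values mentioned above (soft commit 10 minutes, hard commit 15 seconds) become 600000 and 15000.

```python
def to_millis(minutes: float = 0, seconds: float = 0) -> int:
    """Convert a human-friendly interval to the milliseconds maxTime expects."""
    return int((minutes * 60 + seconds) * 1000)


def commit_config(hard_ms: int, soft_ms: int) -> str:
    """Render autoCommit/autoSoftCommit blocks for solrconfig.xml (hypothetical helper).

    openSearcher=false means a hard commit persists the index without opening
    a new searcher; new documents become visible to searches via soft commits.
    """
    return (
        "<autoCommit>\n"
        f"  <maxTime>${{solr.autoCommit.maxTime:{hard_ms}}}</maxTime>\n"
        "  <openSearcher>false</openSearcher>\n"
        "</autoCommit>\n"
        "<autoSoftCommit>\n"
        f"  <maxTime>${{solr.autoSoftCommit.maxTime:{soft_ms}}}</maxTime>\n"
        "</autoSoftCommit>\n"
    )


print(to_millis(seconds=60))  # 60000, i.e. once a minute
print(commit_config(hard_ms=to_millis(seconds=15), soft_ms=to_millis(minutes=10)))
```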
> >>
> >> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
> >> > Probably merged somewhat differently, with some terms indexes repeating
> >> > between segments. Check the number of segments in the data directory. And
> >> > do a search for *:* and make sure both have the same document count.
> >> >
> >> > Also, in all these discussions, you still haven't answered how soon
> >> > after indexing you need to _search_. If you are not actually searching
> >> > while committing, you could even index on a completely separate server
> >> > (e.g. a faster one) and swap (or alias) the index in afterwards. Unless,
> >> > of course, I missed it; it's a lot of emails in a very short window of time.
> >> >
> >> > Regards,
> >> >    Alex.
> >> >
> >> > ----
> >> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> > http://www.solr-start.com/
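Alex's two checks can be scripted. A minimal Python sketch, assuming a Lucene 4.x-or-later index (one `.si` file per segment) and a JSON `/select` response; the base URL, the collection name `collection1`, and the helper names are hypothetical placeholders. `rows=0` asks Solr for the hit count only, which it reports as `numFound`.

```python
import json
from typing import Iterable
from urllib.parse import urlencode


def match_all_count_url(base_url: str, collection: str) -> str:
    """Build a *:* query URL that returns only the hit count (rows=0)."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def num_found(response_body: str) -> int:
    """Read numFound from a JSON /select response body."""
    return json.loads(response_body)["response"]["numFound"]


def count_segments(index_files: Iterable[str]) -> int:
    """Count segments by their .si files (assumes Lucene 4.x or later).

    Pass e.g. os.listdir("<core>/data/index") to inspect a real core.
    """
    return sum(1 for name in index_files if name.endswith(".si"))
```

If both setups report the same `numFound` but a different segment count and size on disk, that is consistent with the merge explanation above rather than with missing documents.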
> >> >
> >> >
> >> > On 18 March 2015 at 12:09, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> >> >> When I set my configuration to 300 for soft commit and 3000 for hard
> >> >> commit and indexed some amount of data, the whole index was 6GB after
> >> >> indexing completed.
> >> >>
> >> >> When I changed the configuration to 60000 for soft commit and 60000 for
> >> >> hard commit and indexed the same data, the whole index was 5GB after
> >> >> indexing completed.
> >> >>
> >> >> But the number of documents in both scenarios was the same. I am
> >> >> wondering how that is possible.
> >> >>
> >> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> >> >>
> >> >>> Hi Erick,
> >> >>>              I am just asking. I want to be sure about the difference
> >> >>> between commit settings. What happens if I commit frequently, or not? The
> >> >>> reason I say I need to commit so quickly is that I have to index 28GB of
> >> >>> data, which takes 7-8 hours (with frequent commits).
> >> >>> As you said, do commits after 60000 seconds; then it will be more
> >> >>> expensive.
> >> >>> If I don't encounter any **"overlapping searchers" warning messages**,
> >> >>> then it seems to be okay. Is it?
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >>>
> >> >>>> Don't do it. Really, why do you want to do this? This seems like
> >> >>>> an "XY" problem; you haven't explained why you need to commit
> >> >>>> things so very quickly.
> >> >>>>
> >> >>>> I suspect you haven't tried _searching_ while committing at such
> >> >>>> a rate, and you might as well turn all your top-level caches off
> >> >>>> in solrconfig.xml since they won't be useful at all.
> >> >>>>
> >> >>>> Best,
> >> >>>> Erick
> >> >>>>
> >> >>>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> >> >>>> > Hi,
> >> >>>> >        If I do very, very fast indexing (softcommit = 300 and
> >> >>>> > hardcommit = 3000) vs. slow indexing (softcommit = 60000 and
> >> >>>> > hardcommit = 60000), as you both said, will fast indexing fail to
> >> >>>> > index some data?
> >> >>>> > Any suggestions on this?
> >> >>>> >
> >> >>>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <andyetitmoves@gmail.com> wrote:
> >> >>>> >
> >> >>>> >> Yes, and doing so is painful and takes lots of people and hardware
> >> >>>> >> resources to get there for large amounts of data and queries :)
> >> >>>> >>
> >> >>>> >> As Erick says, work backwards from 60s and first establish how long the
> >> >>>> >> commit interval can be while still satisfying your use case.
> >> >>>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerickson@gmail.com> wrote:
> >> >>>> >>
> >> >>>> >> > First start by lengthening your soft and hard commit intervals
> >> >>>> >> > substantially. Start with 60000 and work backwards, I'd say.
> >> >>>> >> >
> >> >>>> >> > Ramkumar has tuned the heck out of his installation to get the commit
> >> >>>> >> > intervals to be that short ;).
> >> >>>> >> >
> >> >>>> >> > I'm betting that you'll see your RAM usage go way down, but that's a
> >> >>>> >> > guess until you test.
> >> >>>> >> >
> >> >>>> >> > Best,
> >> >>>> >> > Erick
> >> >>>> >> >
> >> >>>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> >> >>>> >> > > Hi Erick,
> >> >>>> >> > >             You are right. Some **"overlapping searchers" warning
> >> >>>> >> > > messages** are appearing in the logs.
> >> >>>> >> > > The **numDocs number** changes as documents are added during
> >> >>>> >> > > indexing.
> >> >>>> >> > > Any help?
> >> >>>> >> > >
> >> >>>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >> >>>> >> > >
> >> >>>> >> > >> First, the soft commit interval is very short. Very, very, very, very
> >> >>>> >> > >> short. 300ms is just short of insane unless it's a typo ;).
> >> >>>> >> > >>
> >> >>>> >> > >> Here's a long background:
> >> >>>> >> > >>
> >> >>>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >> >>>> >> > >>
> >> >>>> >> > >> But the short form is that you're opening searchers every 300 ms. The
> >> >>>> >> > >> hard commit is better, but every 3 seconds is still far too short IMO.
> >> >>>> >> > >> I'd start with soft commits of 60000 and hard commits of 60000 (60
> >> >>>> >> > >> seconds), meaning that you're going to have to wait 1 minute for docs
> >> >>>> >> > >> to show up unless you explicitly commit.
> >> >>>> >> > >>
> >> >>>> >> > >> You're throwing away all the caches configured in solrconfig.xml more
> >> >>>> >> > >> than 3 times a second, executing autowarming, etc., etc., etc.
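To put "every 300 ms" in perspective, here is a rough worked count for the 7-8 hour, 28GB indexing run described in this thread. It is an upper-bound estimate that assumes documents arrive continuously, so each autocommit timer fires roughly once per interval (per the `maxTime` comment quoted earlier, the timer starts when a document is added). `commits_during_run` is a hypothetical helper.

```python
def commits_during_run(run_hours: float, interval_ms: int) -> int:
    """Rough upper bound on autocommits during a continuously fed run."""
    return int(run_hours * 3600 * 1000 // interval_ms)


RUN_HOURS = 7  # lower end of the 7-8 hour run mentioned in the thread
for label, interval_ms in [
    ("soft commit, 300 ms", 300),
    ("hard commit, 3000 ms", 3000),
    ("either, 60000 ms", 60000),
]:
    print(f"{label:>22}: ~{commits_during_run(RUN_HOURS, interval_ms):,} commits")
```

For a 7-hour run this gives 84,000, 8,400 and 420 respectively. With `openSearcher` set to false only the soft commits open searchers, so the 300 ms setting means on the order of 84,000 searcher openings (each discarding caches and autowarming), versus about 420 at 60000 ms.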
> >> >>>> >> > >>
> >> >>>> >> > >> Changing these to longer intervals might cure the problem, but if not,
> >> >>>> >> > >> then, as Hoss would say, "details matter". I suspect you're also seeing
> >> >>>> >> > >> "overlapping searchers" warning messages in your log, and it's
> >> >>>> >> > >> _possible_ that what's happening is that you're just exceeding the max
> >> >>>> >> > >> warming searchers and never opening a new searcher with the
> >> >>>> >> > >> newly-indexed documents. But that's a total shot in the dark.
> >> >>>> >> > >>
> >> >>>> >> > >> How are you looking for docs (and not finding them)? Does the numDocs
> >> >>>> >> > >> number in the Solr admin screen change?
> >> >>>> >> > >>
> >> >>>> >> > >>
> >> >>>> >> > >> Best,
> >> >>>> >> > >> Erick
> >> >>>> >> > >>
> >> >>>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> >> >>>> >> > >> > Hi Alexandre,
> >> >>>> >> > >> >
> >> >>>> >> > >> >
> >> >>>> >> > >> > *Hard Commit* is:
> >> >>>> >> > >> >
> >> >>>> >> > >> >      <autoCommit>
> >> >>>> >> > >> >        <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
> >> >>>> >> > >> >        <openSearcher>false</openSearcher>
> >> >>>> >> > >> >      </autoCommit>
> >> >>>> >> > >> >
> >> >>>> >> > >> > *Soft Commit* is:
> >> >>>> >> > >> >
> >> >>>> >> > >> > <autoSoftCommit>
> >> >>>> >> > >> >     <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
> >> >>>> >> > >> > </autoSoftCommit>
> >> >>>> >> > >> >
> >> >>>> >> > >> > And I am committing 20000 documents each time.
> >> >>>> >> > >> > Is this a good config for committing?
> >> >>>> >> > >> > Or am I doing something wrong?
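The thread never shows the indexing script itself, so the following is a hypothetical Python sketch of the approach recommended here: send documents in batches (20000 per request, as in the original script) to the JSON update handler without issuing explicit commits, and let `autoSoftCommit`/`autoCommit` (or, optionally, Solr's `commitWithin` request parameter) decide when documents become visible. The base URL, collection name, `id` field, and helper names are placeholders; the request is built but not sent, so it can be inspected offline. The sketch covers commit handling only.

```python
import json
import urllib.request
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(base_url: str, collection: str, docs: List[dict],
                         commit_within_ms: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON /update request with no explicit commit."""
    url = f"{base_url.rstrip('/')}/{collection}/update"
    if commit_within_ms is not None:
        url += f"?commitWithin={commit_within_ms}"  # optional visibility deadline
    return urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage against a live server (not executed here):
# for chunk in batches(all_docs, 20000):
#     req = build_update_request("http://localhost:8983/solr", "collection1", chunk)
#     with urllib.request.urlopen(req) as resp:
#         resp.read()
```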
> >> >>>> >> > >> >
> >> >>>> >> > >> >
> >> >>>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <arafalov@gmail.com> wrote:
> >> >>>> >> > >> >
> >> >>>> >> > >> >> What's your commit strategy? Explicit commits? Soft commits/hard
> >> >>>> >> > >> >> commits (in solrconfig.xml)?
> >> >>>> >> > >> >>
> >> >>>> >> > >> >> Regards,
> >> >>>> >> > >> >>    Alex.
> >> >>>> >> > >> >> ----
> >> >>>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> >> >>>> >> > >> >> http://www.solr-start.com/
> >> >>>> >> > >> >>
> >> >>>> >> > >> >>
> >> >>>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> >> >>>> >> > >> >> > Hello,
> >> >>>> >> > >> >> >           I have written a Python script that indexes 20000
> >> >>>> >> > >> >> > documents at a time into Solr. I have 28 GB RAM with 8 CPUs.
> >> >>>> >> > >> >> > When I started indexing, 15 GB of RAM was free. While indexing,
> >> >>>> >> > >> >> > all RAM is consumed but **not** a single document is indexed. Why?
> >> >>>> >> > >> >> > And the Python script throws *HTTPError: HTTP Error 503: Service
> >> >>>> >> > >> >> > Unavailable*.
> >> >>>> >> > >> >> > I think heavy load on ZooKeeper caused all nodes to go down, but
> >> >>>> >> > >> >> > I am not sure about that. Any help, please.
> >> >>>> >> > >> >> > Or is something else happening?
> >> >>>> >> > >> >> > And how can I overcome this issue?
> >> >>>> >> > >> >> > Please point me in the right direction.
> >> >>>> >> > >> >> > Thanks.
> >> >>>> >> > >> >> >
> >> >>>> >> > >> >> > Warm Regards,
> >> >>>> >> > >> >> > Nitin Solanki
> >> >>>> >> > >> >>
> >> >>>> >> > >>
> >> >>>> >> >
> >> >>>> >>
> >> >>>>
> >> >>>
> >> >>>
> >>
>
