lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Whole RAM consumed while Indexing.
Date Thu, 19 Mar 2015 20:05:43 GMT
That, or even set the hard commit to 60 seconds. It's strictly a matter of how often
you want to close old segments and open new ones.
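
A minimal sketch of what the 60-second settings discussed above look like in solrconfig.xml (the same autoCommit/autoSoftCommit elements quoted later in this thread, with both maxTime values raised to 60000 ms; adjust to taste):

    <autoCommit>
      <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <autoSoftCommit>
      <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
    </autoSoftCommit>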

On Thu, Mar 19, 2015 at 3:12 AM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
> Hi Erick..
>               I read your Article. Really nice...
> In it you said that, for bulk indexing, you set soft commit = 10 mins and
> hard commit = 15 sec. Is that also okay for my scenario?
>
> On Thu, Mar 19, 2015 at 1:53 AM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> bq: As you said, do commits after 60000 seconds
>>
>> No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
>> as Shawn said. So setting it to 60000 is every minute.
>>
>> From solrconfig.xml, conveniently located immediately above the
>> <autoCommit> tag:
>>
>> maxTime - Maximum amount of time in ms that is allowed to pass since a
>> document was added before automatically triggering a new commit.
>>
>> Also, a lot of answers about soft and hard commits are here, as I pointed
>> out before. Did you read it?
>>
>>
>> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Best
>> Erick
>>
>> On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
>> <arafalov@gmail.com> wrote:
>> > Probably merged somewhat differently, with some term indexes repeating
>> > between segments. Check the number of segments in the data directory, and
>> > do a search for *:* to make sure both have the same document counts.
>> >
>> > Also, in all these discussions you still haven't answered how soon
>> > after indexing you need to _search_. Because if you are not
>> > actually searching while committing, you could even index on a
>> > completely separate server (e.g. a faster one) and swap (or alias)
>> > the index in afterwards. Unless, of course, I missed it; it's a lot of
>> > emails in a very short window of time.
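
One way to run the *:* count check Alexandre suggests, assuming a collection reachable on the default port (host, port, and collection name are placeholders):

    http://localhost:8983/solr/<collection>/select?q=*:*&rows=0

numFound in the response is the total document count; query both indexes and compare it, and list the files under the core's data/index directory to compare segment counts.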
>> >
>> > Regards,
>> >    Alex.
>> >
>> > ----
>> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> > http://www.solr-start.com/
>> >
>> >
>> > On 18 March 2015 at 12:09, Nitin Solanki <nitinmlvya@gmail.com> wrote:
>> >> When I kept my configuration at 300 for soft commit and 3000 for hard
>> >> commit and indexed some amount of data, the data size of the whole
>> >> index was 6GB after completing the indexing.
>> >>
>> >> When I changed the configuration to 60000 for soft commit and 60000 for
>> >> hard commit and indexed the same data, the data size of the whole
>> >> index was 5GB after completing the indexing.
>> >>
>> >> But the number of documents in both scenarios was the same. I am
>> >> wondering how that can be possible?
>> >>
>> >> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinmlvya@gmail.com>
>> >> wrote:
>> >>
>> >>> Hi Erick,
>> >>>              I am just saying it because I want to be sure about the
>> >>> difference commits make. What if I do frequent commits, or not? And the
>> >>> reason I am saying that I need to commit things so very quickly is that
>> >>> I have to index 28GB of data, which takes 7-8 hours (with frequent
>> >>> commits).
>> >>> As you said, do commits after 60000 seconds; then it will be more
>> >>> expensive. If I don't encounter the **"overlapping searchers" warning
>> >>> messages**, then I feel it seems to be okay. Is it?
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerickson@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> Don't do it. Really, why do you want to do this? This seems like
>> >>>> an "XY" problem, you haven't explained why you need to commit
>> >>>> things so very quickly.
>> >>>>
>> >>>> I suspect you haven't tried _searching_ while committing at such
>> >>>> a rate, and you might as well turn all your top-level caches off
>> >>>> in solrconfig.xml since they won't be useful at all.
>> >>>>
>> >>>> Best,
>> >>>> Erick
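
For reference, the top-level caches Erick refers to are the standard entries in solrconfig.xml; a sketch of what they typically look like (class names and sizes are the stock examples, not values from this thread):

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

With sub-second commits these are discarded on every new searcher, which is why they stop being useful.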
>> >>>>
>> >>>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinmlvya@gmail.com>
>> >>>> wrote:
>> >>>> > Hi,
>> >>>> >        If I do very, very fast indexing (softcommit = 300 and
>> >>>> > hardcommit = 3000) vs. slow indexing (softcommit = 60000 and
>> >>>> > hardcommit = 60000), as you both said, will fast indexing fail to
>> >>>> > index some data?
>> >>>> > Any suggestion on this?
>> >>>> >
>> >>>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <
>> >>>> > andyetitmoves@gmail.com> wrote:
>> >>>> >
>> >>>> >> Yes, and doing so is painful and takes lots of people and
>> >>>> >> hardware resources to get there for large amounts of data and
>> >>>> >> queries :)
>> >>>> >>
>> >>>> >> As Erick says, work backwards from 60s and first establish how
>> >>>> >> high the commit interval can be to satisfy your use case..
>> >>>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerickson@gmail.com>
>> >>>> >> wrote:
>> >>>> >>
>> >>>> >> > First start by lengthening your soft and hard commit intervals
>> >>>> >> > substantially. Start with 60000 and work backwards, I'd say.
>> >>>> >> >
>> >>>> >> > Ramkumar has tuned the heck out of his installation to get the
>> >>>> >> > commit intervals to be that short ;).
>> >>>> >> >
>> >>>> >> > I'm betting that you'll see your RAM usage go way down, but
>> >>>> >> > that's a guess until you test.
>> >>>> >> >
>> >>>> >> > Best,
>> >>>> >> > Erick
>> >>>> >> >
>> >>>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <nitinmlvya@gmail.com>
>> >>>> >> > wrote:
>> >>>> >> > > Hi Erick,
>> >>>> >> > >             You are correct. Some **"overlapping searchers"
>> >>>> >> > > warning messages** are coming in the logs.
>> >>>> >> > > The **numDocs numbers** are changing as documents are being
>> >>>> >> > > added during indexing.
>> >>>> >> > > Any help?
>> >>>> >> > >
>> >>>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <erickerickson@gmail.com>
>> >>>> >> > > wrote:
>> >>>> >> > >
>> >>>> >> > >> First, the soft commit interval is very short. Very, very,
>> >>>> >> > >> very, very short. 300ms is just short of insane unless it's
>> >>>> >> > >> a typo ;).
>> >>>> >> > >>
>> >>>> >> > >> Here's a long background:
>> >>>> >> > >>
>> >>>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>> >>>> >> > >>
>> >>>> >> > >> But the short form is that you're opening searchers every
>> >>>> >> > >> 300 ms. The hard commit is better, but every 3 seconds is
>> >>>> >> > >> still far too short IMO. I'd start with soft commits of 60000
>> >>>> >> > >> and hard commits of 60000 (60 seconds), meaning that you're
>> >>>> >> > >> going to have to wait 1 minute for docs to show up unless
>> >>>> >> > >> you explicitly commit.
>> >>>> >> > >>
>> >>>> >> > >> You're throwing away all the caches configured in
>> >>>> >> > >> solrconfig.xml more than 3 times a second, executing
>> >>>> >> > >> autowarming, etc, etc, etc....
>> >>>> >> > >>
>> >>>> >> > >> Changing these to longer intervals might cure the problem,
>> >>>> >> > >> but if not then, as Hoss would say, "details matter". I
>> >>>> >> > >> suspect you're also seeing "overlapping searchers" warning
>> >>>> >> > >> messages in your log, and it's _possible_ that what's
>> >>>> >> > >> happening is that you're just exceeding the max warming
>> >>>> >> > >> searchers and never opening a new searcher with the
>> >>>> >> > >> newly-indexed documents. But that's a total shot in the dark.
>> >>>> >> > >>
>> >>>> >> > >> How are you looking for docs (and not finding them)? Does
>> >>>> >> > >> the numDocs number in the solr admin screen change?
>> >>>> >> > >>
>> >>>> >> > >>
>> >>>> >> > >> Best,
>> >>>> >> > >> Erick
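
The "max warming searchers" limit Erick mentions is the maxWarmingSearchers setting in solrconfig.xml; a sketch with the usual default (your value may differ):

    <maxWarmingSearchers>2</maxWarmingSearchers>

When commits arrive faster than searchers can finish warming, warming searchers pile up, the "overlapping searchers" warnings appear, and past this limit new searchers are not opened.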
>> >>>> >> > >>
>> >>>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <nitinmlvya@gmail.com>
>> >>>> >> > >> wrote:
>> >>>> >> > >> > Hi Alexandre,
>> >>>> >> > >> >
>> >>>> >> > >> >
>> >>>> >> > >> > *Hard Commit* is :
>> >>>> >> > >> >
>> >>>> >> > >> >      <autoCommit>
>> >>>> >> > >> >        <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>> >>>> >> > >> >        <openSearcher>false</openSearcher>
>> >>>> >> > >> >      </autoCommit>
>> >>>> >> > >> >
>> >>>> >> > >> > *Soft Commit* is :
>> >>>> >> > >> >
>> >>>> >> > >> > <autoSoftCommit>
>> >>>> >> > >> >     <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>> >>>> >> > >> > </autoSoftCommit>
>> >>>> >> > >> >
>> >>>> >> > >> > And I am committing 20000 documents each time.
>> >>>> >> > >> > Is this a good config for committing?
>> >>>> >> > >> > Or am I doing something wrong?
>> >>>> >> > >> >
>> >>>> >> > >> >
>> >>>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <arafalov@gmail.com>
>> >>>> >> > >> > wrote:
>> >>>> >> > >> >
>> >>>> >> > >> >> What's your commit strategy? Explicit commits? Soft
>> >>>> >> > >> >> commits/hard commits (in solrconfig.xml)?
>> >>>> >> > >> >>
>> >>>> >> > >> >> Regards,
>> >>>> >> > >> >>    Alex.
>> >>>> >> > >> >> ----
>> >>>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a
>> >>>> >> > >> >> newsletter: http://www.solr-start.com/
>> >>>> >> > >> >>
>> >>>> >> > >> >>
>> >>>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <nitinmlvya@gmail.com>
>> >>>> >> > >> >> wrote:
>> >>>> >> > >> >> > Hello,
>> >>>> >> > >> >> >           I have written a Python script that indexes
>> >>>> >> > >> >> > 20000 documents at a time into Solr. I have 28 GB RAM
>> >>>> >> > >> >> > with 8 CPUs.
>> >>>> >> > >> >> > When I started indexing, 15 GB of RAM was free. While
>> >>>> >> > >> >> > indexing, all the RAM is consumed but **not** a single
>> >>>> >> > >> >> > document is indexed. Why so?
>> >>>> >> > >> >> > And it throws *HTTPError: HTTP Error 503: Service
>> >>>> >> > >> >> > Unavailable* in the Python script.
>> >>>> >> > >> >> > I think it is due to heavy load on Zookeeper, by which
>> >>>> >> > >> >> > all the nodes went down, but I am not sure about that.
>> >>>> >> > >> >> > Any help please..
>> >>>> >> > >> >> > Or is anything else happening? And how do I overcome
>> >>>> >> > >> >> > this issue?
>> >>>> >> > >> >> > Please point me towards the right path.
>> >>>> >> > >> >> > Thanks..
>> >>>> >> > >> >> >
>> >>>> >> > >> >> > Warm Regards,
>> >>>> >> > >> >> > Nitin Solanki
>> >>>> >> > >> >>
>> >>>> >> > >>
>> >>>> >> >
>> >>>> >>
>> >>>>
>> >>>
>> >>>
>>
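
The Python script from the original message isn't included in the thread, so the following is only a hedged sketch of the batched-indexing pattern being described, leaving document visibility to the server-side autoCommit/autoSoftCommit settings rather than committing on every batch (the URL, collection name, and batch source are placeholders; the real script may use urllib rather than requests):

    import requests  # assumption: the 'requests' HTTP library is available

    SOLR_UPDATE = "http://localhost:8983/solr/<collection>/update"  # placeholder URL

    def index_batch(docs):
        # Send one batch of documents as a JSON array; commit=false leaves
        # visibility to the autoCommit/autoSoftCommit intervals in solrconfig.xml.
        resp = requests.post(SOLR_UPDATE, json=docs, params={"commit": "false"})
        resp.raise_for_status()  # a 503 here matches the error reported above

    # Usage sketch: push 20000-document batches from some generator `batches()`:
    # for batch in batches(size=20000):
    #     index_batch(batch)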
