lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Whole RAM consumed while Indexing.
Date Wed, 18 Mar 2015 20:23:36 GMT
bq: As you said, do commits after 60000 seconds

No, No, No. I'm NOT saying 60000 seconds! That time is in _milliseconds_
as Shawn said. So setting it to 60000 is every minute.

From solrconfig.xml, conveniently located immediately above the
<autoCommit> tag:

maxTime - Maximum amount of time in ms that is allowed to pass since a
document was added before automatically triggering a new commit.
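
For reference, a minimal sketch of what 60-second intervals would look like
in solrconfig.xml (illustrative values, not a one-size-fits-all
recommendation):

     <autoCommit>
       <maxTime>60000</maxTime> <!-- 60000 ms = 1 minute -->
       <openSearcher>false</openSearcher>
     </autoCommit>

     <autoSoftCommit>
       <maxTime>60000</maxTime> <!-- governs when new docs become visible -->
     </autoSoftCommit>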

Also, a lot of the questions about soft and hard commits are answered
here, as I pointed out before; did you read it?

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best
Erick

On Wed, Mar 18, 2015 at 9:44 AM, Alexandre Rafalovitch
<arafalov@gmail.com> wrote:
> Probably merged somewhat differently, with some term indexes repeating
> between segments. Check the number of segments in the data directory. And
> do a search for *:* and make sure both indexes have the same document counts.
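
A minimal sketch of that count check in Python (host, port, and collection
name here are assumptions, not taken from this thread):

    import json
    import urllib.request

    # Hypothetical URL; substitute your own host and collection.
    url = "http://localhost:8983/solr/collection1/select?q=*:*&rows=0&wt=json"
    with urllib.request.urlopen(url) as resp:
        num_found = json.load(resp)["response"]["numFound"]
    print(num_found)  # run this against both indexes and compare the numbers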
>
> Also, in all these discussions you still haven't answered how soon after
> indexing you need to _search_. Because, if you are not actually searching
> while committing, you could even index on a completely separate server
> (e.g. a faster one) and swap (or alias) the index in afterwards. Unless,
> of course, I missed it; there have been a lot of emails in a very short
> window of time.
>
> Regards,
>    Alex.
>
> ----
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 18 March 2015 at 12:09, Nitin Solanki <nitinmlvya@gmail.com> wrote:
>> When I kept my configuration at 300 for soft commit and 3000 for hard
>> commit and indexed some amount of data, the size of the whole index was
>> 6GB after indexing completed.
>>
>> When I changed the configuration to 60000 for soft commit and 60000 for
>> hard commit and indexed the same data, the size of the whole index was
>> 5GB after indexing completed.
>>
>> But the number of documents in both scenarios was the same. I am wondering
>> how that can be possible?
>>
>> On Wed, Mar 18, 2015 at 9:14 PM, Nitin Solanki <nitinmlvya@gmail.com> wrote:
>>
>>> Hi Erick,
>>>              I am just saying that I want to be sure about the difference
>>> between commit settings: what happens if I do frequent commits or not? The
>>> reason I am saying that I need to commit things so very quickly is that I
>>> have to index 28GB of data, which takes 7-8 hours (with frequent commits).
>>> As you said, do commits after 60000 seconds, then it will be more expensive.
>>> If I don't encounter the **"overlapping searchers" warning messages**,
>>> then it seems to be okay. Is it?
>>>
>>>
>>>
>>>
>>> On Wed, Mar 18, 2015 at 8:54 PM, Erick Erickson <erickerickson@gmail.com>
>>> wrote:
>>>
>>>> Don't do it. Really, why do you want to do this? This seems like
>>>> an "XY" problem; you haven't explained why you need to commit
>>>> things so very quickly.
>>>>
>>>> I suspect you haven't tried _searching_ while committing at such
>>>> a rate, and you might as well turn all your top-level caches off
>>>> in solrconfig.xml since they won't be useful at all.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Wed, Mar 18, 2015 at 6:24 AM, Nitin Solanki <nitinmlvya@gmail.com>
>>>> wrote:
>>>> > Hi,
>>>> >        If I do very, very fast indexing (softcommit = 300 and hardcommit
>>>> > = 3000) vs. slow indexing (softcommit = 60000 and hardcommit = 60000)
>>>> > as you both said, will fast indexing fail to index some data?
>>>> > Any suggestion on this?
>>>> >
>>>> > On Tue, Mar 17, 2015 at 2:29 AM, Ramkumar R. Aiyengar <andyetitmoves@gmail.com> wrote:
>>>> >
>>>> >> Yes, and doing so is painful and takes lots of people and hardware
>>>> >> resources to get there for large amounts of data and queries :)
>>>> >>
>>>> >> As Erick says, work backwards from 60s and first establish how high
>>>> >> the commit interval can be to satisfy your use case.
>>>> >> On 16 Mar 2015 16:04, "Erick Erickson" <erickerickson@gmail.com> wrote:
>>>> >>
>>>> >> > First start by lengthening your soft and hard commit intervals
>>>> >> > substantially. Start with 60000 and work backwards, I'd say.
>>>> >> >
>>>> >> > Ramkumar has tuned the heck out of his installation to get the
>>>> >> > commit intervals to be that short ;).
>>>> >> >
>>>> >> > I'm betting that you'll see your RAM usage go way down, but that's
>>>> >> > a guess until you test.
>>>> >> >
>>>> >> > Best,
>>>> >> > Erick
>>>> >> >
>>>> >> > On Sun, Mar 15, 2015 at 10:56 PM, Nitin Solanki <nitinmlvya@gmail.com>
>>>> >> > wrote:
>>>> >> > > Hi Erick,
>>>> >> > >             You are right. The **"overlapping searchers"
>>>> >> > > warning messages** are indeed showing up in the logs.
>>>> >> > > The **numDocs numbers** are changing while documents are being
>>>> >> > > added during indexing.
>>>> >> > > Any help?
>>>> >> > >
>>>> >> > > On Sat, Mar 14, 2015 at 11:24 PM, Erick Erickson <erickerickson@gmail.com>
>>>> >> > > wrote:
>>>> >> > >
>>>> >> > >> First, the soft commit interval is very short. Very, very, very,
>>>> >> > >> very short. 300ms is just short of insane unless it's a typo ;).
>>>> >> > >>
>>>> >> > >> Here's a long background:
>>>> >> > >>
>>>> >> > >> https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>>> >> > >>
>>>> >> > >> But the short form is that you're opening searchers every 300 ms.
>>>> >> > >> The hard commit is better, but every 3 seconds is still far too
>>>> >> > >> short IMO. I'd start with soft commits of 60000 and hard commits
>>>> >> > >> of 60000 (60 seconds), meaning that you're going to have to wait
>>>> >> > >> 1 minute for docs to show up unless you explicitly commit.
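
An explicit commit, for reference, can be issued through the update handler;
a minimal sketch in Python, with a hypothetical host and collection name:

    import urllib.request

    # commit=true forces a hard commit and, by default, opens a new
    # searcher, so newly indexed documents become searchable right away.
    urllib.request.urlopen(
        "http://localhost:8983/solr/collection1/update?commit=true").read()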
>>>> >> > >>
>>>> >> > >> You're throwing away all the caches configured in solrconfig.xml
>>>> >> > >> more than 3 times a second, executing autowarming, etc, etc, etc....
>>>> >> > >>
>>>> >> > >> Changing these to longer intervals might cure the problem, but if
>>>> >> > >> not then, as Hoss would say, "details matter". I suspect you're
>>>> >> > >> also seeing "overlapping searchers" warning messages in your log,
>>>> >> > >> and it's _possible_ that what's happening is that you're just
>>>> >> > >> exceeding the max warming searchers and never opening a new
>>>> >> > >> searcher with the newly-indexed documents. But that's a total
>>>> >> > >> shot in the dark.
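
The cap Erick refers to is the maxWarmingSearchers setting in solrconfig.xml;
a sketch of where it lives (the value 2 is the usual default of that era,
shown only for illustration):

    <maxWarmingSearchers>2</maxWarmingSearchers>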
>>>> >> > >>
>>>> >> > >> How are you looking for docs (and not finding them)? Does the
>>>> >> > >> numDocs number in the solr admin screen change?
>>>> >> > >>
>>>> >> > >>
>>>> >> > >> Best,
>>>> >> > >> Erick
>>>> >> > >>
>>>> >> > >> On Thu, Mar 12, 2015 at 10:27 PM, Nitin Solanki <nitinmlvya@gmail.com>
>>>> >> > >> wrote:
>>>> >> > >> > Hi Alexandre,
>>>> >> > >> >
>>>> >> > >> >
>>>> >> > >> > *Hard Commit* is:
>>>> >> > >> >
>>>> >> > >> >      <autoCommit>
>>>> >> > >> >        <maxTime>${solr.autoCommit.maxTime:3000}</maxTime>
>>>> >> > >> >        <openSearcher>false</openSearcher>
>>>> >> > >> >      </autoCommit>
>>>> >> > >> >
>>>> >> > >> > *Soft Commit* is:
>>>> >> > >> >
>>>> >> > >> > <autoSoftCommit>
>>>> >> > >> >     <maxTime>${solr.autoSoftCommit.maxTime:300}</maxTime>
>>>> >> > >> > </autoSoftCommit>
>>>> >> > >> >
>>>> >> > >> > And I am committing 20000 documents each time.
>>>> >> > >> > Is this a good configuration for committing?
>>>> >> > >> > Or am I doing something wrong?
>>>> >> > >> >
>>>> >> > >> >
>>>> >> > >> > On Fri, Mar 13, 2015 at 8:52 AM, Alexandre Rafalovitch <arafalov@gmail.com>
>>>> >> > >> > wrote:
>>>> >> > >> >
>>>> >> > >> >> What's your commit strategy? Explicit commits? Soft
>>>> >> > >> >> commits/hard commits (in solrconfig.xml)?
>>>> >> > >> >>
>>>> >> > >> >> Regards,
>>>> >> > >> >>    Alex.
>>>> >> > >> >> ----
>>>> >> > >> >> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>>> >> > >> >> http://www.solr-start.com/
>>>> >> > >> >>
>>>> >> > >> >>
>>>> >> > >> >> On 12 March 2015 at 23:19, Nitin Solanki <nitinmlvya@gmail.com>
>>>> >> > >> >> wrote:
>>>> >> > >> >> > Hello,
>>>> >> > >> >> >           I have written a python script to index 20000
>>>> >> > >> >> > documents at a time on Solr. I have 28 GB RAM with 8 CPUs.
>>>> >> > >> >> > When I started indexing, 15 GB of RAM was free. While
>>>> >> > >> >> > indexing, all the RAM is consumed, but **not** a single
>>>> >> > >> >> > document is indexed. Why so?
>>>> >> > >> >> > And the python script throws *HTTPError: HTTP Error 503:
>>>> >> > >> >> > Service Unavailable*.
>>>> >> > >> >> > I think it is due to heavy load on Zookeeper, by which all
>>>> >> > >> >> > nodes went down, but I am not sure about that. Any help
>>>> >> > >> >> > please, or is anything else happening? How do I overcome
>>>> >> > >> >> > this issue? Please assist me towards the right path.
>>>> >> > >> >> > Thanks..
>>>> >> > >> >> >
>>>> >> > >> >> > Warm Regards,
>>>> >> > >> >> > Nitin Solanki
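
For what it's worth, one way to keep a batch-indexing script from dying on a
503 is to back off and retry. A minimal sketch follows; the endpoint, batch
shape, and retry policy are assumptions, since the original script isn't
shown in this thread:

    import json
    import time
    import urllib.error
    import urllib.request

    # Hypothetical endpoint; substitute your own host and collection.
    UPDATE_URL = "http://localhost:8983/solr/collection1/update?wt=json"

    def post_batch(docs, retries=5):
        """POST one JSON batch of docs, backing off on 503 instead of dying."""
        req = urllib.request.Request(
            UPDATE_URL,
            data=json.dumps(docs).encode("utf-8"),
            headers={"Content-Type": "application/json"})
        for attempt in range(retries):
            try:
                return urllib.request.urlopen(req).read()
            except urllib.error.HTTPError as e:
                if e.code != 503:
                    raise                    # only retry on Service Unavailable
                time.sleep(2 ** attempt)     # exponential backoff
        raise RuntimeError("Solr still returning 503 after %d tries" % retries)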
