lucene-solr-user mailing list archives

From Andrea Gazzarini <a.gazzar...@sease.io>
Subject Re: Solr Index Size after reindex
Date Sat, 09 Feb 2019 15:55:55 GMT
Yes, those numbers are different, and that should explain the difference 
in size. I think you should be able to find some information in the 
Alfresco or Solr logs. There must be a reason for the missing content. 
For example, are those numbers coming from two comparable snapshots? In 
other words, I imagine that at a given moment X you rsync-ed the two servers:

  * 5.365.213 is the numDocs you got just after the sync, isn't it?
  * 4.537.651 is the numDocs you got in the staging server after the
    reindexing, isn't it? Are you sure the whole reindexing has completed?

maxDoc is the number of documents you have in the index, including the 
deleted docs not yet cleared by a merge. In the console you should also 
see the "Deleted docs" count, which should be equal to (maxDoc - numDocs).
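
As a quick sanity check, the relation above can be applied to the figures quoted below. This is only a sketch of the arithmetic: the helper names (parse_count, deleted_docs) are made up for illustration, and the counts are the ones reported in this thread, written with "." as the thousands separator.

```python
def parse_count(text: str) -> int:
    """Parse a count written with '.' as thousands separator, e.g. '5.365.213'."""
    return int(text.replace(".", ""))


def deleted_docs(max_doc: int, num_docs: int) -> int:
    """Deleted-but-not-yet-merged documents: maxDoc - numDocs."""
    return max_doc - num_docs


# Figures as reported in this thread (not re-measured).
counts = {
    "PRODUCTION": {"numDocs": parse_count("5.365.213"),
                   "maxDoc": parse_count("5.845.469")},
    "STAGING": {"numDocs": parse_count("4.537.651"),
                "maxDoc": parse_count("5.129.556")},
}

for env, c in counts.items():
    print(env, "deleted docs:", deleted_docs(c["maxDoc"], c["numDocs"]))

# Gap in live documents between the two environments.
num_docs_gap = counts["PRODUCTION"]["numDocs"] - counts["STAGING"]["numDocs"]
print("numDocs gap:", num_docs_gap)
```

If the "Deleted docs" value shown in the console differs from what this computes, the figures were probably read at different moments.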

Ciao

Andrea

On 08/02/2019 15:53, Mathieu Menard wrote:
>
> Hi Andrea,
>
> I’ve checked this information and here is the result:
>
>              PRODUCTION   STAGING
> *numDocs*    5.365.213    4.537.651
> *MaxDoc*     5.845.469    5.129.556
>
> It seems that there are more than 800.000 additional docs in PRODUCTION, 
> which would explain the larger index size. But there is one thing that 
> I don’t understand: we have copied the DB and the contentstore, so the 
> numDocs for the two environments should be the same, no?
>
> Could you also explain the meaning of the maxDocs value, please?
>
> Thanks
>
> Matthieu
>
> *From:*Andrea Gazzarini [mailto:a.gazzarini@sease.io]
> *Sent:* Friday, 8 February 2019 14:54
> *To:* solr-user@lucene.apache.org
> *Subject:* Re: Solr Index Size after reindex
>
> Hi Mathieu,
> what about the docs in the two infrastructures? Do they have the same 
> numbers (numdocs / maxdocs)? Any meaningful message (error or not) in 
> log files?
>
> Andrea
>
> On 08/02/2019 14:19, Mathieu Menard wrote:
>
>     Hello,
>
>     I would like to have your point of view about an observation we
>     have made on our two Alfresco installations (Production and Staging
>     environments), and more specifically on the size of our Solr indexes
>     on these two environments.
>
>     Regularly we do an rsync between the Production and the Staging
>     environments: we make a copy of Alfresco’s DB and a copy of the
>     entire contentstore, and after that we reindex all the Alfresco content.
>
>     We have noticed that for the production environment we have 19 Gb
>     of indexes, while in staging we have “only” 11. Gb of indexes.
>     We have some difficulty understanding this difference, because we
>     assume that index optimization is the same for a full
>     reindex as for the normal use of Solr.
>
>     I’ve compared the configuration of the two Solr instances and
>     I don’t see any differences. Could you help me better understand
>     this phenomenon?
>
>     Here you can find some information about our two environments; if
>     you need more details, I will provide them as soon as possible:
>
>                         PRODUCTION                                           STAGING
>     Alfresco version    5.1.1.4                                              5.1.1.4
>     Solr Version        (blank)                                              (blank)
>     Java version        (blank)                                              (blank)
>     Linux Machine       See Staging_caracteristics.txt file in attachment    See Staging_caracteristics.txt file in attachment
>
>     Please let me know if you need any other information; I will send
>     it to you promptly.
>
>     Kind Regards
>
>     Matthieu
>
