lucene-solr-user mailing list archives

From François Schiettecatte <>
Subject Re: Solr indexing size for a particular document.
Date Tue, 19 Apr 2011 13:11:24 GMT
I think you could approximate this with some empirical measurements, i.e. index 1,000 'typical'
documents and see what the resulting index size is. Of course you may need to adjust this
number upwards if there is a lot of variability in document size.
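
A minimal sketch of that measurement, assuming the sample of 1,000 documents has already been indexed and committed into an otherwise empty core whose index lives in a single local directory (the path below is hypothetical). It sums the on-disk file sizes and divides by the sample size; the `headroom` factor is an illustrative allowance for variability in document size, not a Solr setting.

```python
import os

def index_bytes(index_dir):
    """Total size in bytes of every file under index_dir (recursively)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def estimate_index_size(sample_index_bytes, sample_docs, target_docs, headroom=1.0):
    """Extrapolate from a sample: average bytes per document, times the target
    document count, times a headroom factor (>1.0 if documents vary a lot in size)."""
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    per_doc = sample_index_bytes / sample_docs
    return per_doc * target_docs * headroom

if __name__ == "__main__":
    # Hypothetical path to the sample core's index directory.
    sample_dir = "/var/solr/sample/data/index"
    if os.path.isdir(sample_dir):
        size = index_bytes(sample_dir)
        est = estimate_index_size(size, 1000, 1_000_000, headroom=1.2)
        print(f"sample: {size} bytes, estimate for 1M docs: {est / 1e9:.2f} GB")
```

Measuring after an optimize (single merged segment) should give a steadier figure, since unmerged segments and deleted documents inflate the raw directory size.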

When I built the search engine that ran feedster, I noticed there was a 1:1 ratio between
the size of the source documents and the size of the index produced: 1M documents produced 1GB
of source text, which in turn produced 1GB of index. That was useful to me in determining the number
of documents to put in each shard (1M) as documents were crawled and indexed.
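
The shard-sizing arithmetic can be sketched as follows. The 1.0 index-to-source ratio is the figure observed for feedster and will differ with other schemas and analyzers; the function name and the 1GB shard target are illustrative assumptions.

```python
def docs_per_shard(target_shard_bytes, avg_source_bytes_per_doc, index_to_source_ratio=1.0):
    """Documents that fit in one shard of the target size, given the average
    source size per document and the observed index/source size ratio."""
    index_bytes_per_doc = avg_source_bytes_per_doc * index_to_source_ratio
    if index_bytes_per_doc <= 0:
        raise ValueError("per-document index size must be positive")
    return int(target_shard_bytes // index_bytes_per_doc)

# Feedster figures: 1GB of source text for 1M documents, i.e. ~1,000 bytes per document.
avg_doc = 1_000_000_000 / 1_000_000
print(docs_per_shard(1_000_000_000, avg_doc))  # 1000000
```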


On Apr 19, 2011, at 8:28 AM, Erick Erickson wrote:

> There's no way I know of to do this.
> Why is this important to you? Because I'm not
> sure what actionable information this gives you.
> The number will vary based on whether the fields
> are stored or not. And storing the fields has
> very little effect on search memory requirements.
> What are you hoping to do with that information?
> Maybe we can suggest a better approach if you
> state the higher-level problem...
> Best
> Erick
> On Tue, Apr 19, 2011 at 7:49 AM, rahul <> wrote:
>> Hi,
>> Is there a way to find out the Solr index size for a particular document? I
>> am using SolrJ to index the documents.
>> Assume I am indexing multiple fields like title, description, content, and
>> a few integer fields defined in schema.xml. Once I index the content, is there a
>> way to identify the index size for that particular document, either during
>> indexing or after indexing?
>> Most of the common words are excluded (listed in StopWords.txt and removed by
>> StopFilterFactory), so I just want to calculate the actual index size of the
>> particular document. Is there any way to do this in current Solr?
>> thanks,
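
On Erick's point that the number depends on whether fields are stored: one rough way to see how much of an existing index is stored-field data is to group the index files by extension. This sketch assumes a non-compound index (if segments are packed into .cfs compound files, the breakdown is hidden) and relies on Lucene keeping stored-field values in .fdt/.fdx files; the other extensions vary by Lucene version, and the directory path is hypothetical.

```python
import os
from collections import defaultdict

# Lucene keeps stored-field values in .fdt (data) and .fdx (index) files.
STORED_FIELD_EXTS = {".fdt", ".fdx"}

def size_by_extension(index_dir):
    """Map file extension -> total bytes for files directly inside index_dir."""
    sizes = defaultdict(int)
    for name in os.listdir(index_dir):
        path = os.path.join(index_dir, name)
        if os.path.isfile(path):
            # Files such as 'segments_2' have no extension; key them by full name.
            ext = os.path.splitext(name)[1] or name
            sizes[ext] += os.path.getsize(path)
    return dict(sizes)

def stored_vs_rest(sizes):
    """Split the totals into (stored-field bytes, all other bytes)."""
    stored = sum(v for k, v in sizes.items() if k in STORED_FIELD_EXTS)
    return stored, sum(sizes.values()) - stored

if __name__ == "__main__":
    index_dir = "/var/solr/collection1/data/index"  # hypothetical path
    if os.path.isdir(index_dir):
        sizes = size_by_extension(index_dir)
        for ext, n in sorted(sizes.items(), key=lambda kv: -kv[1]):
            print(f"{ext:12} {n:>14,}")
        stored, rest = stored_vs_rest(sizes)
        print(f"stored fields: {stored:,} bytes; everything else: {rest:,} bytes")
```

This gives totals for the whole index, not for a single document; dividing by the document count only yields an average.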
