lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: parent/child rows in solr
Date Wed, 12 Sep 2018 01:32:30 GMT
On 9/11/2018 7:07 PM, John Smith wrote:
> header:      223,580
> child1:      124,978
> child2:      254,045
> child3:      127,917
> child4:    1,009,030
> child5:      225,311
> child6:      381,561
> child7:      438,315
> child8:       18,850
> Trying to index that into solr with a flatfile schema, blows up into
> 5,475,316,072 rows. Yes, 5.5 billion rows. I calculated that by running a

I think you're not getting what I'm suggesting.  Or maybe there's an 
aspect of your data that I'm not understanding.

If we add up all those numbers for the child docs, there are a little
over 2.5 million of them (2,580,007 in total).  So you would have
roughly 2.5 million docs in Solr.  I have created Solr indexes far
larger than this, and I do not consider my work to be "big data".  Solr
can handle an index of that size easily, as long as the hardware
resources are sufficient.
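
As a quick check of that arithmetic, here is a small sketch that sums
the child counts quoted above.  The header count is left out because,
in this arrangement, header rows do not become documents of their own.

```python
# Row counts for the child tables, copied from the quoted message.
child_counts = {
    "child1": 124_978,
    "child2": 254_045,
    "child3": 127_917,
    "child4": 1_009_030,
    "child5": 225_311,
    "child6": 381_561,
    "child7": 438_315,
    "child8": 18_850,
}

# One Solr document per child row, so the index size is the plain sum.
total_child_docs = sum(child_counts.values())
print(f"{total_child_docs:,}")  # prints 2,580,007
```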

The data duplication comes from additional fields in those 2.5 million
docs.  Each one will contain some (or maybe all) of the data that WOULD
have been in the parent document.  The amount of data balloons, but the
number of documents (rows) doesn't.
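
A minimal sketch of that arrangement, in plain Python with no Solr
client involved.  The field names (header_id, customer, region, sku,
note), the "header_" prefix, and the denormalize() helper are all
hypothetical, invented for illustration; the real schema would depend
on your data.

```python
from typing import Any


def denormalize(
    headers: dict[int, dict[str, Any]],
    children: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Build one flat document per child row, copying parent fields in."""
    docs = []
    for i, child in enumerate(children):
        parent = headers[child["header_id"]]
        # Hypothetical unique key: child type plus position in the input.
        doc = {"id": f"{child['type']}-{i}"}
        # Copy the parent (header) fields onto the child document.
        doc.update({f"header_{name}": value for name, value in parent.items()})
        # Then add the child's own fields.
        doc.update(child)
        docs.append(doc)
    return docs


# Hypothetical sample data: one header row with three child rows.
headers = {1: {"customer": "ACME", "region": "west"}}
children = [
    {"header_id": 1, "type": "child1", "sku": "A-100"},
    {"header_id": 1, "type": "child1", "sku": "A-200"},
    {"header_id": 1, "type": "child2", "note": "rush order"},
]

docs = denormalize(headers, children)
print(len(docs))  # one document per child row
```

The document count equals the number of child rows.  It is not a
product of the table sizes, which is what happens when every
combination of child rows under a header is joined into a single flat
row.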

That kind of arrangement is usually enough to accomplish whatever is 
needed.  I cannot assume that it will work for your use case, but it 
does work for most.

