lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: parent/child rows in solr
Date Sat, 08 Sep 2018 01:59:38 GMT
On 9/7/2018 7:44 PM, John Smith wrote:
> Thanks Shawn, for your comments. The reason why I don't want to go flat
> file structure, is due to all the wasted/duplicated data. If a department
> has 100 employees, then it's very wasteful in terms of disk space to repeat
> the header data over and over again, 100 times. In this example there are
> only a few doc types, but my real-life data is much larger, and the problem
> is a "scaling" problem; with just a little bit of data, no problem in
> duplicating header fields, but with massive amounts of data it's a large
> problem.

If your goal is data storage, then you are completely correct.  All that 
data duplication is something to avoid for a data storage situation.  
Normalizing your data so it's relational makes perfect sense, because 
most database software is designed to efficiently deal with those 
relationships.

Solr is not designed as a data storage platform, and does not handle 
those relationships efficiently.  Solr's design goals are all about 
*search*.  It often gets touted as filling a NoSQL role ... but it's not 
something I would personally use as a primary data repository.  Search 
is a space where data duplication is expected and completely normal.  
This is something that people often have a hard time accepting.
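
To make the "flat" approach concrete, here is a minimal sketch of what
denormalizing looks like before the documents are sent to Solr.  The field
names (dept_id, dept_name, emp_id, emp_name) and the id scheme are made up
for illustration, they are not taken from your schema, and this only builds
the documents; it does not talk to Solr at all.

```python
import json


def flatten_department(dept):
    """Return one flat document per employee.

    Each document carries its own copy of the department header fields,
    which is the duplication discussed above.
    """
    # Everything except the nested employee list counts as header data.
    header = {key: value for key, value in dept.items() if key != "employees"}
    docs = []
    for emp in dept["employees"]:
        doc = dict(header)   # copy (duplicate) the header fields
        doc.update(emp)      # add the employee-specific fields
        # Hypothetical unique key: department id plus employee id.
        doc["id"] = f"{dept['dept_id']}-{emp['emp_id']}"
        docs.append(doc)
    return docs


# Hypothetical sample data: one department with two employees.
department = {
    "dept_id": "D1",
    "dept_name": "Engineering",
    "employees": [
        {"emp_id": "E1", "emp_name": "Alice"},
        {"emp_id": "E2", "emp_name": "Bob"},
    ],
}

docs = flatten_department(department)
print(json.dumps(docs, indent=2))
```

Every resulting document repeats dept_id and dept_name, so a search for an
employee can filter or facet on department fields without any join.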

Thanks,
Shawn

