lucene-solr-user mailing list archives

From Derek Poh <>
Subject Re: 1 main collection or multiple smaller collections?
Date Fri, 28 Apr 2017 02:34:56 GMT

Thank you for sharing your use case. I will try to design backwards from 
the search result pages.
As of now, a user can either do a supplier search or a
Using a single collection of product documents, with supplier info in
each product document, I will need to use result grouping or the
collapsing query parser for supplier search.
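
A rough sketch of what the two options might look like as Solr request parameters. The field name supplier_id and the helper name are hypothetical, not taken from my actual schema; only the parameter names (group, group.field, group.limit, and the collapse filter) are standard Solr.

```python
from urllib.parse import urlencode


def supplier_search_params(user_query, supplier_field="supplier_id", use_collapse=True):
    """Build Solr params that return one product document per supplier.

    supplier_field is a hypothetical field holding the supplier key
    in every product document.
    """
    params = {"q": user_query}
    if use_collapse:
        # Collapsing query parser: keep one document per supplier value.
        params["fq"] = "{!collapse field=%s}" % supplier_field
    else:
        # Result grouping: one group per supplier, one document per group.
        params.update({
            "group": "true",
            "group.field": supplier_field,
            "group.limit": "1",
        })
    return params


if __name__ == "__main__":
    # Query strings that would be appended to a /select request.
    print(urlencode(supplier_search_params("led lamp")))
    print(urlencode(supplier_search_params("led lamp", use_collapse=False)))
```

Either way the product documents stay flat, and the supplier result page is derived at query time.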

On 4/28/2017 1:08 AM, Walter Underwood wrote:
> Design backwards from the search result pages (SRP). Make flat schema(s) with the fields
> you will search and display.
> One example is the schema I used at Netflix. I used one collection to hold movies, people
> (actors), and genres. There were collisions between the integer IDs, so movie IDs were prefixed
> with “m”, people with “p”, and genres with “g”. The searched fields were “title”
> and “description”. There was also a “type” field which was “movie”, “person”,
> or “genre”. There was also a field for the database ID (without the prefix).
> A movie SRP used an “fq” filter of “type:movie”, and so on for other SRPs. There
> were a few other filters, like G-rated movies or streaming, DVD, HD DVD, or Blu-ray.
> The full index was under 350K documents.
> wunder
> Walter Underwood
>> On Apr 27, 2017, at 10:01 AM, Rick Leir <> wrote:
>> Does it make sense to use nested documents here? Products could be nested in a supplier
>> document perhaps.
>> Alternately, consider de-normalizing "til it hurts". A product doc might be able
>> to contain supplier info.
>> On April 27, 2017 8:50:59 AM EDT, Shawn Heisey <> wrote:
>>> On 4/26/2017 11:57 PM, Derek Poh wrote:
>>>> There are some common fields between them.
>>>> At the source data end (database), the supplier info and product info
>>>> are updated separately. In this regard, should I separate them?
>>>> If it's in 1 single collection, when there are updates to only the
>>>> supplier info, the product info will be indexed again even though there
>>>> are no updates to it. Is my reasoning valid?
>>>> On 4/27/2017 1:33 PM, Walter Underwood wrote:
>>>>> Do they have the same fields or different fields? Are they updated
>>>>> separately or together?
>>>>> If they have the same fields and are updated together, I’d put them
>>>>> in the same collection. Otherwise, probably separate.
>>> Walter's statements are right on the money, you just might need a
>>> little more detail.
>>> There are two critical details that decide whether you even CAN
>>> combine different data in a single index: One is that all types of
>>> records must use the same field (the uniqueKey field) to determine
>>> uniqueness, and the value of this field must be unique across the
>>> entire dataset.  The other is that there SHOULD be a field with a
>>> name like "type" that your search client can use to differentiate the
>>> different kinds of documents.  This type field is not necessary, but
>>> it does make things easier.
>>> Assuming you CAN combine documents, there is still the question of
>>> whether you SHOULD.  If the fields that you will commonly search are
>>> the same between the different kinds of documents, and if people want
>>> to be able to do one search and get more than one of the document
>>> types you are indexing, then it is something you should consider.  If
>>> people will only ever search one type of document, you should probably
>>> keep them in separate indexes to keep things cleaner.
>>> Thanks,
>>> Shawn
>> -- 
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
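
For reference, the combined-collection scheme described in the quoted messages (a uniqueKey that is unique across all document types, plus a "type" field used as a filter) might be sketched as below. The field names "id" and "db_id" and both helper names are assumptions for illustration; "type", "title", and "description" come from the quoted description.

```python
# One-letter prefixes keep the uniqueKey distinct when database IDs collide.
PREFIXES = {"movie": "m", "person": "p", "genre": "g"}


def make_doc(doc_type, db_id, title, description=""):
    """Build a flat document for a single mixed-type collection."""
    return {
        "id": f"{PREFIXES[doc_type]}{db_id}",  # uniqueKey, unique across types
        "type": doc_type,                      # lets the client filter by kind
        "db_id": db_id,                        # original database ID, no prefix
        "title": title,
        "description": description,
    }


def srp_params(user_query, doc_type):
    """Query params for a search result page limited to one document type."""
    return {"q": user_query, "fq": f"type:{doc_type}"}


if __name__ == "__main__":
    # The same integer ID in two tables yields two different uniqueKey values.
    print(make_doc("movie", 42, "Example movie")["id"])
    print(make_doc("person", 42, "Example person")["id"])
    print(srp_params("example", "movie"))
```

The same pattern would apply to products and suppliers, with a prefix per document kind.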

