lucene-solr-user mailing list archives

From Rick Leir <rl...@leirtech.com>
Subject Re: 1 main collection or multiple smaller collections?
Date Fri, 28 Apr 2017 11:53:05 GMT
Derek
You could have one document per supplier which has no product info. It would have a flag to
indicate this. Then your supplier search is simple. 
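
A minimal sketch of that layout, assuming hypothetical field names (`id`, `type`, `supplier_id`, `supplier_name`, `product_name`) that are not taken from your schema. Each supplier gets one flagged document next to its product documents, and the supplier search restricts itself to the flag with a filter query. The `matches_filter` helper is only a local stand-in so the idea can be checked without a running Solr.

```python
from urllib.parse import urlencode

# Hypothetical documents; none of these field names or values come from the real schema.
docs = [
    {"id": "s100", "type": "supplier", "supplier_id": 100, "supplier_name": "Acme Lighting"},
    {"id": "p1", "type": "product", "supplier_id": 100, "product_name": "LED desk lamp"},
    {"id": "p2", "type": "product", "supplier_id": 100, "product_name": "LED floor lamp"},
]


def supplier_search_params(user_query):
    """Build /select parameters that match only the flagged supplier documents."""
    return {"q": user_query, "fq": "type:supplier", "rows": 10}


def matches_filter(doc, fq):
    """Local stand-in for a simple field:value filter query (not Solr itself)."""
    field, value = fq.split(":", 1)
    return str(doc.get(field)) == value


params = supplier_search_params("lighting")
query_string = urlencode(params)  # ready to append to a /select URL
supplier_docs = [d for d in docs if matches_filter(d, params["fq"])]
```

The product search would use the same parameters with `fq=type:product`.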

But grouping would be better, so the supplier search can show product counts and categories
and ...
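
As a rough sketch of the grouping route, assuming a hypothetical single-valued `supplier_id` field on every product document: the first function builds Solr result grouping parameters, the second builds the equivalent request with the collapsing query parser plus the expand component. The parameter names are standard Solr request parameters; the field name and the limits are illustrative only.

```python
from urllib.parse import urlencode


def grouped_supplier_params(user_query, group_field="supplier_id"):
    """Result grouping: one group per supplier over the product documents."""
    return {
        "q": user_query,
        "group": "true",
        "group.field": group_field,  # hypothetical field name
        "group.limit": 3,            # products returned per supplier
        "group.ngroups": "true",     # also report the number of suppliers matched
    }


def collapsed_supplier_params(user_query, collapse_field="supplier_id"):
    """Collapsing query parser: keep one product per supplier, expand the rest."""
    return {
        "q": user_query,
        "fq": "{!collapse field=%s}" % collapse_field,
        "expand": "true",
    }


grouped_qs = urlencode(grouped_supplier_params("led lamp"))
collapsed_qs = urlencode(collapsed_supplier_params("led lamp"))
```

In a grouped response, each group's `doclist` carries a `numFound` value, which is where a per-supplier product count would come from.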

+1 Walter on designing back from the results page. That is from the NoSQL playbook.
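
Walter's one-collection layout (quoted below) is a concrete case of this. A small sketch of his ID scheme, where the `type`, `title` and `description` fields follow his message, while the `id` and `db_id` field names and the sample values are my assumptions:

```python
# One-letter prefixes per document type, as described in Walter's message.
PREFIXES = {"movie": "m", "person": "p", "genre": "g"}


def to_solr_doc(doc_type, db_id, title, description):
    """Build a flat document whose unique key cannot collide across types."""
    return {
        "id": f"{PREFIXES[doc_type]}{db_id}",  # unique key, e.g. "m42" (assumed field name)
        "type": doc_type,                      # used by fq filters such as type:movie
        "db_id": db_id,                        # database ID without the prefix (assumed name)
        "title": title,
        "description": description,
    }


# The same integer ID in two source tables yields two distinct unique keys.
movie = to_solr_doc("movie", 42, "Example Movie", "Placeholder description.")
person = to_solr_doc("person", 42, "Example Person", "Placeholder description.")
```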
Cheers -- Rick

On April 27, 2017 10:34:56 PM EDT, Derek Poh <dpoh@globalsources.com> wrote:
>Walter
>
>Thank you for sharing your use case. I will try to design backwards
>from 
>the search result pages.
>As of now, a user can do either a supplier search or a product search.
>Using a single collection of product documents, with supplier info in 
>each product document, I will need to use result grouping or the
>collapse query parser for the supplier search.
>
>On 4/28/2017 1:08 AM, Walter Underwood wrote:
>> Design backwards from the search result pages (SRP). Make flat
>schema(s) with the fields you will search and display.
>>
>> One example is the schema I used at Netflix. I used one collection to
>hold movies, people (actors), and genres. There were collisions between
>the integer IDs, so movie IDs were prefixed with “m”, people with “p”,
>and genres with “g”. The searched fields were “title” and
>“description”. There was also a “type” field which was “movie”,
>“person”, or “genre”. There was also a field for the database ID
>(without the prefix).
>>
>> A movie SRP used an “fq” filter of “type:movie”, and so on for other
>SRPs. There were a few other filters, like G-rated movies or streaming,
>DVD, HD DVD, or Blu-ray.
>>
>> The full index was under 350K documents.
>>
>> wunder
>> Walter Underwood
>> wunder@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>>
>>> On Apr 27, 2017, at 10:01 AM, Rick Leir <rleir@leirtech.com> wrote:
>>>
>>> Does it make sense to use nested documents here? Products could be
>nested in a supplier document perhaps.
>>>
>>> Alternately, consider de-normalizing "til it hurts". A product doc
>might be able to contain supplier info.
>>>
>>> On April 27, 2017 8:50:59 AM EDT, Shawn Heisey <apache@elyograg.org>
>wrote:
>>>> On 4/26/2017 11:57 PM, Derek Poh wrote:
>>>>> There are some common fields between them.
>>>>> At the source data end (database), the supplier info and product
>info
>>>>> are updated separately. In this regard, should I separate them?
>>>>> If they are in a single collection, then when only the supplier
>>>>> info is updated, the product info will be indexed again even
>>>>> though it has not changed. Is my reasoning valid?
>>>>>
>>>>>
>>>>> On 4/27/2017 1:33 PM, Walter Underwood wrote:
>>>>>> Do they have the same fields or different fields? Are they
>updated
>>>>>> separately or together?
>>>>>>
>>>>>> If they have the same fields and are updated together, I’d put
>them
>>>>>> in the same collection. Otherwise, probably separate.
>>>> Walter's statements are right on the money, you just might need a
>>>> little
>>>> more detail.
>>>>
>>>> There are two critical details that decide whether you even CAN
>>>> combine different data in a single index: One is that all types of
>>>> records must use the same field (the uniqueKey field) to determine
>>>> uniqueness, and the value of this field must be unique across the
>>>> entire
>>>> dataset.  The other is that there SHOULD be a field with a name
>like
>>>> "type" that your search client can use to differentiate the
>different
>>>> kinds of documents.  This type field is not necessary, but it does
>make
>>>> things easier.
>>>>
>>>> Assuming you CAN combine documents, there is still the question of
>>>> whether you SHOULD.  If the fields that you will commonly search
>are
>>>> the
>>>> same between the different kinds of documents, and if people want
>to be
>>>> able to do one search and get more than one of the document types
>you
>>>> are indexing, then it is something you should consider.  If people
>will
>>>> only ever search one type of document, you should probably keep
>them in
>>>> separate indexes to keep things cleaner.
>>>>
>>>> Thanks,
>>>> Shawn
>>> -- 
>>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>
>
>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 