lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Converting nested data model to solr schema
Date Mon, 01 Jul 2013 14:07:57 GMT
Simply duplicate the subset of parent-document fields that you want to 
query onto each child document; you can then query the child documents 
directly, without any join.
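
As a rough illustration of that denormalization, here is a minimal Python 
sketch that builds one flat child record per reply attachment and copies 
the parent fields onto it. It only prepares the records and does not talk 
to Solr. The field names come from the example in the original message; 
the `id` unique-key field, its format, and the `flatten_replies` helper 
are assumptions made for illustration.

```python
def flatten_replies(doc):
    """Return one flat record per reply attachment, each carrying parent fields."""
    # Parent fields duplicated onto every child so no join is needed at query time.
    parent_fields = {
        "parent_doc_id": doc["doc_id"],
        "author": doc["author"],
        "content": doc["content"],
    }
    children = []
    for reply in doc.get("reply_attachments", []):
        child = dict(parent_fields)   # copy of the parent fields
        child.update(reply)           # add the reply's own fields
        # Hypothetical unique key: parent id plus reply id.
        child["id"] = "{}_reply_{}".format(doc["doc_id"], reply["reply_id"])
        children.append(child)
    return children


doc = {
    "doc_id": "1",
    "author": "john",
    "content": "some long long text...",
    "reply_attachments": [
        {"reply_id": "345", "reply_text": "SomeText",
         "reply_author": "Jack", "reply_date": "22-12-2012"},
        {"reply_id": "897", "reply_text": "SomeText2",
         "reply_author": "Bob", "reply_date": "23-12-2012"},
    ],
}

children = flatten_replies(doc)
```

Each record in `children` can then be matched on `content`, `reply_author` 
and `reply_date` at once, at the cost of storing the parent content once 
per child.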

Yes, given the complexity of your data, a two-step query process may be 
necessary for some queries - do one query to get parent or child IDs and 
then do a second query filtered by those IDs.
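
A minimal Python sketch of that two-step pattern follows. It only builds 
the request parameters (`q`, `fq`, `fl` and `rows` are standard Solr query 
parameters) and simulates the first response, so no server is needed. The 
helper names, the `rows` value and the lack of query escaping are 
assumptions made for illustration; the field names are taken from the 
example in the original message.

```python
def child_query_params(reply_author, reply_date):
    """Step 1: find matching child documents, returning only the parent IDs."""
    return {
        "q": 'reply_author:"{}" AND reply_date:"{}"'.format(reply_author, reply_date),
        "fl": "parent_doc_id",
        "rows": 1000,  # assumed upper bound on the number of matching children
    }


def parent_query_params(text, parent_ids):
    """Step 2: query parent documents, filtered by the IDs found in step 1."""
    if not parent_ids:
        return None  # no child matched, so no parent can match either
    id_filter = "doc_id:(" + " OR ".join(parent_ids) + ")"
    return {"q": 'content:"{}"'.format(text), "fq": id_filter}


step1 = child_query_params("Bob", "12-12-2012")

# Simulated documents from the step 1 response (hypothetical data).
child_hits = [{"parent_doc_id": "1"}, {"parent_doc_id": "1"}, {"parent_doc_id": "7"}]
parent_ids = sorted({hit["parent_doc_id"] for hit in child_hits})

step2 = parent_query_params("abc", parent_ids)
```

With many matching IDs the filter in step 2 grows long, so this works best 
when the first query is selective.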

And, yes, this only approximates the full power of an SQL join - but at a 
tiny fraction of the cost.

-- Jack Krupansky

-----Original Message----- 
From: adfel70
Sent: Monday, July 01, 2013 9:56 AM
To: solr-user@lucene.apache.org
Subject: Converting nested data model to solr schema

Hi,
I have the following data model:
1. Document (fields: doc_id, author, content)
2. Each Document has multiple attachment types. Each attachment type has
multiple instances. And each attachment type may have different fields.
for example:
<doc>
   <doc_id>1</doc_id>
   <author>john</author>
   <content>some long long text...</content>
   <file_attachments>
      <file_attachment>
         <attach_id>458</attach_id>
         <attach_text>SomeText</attach_text>
         <attach_date>12/12/2012</attach_date>
      </file_attachment>
      <file_attachment>
         <attach_id>568</attach_id>
         <attach_text>SomeText2</attach_text>
         <attach_date>12/11/2012</attach_date>
      </file_attachment>
   </file_attachments>
   <reply_attachments>
      <reply_attachment>
         <reply_id>345</reply_id>
         <reply_text>SomeText</reply_text>
         <reply_author>Jack</reply_author>
         <reply_date>22-12-2012</reply_date>
      </reply_attachment>
      <reply_attachment>
         <reply_id>897</reply_id>
         <reply_text>SomeText2</reply_text>
         <reply_author>Bob</reply_author>
         <reply_date>23-12-2012</reply_date>
      </reply_attachment>
   </reply_attachments>
</doc>


I want to index all this data in solr cloud.
My current solution is to index the original document by itself and index
each attachment as a single solr document with its parent_doc_id, and then
use solr's join capability.
The problem with this solution is that I must index all the attachments of
each document, and the document itself, in the same shard (a current solr
limitation).
This requires me to override the solr document distribution mechanism.
I fear that with this solution I may lose some of solr cloud's
capabilities.
My questions are:
1. Are my concerns regarding the downside of overriding solr cloud's
out-of-the-box mechanism justified? Or should I proceed with this solution?
2. If I'm looking for another solution, can I somehow keep all attachments
on the same document and still be able to query on a single attachment?
A query example:
Retrieve all documents where:
content: contains "abc"
AND
reply_attachment.author = 'Bob'
AND
reply_attachment.date = '12-12-2012'


Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Converting-nested-data-model-to-solr-schema-tp4074351.html
Sent from the Solr - User mailing list archive at Nabble.com. 

