lucene-solr-user mailing list archives

From adfel70 <adfe...@gmail.com>
Subject Converting nested data model to solr schema
Date Mon, 01 Jul 2013 13:56:07 GMT
Hi,
I have the following data model:
1. Document (fields: doc_id, author, content)
2. Each Document has multiple attachment types. Each attachment type has
multiple instances, and each attachment type may have different fields.
For example:
<doc>
   <doc_id>1</doc_id>
   <author>john</author>
   <content>some long long text...</content>
   <file_attachments>
      <file_attachment>
         <attach_id>458</attach_id>
         <attach_text>SomeText</attach_text>
         <attach_date>12/12/2012</attach_date>
      </file_attachment>
      <file_attachment>
         <attach_id>568</attach_id>
         <attach_text>SomeText2</attach_text>
         <attach_date>12/11/2012</attach_date>
      </file_attachment>
   </file_attachments>
   <reply_attachments>
      <reply_attachment>
         <reply_id>345</reply_id>
         <reply_text>SomeText</reply_text>
         <reply_author>Jack</reply_author>
         <reply_date>22-12-2012</reply_date>
      </reply_attachment>
      <reply_attachment>
         <reply_id>897</reply_id>
         <reply_text>SomeText2</reply_text>
         <reply_author>Bob</reply_author>
         <reply_date>23-12-2012</reply_date>
      </reply_attachment>
   </reply_attachments>
</doc>


I want to index all this data in SolrCloud.
My current solution is to index the original document by itself, index
each attachment as a separate Solr document carrying its parent_doc_id, and
then use Solr's join capability.
The problem with this solution is that I must index the document and all of
its attachments in the same shard (a current Solr limitation).
This requires me to override Solr's document distribution mechanism.
I fear that with this solution I may lose some of SolrCloud's
capabilities.
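
To make the layout described above concrete, here is a minimal Python sketch of how one nested document might be split into a parent record plus one record per attachment. The `id` and `type` fields, the key prefixes, and the function name are assumptions made for illustration; only doc_id, parent_doc_id and the attachment fields come from the example above.

```python
def flatten_document(doc):
    """Split one nested document into flat records: the parent first,
    then one record per attachment, each carrying parent_doc_id."""
    parent_id = doc["doc_id"]
    records = [{
        "id": f"doc-{parent_id}",        # hypothetical unique key
        "type": "document",              # hypothetical discriminator field
        "doc_id": parent_id,
        "author": doc["author"],
        "content": doc["content"],
    }]
    for att in doc.get("file_attachments", []):
        records.append({**att,
                        "id": f"file-{att['attach_id']}",
                        "type": "file_attachment",
                        "parent_doc_id": parent_id})
    for rep in doc.get("reply_attachments", []):
        records.append({**rep,
                        "id": f"reply-{rep['reply_id']}",
                        "type": "reply_attachment",
                        "parent_doc_id": parent_id})
    return records


example = {
    "doc_id": "1",
    "author": "john",
    "content": "some long long text...",
    "file_attachments": [
        {"attach_id": "458", "attach_text": "SomeText", "attach_date": "12/12/2012"},
        {"attach_id": "568", "attach_text": "SomeText2", "attach_date": "12/11/2012"},
    ],
    "reply_attachments": [
        {"reply_id": "345", "reply_text": "SomeText",
         "reply_author": "Jack", "reply_date": "22-12-2012"},
        {"reply_id": "897", "reply_text": "SomeText2",
         "reply_author": "Bob", "reply_date": "23-12-2012"},
    ],
}

records = flatten_document(example)
```

Each resulting record would be sent to Solr as its own document, which is why all records sharing a parent_doc_id have to end up in the same shard for the join to see them.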
My questions are:
1. Are my concerns about the downsides of overriding SolrCloud's
out-of-the-box mechanism justified? Or should I proceed with this solution?
2. If I look for another solution, can I somehow keep all attachments
on the same document and still be able to query on a single attachment?
A query example:
Retrieve all documents where:
content: contains "abc"
AND
reply_attachment.author = 'Bob'
AND
reply_attachment.date = '12-12-2012'
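
For comparison, under the one-document-per-attachment layout the query above might be expressed with Solr's join query parser, roughly as sketched below in Python. This assumes reply_author and reply_date are indexed as plain string fields on the attachment documents, that parent_doc_id and doc_id are the join keys, and that a term match on content is an acceptable reading of "contains"; the function name is hypothetical and no escaping of special characters is attempted.

```python
def build_join_params(content_term, reply_author, reply_date):
    """Build Solr request parameters: the main query runs against parent
    documents, the filter joins from matching attachment documents."""
    # Both conditions are evaluated on the same attachment document,
    # so author and date must match within a single reply.
    child_query = f'reply_author:"{reply_author}" AND reply_date:"{reply_date}"'
    return {
        "q": f'content:"{content_term}"',
        "fq": "{!join from=parent_doc_id to=doc_id}" + child_query,
    }


params = build_join_params("abc", "Bob", "12-12-2012")
```

Putting the join in a filter query keeps it separate from the scoring of the content match; whether that is the right trade-off depends on the use case.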


Thanks.