lucene-solr-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: converting to parent/child block indexing
Date Wed, 17 Dec 2014 22:32:18 GMT
Thanks, Mikhail!  That explains the situation pretty well.

-Mike

On 12/17/14 4:49 PM, Mikhail Khludnev wrote:
> Hm... really sorry about that. The current implementation is not really
> ideal, you know.
> When it handles an update, it tries to recognize whether the update is a
> block or not, and for blocks it uses the _root_ field to enforce uniqueness.
> There are a few consequences:
>   - the _root_ field spans the whole block, not just the parent document;
>   - the current heuristic (block/not-block) is straightforward and doesn't
> support flipping: an index is either blocked or it's not. Your case is the
> opposite of https://issues.apache.org/jira/browse/SOLR-5211.
> As a workaround, before you send a block with id:66, send a delete-by-query
> for this id. Wiping the index is also an option, for sure!
> Note, this might be another intriguing approach:
> http://shaierera.blogspot.com/2013/04/index-sorting-with-lucene.html
>
>
> On Thu, Dec 18, 2014 at 12:33 AM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:
>> Have other people tried migrating an index that was created without block
>> (parent/child) indexing to one that *does* have it?  Did you find that you
>> got duplicate documents, i.e., multiple documents with the same uniqueField
>> value?  That's what I found, and I don't see how that's possible.
>>
>> What I *think* happened was:
>>
>> Before:
>>
>> I had various documents in the database and the unique key field was all
>> set up correctly; when I reindexed, each document would overwrite the
>> existing copy (delete, then update, I guess).
>>
>> I changed my indexer (this is using a customized version of haystack) to
>> submit nested document updates instead, so now some of the formerly
>> "standalone" documents became child documents, and others, parents.
>>
>> After reindexing:
>>
>> I had double copies of all the documents: two documents with the same
>> (uniqueField) id. If I reindexed once more, the parent/child copies would
>> be overwritten, but a second "standalone" copy seemed to persist (its
>> _version_ was unchanged). Is the uniqueId field not being applied to child
>> documents somehow?
>>
>> Pragmatically speaking, it seems I just need to wipe the index and start
>> over, but I wonder: is that expected?
>>
>> -Mike
>>
>

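The duplication described in this thread can be reproduced with a toy model. The Python sketch below is a hypothetical in-memory simplification, not Solr's actual code: it assumes, as Mikhail describes, that a standalone update overwrites documents by the id term, while a block update overwrites by the _root_ term, which every document in the block (parent included) carries. The class ToyIndex and its methods are illustrative names only.

```python
from collections import Counter


class ToyIndex:
    """Hypothetical in-memory model of the uniqueness heuristic described in
    the thread. It is a simplification, not Solr's implementation."""

    def __init__(self):
        self.docs = []  # each doc is a dict with at least an "id" key

    def add_standalone(self, doc):
        # Non-block update: overwrite any document with the same id.
        self.docs = [d for d in self.docs if d["id"] != doc["id"]]
        self.docs.append(dict(doc))

    def add_block(self, parent, children):
        # Block update: overwrite by the _root_ value, which is set to the
        # parent's id on every document in the block, parent included.
        root = parent["id"]
        self.docs = [d for d in self.docs if d.get("_root_") != root]
        for doc in [parent, *children]:
            self.docs.append({**doc, "_root_": root})

    def id_counts(self):
        return Counter(d["id"] for d in self.docs)


index = ToyIndex()
# Before: everything was indexed as standalone documents (no _root_ field).
index.add_standalone({"id": "66"})
index.add_standalone({"id": "67"})
# After: the same ids are re-sent as one parent/child block, twice.
index.add_block({"id": "66"}, [{"id": "67"}])
index.add_block({"id": "66"}, [{"id": "67"}])
print(index.id_counts())  # -> Counter({'66': 2, '67': 2})
```

Because the standalone copies never carry a _root_ value, no block update ever matches them. In this model they are never rewritten, which is consistent with the observation that the leftover standalone copy kept its _version_.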

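The workaround suggested in the thread (a delete-by-query before sending the block) could look roughly like the following Python sketch, which builds a Solr JSON update body with nested documents under _childDocuments_. Assumptions: the uniqueKey field is id; the update URL passed to the hypothetical helper post_update is a placeholder such as http://localhost:8983/solr/collection1/update; and the sketch relies on Solr processing the commands in the order they appear in the body. It also deletes the children's ids, not just the parent's, because former standalone child documents would otherwise persist as duplicates too. That is an extension beyond the suggestion in the thread.

```python
import json
import urllib.request


def escape_term(value):
    # Quote the value so characters such as ':' or '-' are not parsed as
    # query syntax; backslashes and double quotes inside it are escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_block_update(parent, children, id_field="id"):
    """Hypothetical helper: return a JSON update body that first deletes every
    document sharing an id with the block (including leftover standalone
    copies), then adds the parent with its children nested."""
    ids = [parent[id_field]] + [c[id_field] for c in children]
    query = "%s:(%s)" % (id_field, " OR ".join(escape_term(i) for i in ids))
    block = dict(parent, _childDocuments_=list(children))
    # Assumption: Solr applies the commands in the order they appear here.
    return json.dumps({"delete": {"query": query}, "add": {"doc": block}})


def post_update(update_url, body):
    # update_url is a placeholder, e.g. http://localhost:8983/solr/collection1/update
    req = urllib.request.Request(
        update_url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


body = build_block_update({"id": "66", "title": "parent"},
                          [{"id": "67", "title": "child"}])
print(body)
```

Sending both commands in one request keeps the delete and the re-add together. For a large migration, wiping the index once, as suggested in the thread, avoids issuing a delete for every block.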