lucene-solr-user mailing list archives

From Anirudha Jadhav <aniru...@nyu.edu>
Subject Re: index merge question
Date Tue, 11 Jun 2013 14:46:44 GMT
From my experience, the lucene mergeTool and the one invoked by
coreAdmin are pure Lucene implementations and do not understand the
concept of a unique key (a Solr-land concept).

  http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note
at the end

We do frequent index merges, for which we externally run map/reduce jobs
(Java code using the Lucene APIs) to merge the indices and validate the
merged indices against their sources.
-Ani
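
To illustrate the kind of post-merge validation described above, here is a minimal sketch in plain Python. It does not use the Lucene or Solr APIs; documents are modeled as dicts, and the `id` and `version` field names and all function names are assumptions made for the example. It shows that a key-unaware merge keeps both copies of a document, and how a validation pass could detect and resolve the duplicates.

```python
from collections import defaultdict

def raw_merge(*indexes):
    """Concatenate documents the way a key-unaware merge does: nothing is deduplicated."""
    merged = []
    for index in indexes:
        merged.extend(index)
    return merged

def find_duplicate_keys(docs, key="id"):
    """Return {key value: [docs]} for every key that occurs more than once."""
    by_key = defaultdict(list)
    for doc in docs:
        by_key[doc[key]].append(doc)
    return {k: v for k, v in by_key.items() if len(v) > 1}

def keep_newest(docs, key="id", version="version"):
    """Resolve duplicates by keeping the doc with the highest version per key."""
    newest = {}
    for doc in docs:
        current = newest.get(doc[key])
        if current is None or doc[version] > current[version]:
            newest[doc[key]] = doc
    return list(newest.values())

# Hypothetical contents of two source cores; both contain a doc with id "a".
core1 = [{"id": "a", "version": 2, "title": "new"},
         {"id": "b", "version": 1, "title": "only in core1"}]
core2 = [{"id": "a", "version": 1, "title": "old"},
         {"id": "c", "version": 1, "title": "only in core2"}]

merged = raw_merge(core1, core2)          # both copies of "a" survive
duplicates = find_duplicate_keys(merged)  # validation step flags "a"
deduped = keep_newest(merged)             # one possible resolution policy
```

Keeping the highest version is only one possible policy; the point is that the uniqueness check has to happen outside the merge itself.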

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmiller@gmail.com> wrote:
> Yeah, you have to carefully manage things if you are map/reduce building indexes *and*
> updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the option
> of not doing incremental index merging, and you could simply rebuild the whole thing every
> time - of course, depending on your cluster size, that could be quite expensive.
>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
>> Thanks Mark.  My question stems from the new Cloudera Search stuff.
>> My concern is that if someone updates a doc while the index is being
>> rebuilt, that update could be lost from a Solr perspective.  I guess what
>> would need to happen to ensure the correct information was indexed would be
>> to record the start time and reindex the information that changed since then?
>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmiller@gmail.com> wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>
>>>> When merging through the core admin (
>>>> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy for
>>>> conflicts during the merge?  So for instance if I am merging core 1 and
>>>> core 2 into core 0 (first example), what happens if core 1 and core 2 both
>>>> have a document with the same key, say core 1 has a newer version than core
>>>> 2?  Does the merge fail, does the newer document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a
>>> situation you want to end up in. You need to ensure unique IDs in the
>>> input data or replace the index rather than merging into it.
>>>
>>>>
>>>> Also, if using the srcCore method, what happens if a document with key 1
>>>> is written while an index that also contains key 1 is being merged?
>>>
>>> It depends on the order I think - if the doc is written after the merge
>>> and it's an update, it will update the doc that was just merged in. If the
>>> merge comes second, you have the doc twice and it's a problem.
>>>
>>> - Mark
>



-- 
Anirudha P. Jadhav
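
The approach Jamie suggests in the quoted thread (record the start time of the rebuild, then reindex whatever changed since) can be sketched as follows. This is an illustration in plain Python under stated assumptions, not Solr or Cloudera Search code: the `modified` field, the integer logical clock, and the function names are all hypothetical.

```python
def rebuild_index(snapshot):
    """Build a fresh index (id -> doc) from a snapshot of the source data."""
    return {doc["id"]: dict(doc) for doc in snapshot}

def replay_changes(index, source_docs, since):
    """Re-apply every source doc modified at or after `since`; return the replayed ids."""
    replayed = []
    for doc in source_docs:
        if doc["modified"] >= since:
            index[doc["id"]] = dict(doc)  # keyed by id, so replaying is idempotent
            replayed.append(doc["id"])
    return replayed

# Source of truth when the rebuild starts (hypothetical logical clock = 10).
source = {
    "a": {"id": "a", "modified": 3, "body": "v1"},
    "b": {"id": "b", "modified": 5, "body": "v1"},
}
rebuild_started_at = 10
snapshot = [dict(d) for d in source.values()]

# While the rebuild is running, someone updates doc "a" in the source.
source["a"] = {"id": "a", "modified": 12, "body": "v2"}

new_index = rebuild_index(snapshot)
stale_body = new_index["a"]["body"]  # the rebuilt index missed the update
replayed = replay_changes(new_index, source.values(), since=rebuild_started_at)
```

Because the replay overwrites by key, reapplying a change that the rebuild had already picked up is harmless, so the cutoff can safely err on the early side.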
