lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8129) HdfsChaosMonkeyNothingIsSafeTest failures
Date Thu, 21 Jan 2016 18:39:39 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111065#comment-15111065
] 

Yonik Seeley commented on SOLR-8129:
------------------------------------

Here's a visualization of a recent fail:

Node A starts off as the leader, gets a bunch of updates that it never sends to node B before
it is killed.
Node B becomes the leader.
Node A comes up and does a PeerSync. The two version lists pretty much overlap in time (looking
at low threshold / high threshold only), so node A just asks node B for the docs it's missing
(and ends up with a lot more docs than node B).

The list below is ordered from oldest to newest:
{code}
1523440456046739456    B 
1523440456047788032    B 
1523440456049885184    B 
1523440456050933760    B 
1523440456051982336    B 
1523440456053030912    B 
1523440456053030913    B 
1523440456053030914    B 
1523440456054079488    B 
1523440456055128064    B 
1523440456059322368    B 
1523440456314126336    B 
1523440456314126337    B 
1523440456315174912    B 
1523440456316223488    B 
1523440456318320640    B 
1523440456342437888    B 
1523440456343486464    B 
1523440456343486465    B 
1523440456344535040    B 
1523440456362360832    B 
1523440456363409408  A 
1523440456372846592  A B 
1523440456375992320  A B 
1523440456375992321  A B 
1523440456379138048  A 
1523440456381235200  A 
1523440456382283776  A B 
1523440456392769536  A 
1523440456401158144  A 
1523440456403255296  A B 
1523440456437858304  A 
1523440456463024128  A 
1523440456472461312  A 
1523440456480849920  A 
1523440456531181568  A 
1523440456543764480  A 
1523440456544813056  A 
1523440456544813057  A 
1523440456545861632  A 
1523440456550055936  A B 
1523440456552153088  A B 
1523440456552153089  A 
1523440456559493120  A B 
1523440456561590272  A B 
1523440456561590273  A B 
1523440456562638848  A B 
1523440456563687424  A B 
1523440456565784576  A 
1523440456609824768  A 
1523440456610873344  A 
1523440456610873345  A 
1523440456611921920  A 
1523440456669593600  A 
1523440456669593601  A 
1523440456669593602  A 
1523440456670642176  A 
1523440456671690752  A B 
1523440456672739328  A 
1523440456673787904  A 
1523440456674836480  A 
1523440456675885056  A 
1523440456686370816  A 
1523440456690565120  A 
1523440456702099456  A B 
1523440456726216704  A B 
1523440456772354048  A B 
1523440456785985536  A B 
1523440456826880000  A B 
1523440456857288704  A B 
1523440456858337280  A B 
1523440456921251840  A 
1523440456921251841  A 
1523440456922300416  A 
1523440456926494720  A B 
1523440456926494721  A B 
1523440456927543296  A 
1523440456927543297  A 
1523440456929640448  A B 
1523440456929640449  A 
1523440456934883328  A 
1523440456944320512  A 
1523440456950611968  A 
1523440456975777792  A 
1523440456975777793  A 
1523440456975777794  A 
1523440456976826368  A 
1523440456976826369  A 
1523440456976826370  A 
1523440456999895040  A 
1523440457004089344  A 
1523440457008283648  A 
1523440457009332224  A 
1523440457009332225  A 
1523440457010380800  A 
1523440457056518144  A B 
1523440457064906752  A B 
1523440457065955328  A B 
1523440457067003904  A B 
1523440457070149632  A B 
1523440457071198208  A B 
1523440457071198209  A B 
1523440457074343936  A B 
1523440457077489664  A B 
1523440457078538240  A B 
1523440457079586816  A B 
1523440457080635392  A B 
1523440457116286976  A 
1523440457116286977  A 
1523440457117335552  A 
1523440457138307072  A 
1523440457149841408  A 
1523440457170812928  A 
1523440457172910080  A 
1523440457173958656  A 
1523440457173958657  A 
1523440457175007232  A 
1523440457175007233  A 
1523440457180250112  A 
1523440457181298688  A 
1523440457181298689  A 
1523440460638453760    B 
1523440460641599488    B 
1523440460641599489    B 
1523440460653133824    B 
1523440460708708352    B 
1523440460881723392    B 
1523440460915277824    B 
1523440461056835584    B 
1523440461057884160    B 
1523440461145964544    B 
1523440461206781952    B 
1523440461227753472    B 
1523440461237190656    B 
1523440461259210752    B 
1523440461272842240    B 
1523440461370359808    B 
1523440461379796992    B 
1523440461486751744    B 
1523440461550714880    B 
1523440461615726592    B 
1523440461659766784    B 
1523440461713244160    B 
1523440461754138624    B 
1523440461787693056    B 
1523440461817053184    B 
1523440461862141952    B 
1523440461881016320    B 
1523440461917716480    B 
1523440461939736576    B 
1523440461953368064    B 
1523440461987971072    B 
1523440462001602560    B 
1523440462224949248    B 
1523440462292058112    B 
1523440462313029632    B 
1523440462325612544    B 
1523440462379089920    B 
1523440462421032960    B 
1523440462461927424    B 
1523440462486044672    B 
1523440462501773312    B 
1523440462545813504    B 
1523440474431422464    B
{code}

Massive reorders, which PeerSync was not designed for.
Possible remedies:
 - greatly lower the probability of these big reorders
 - where there is overlap in versions, make PeerSync check that it is "dense" (both shards
have all docs in the overlap)
   -- this seems extremely strict and could cause PeerSync to fail due to a missing doc right
at the end of an overlap... *which* end matters a lot.
 - expand PeerSync to cover complete index
   -- use hashes over *all* versions in the index
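
The last two remedies can be sketched roughly as follows. This is a hedged illustration, not a proposed patch: `overlap_is_dense` and `version_fingerprint` are hypothetical names, the choice of SHA-256 is an assumption, and the sketch assumes each node can enumerate its versions as plain integers.

```python
# Illustrative sketch only; names and hashing scheme are assumptions.
import hashlib


def overlap_is_dense(ours, theirs):
    """True if both sides hold exactly the same versions inside the
    region where their [min, max] version ranges intersect."""
    ours, theirs = set(ours), set(theirs)
    if not ours or not theirs:
        return True  # nothing to compare, so nothing can be missing
    lo = max(min(ours), min(theirs))
    hi = min(max(ours), max(theirs))
    if lo > hi:
        return True  # ranges do not intersect at all
    ours_in_overlap = {v for v in ours if lo <= v <= hi}
    theirs_in_overlap = {v for v in theirs if lo <= v <= hi}
    return ours_in_overlap == theirs_in_overlap


def version_fingerprint(versions):
    """Order-independent hash over *all* versions held by a node."""
    digest = hashlib.sha256()
    for v in sorted(set(versions)):
        digest.update(v.to_bytes(8, "big"))  # versions fit in a signed 64-bit long
    return digest.hexdigest()


# Same excerpt of versions as in the list above.
a_versions = [1523440456363409408, 1523440456372846592,
              1523440456379138048, 1523440456381235200]
b_versions = [1523440456362360832, 1523440456372846592,
              1523440460638453760]

dense = overlap_is_dense(a_versions, b_versions)
in_sync = version_fingerprint(a_versions) == version_fingerprint(b_versions)
```

For this excerpt both checks report a mismatch: the overlap is not dense, and the two fingerprints differ. The fingerprint covers the whole index rather than a recent window, so it would also catch reorders that fall outside the overlap.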


> HdfsChaosMonkeyNothingIsSafeTest failures
> -----------------------------------------
>
>                 Key: SOLR-8129
>                 URL: https://issues.apache.org/jira/browse/SOLR-8129
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>         Attachments: fail.151005_064958, fail.151005_080319
>
>
> New HDFS chaos test in SOLR-8123 hits a number of types of failures, including shard
inconsistency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

