lucene-solr-user mailing list archives

From Aaron Cline <aaron.cl...@gmail.com>
Subject Re: AutoScaling Solr on AWS
Date Thu, 03 Jan 2019 21:00:30 GMT
I thought I'd try to add some more information here.

1.  I have set up TLS for Solr, and it seems to be working fine.
2.  I have set up Basic Auth for Solr, which also seems to be working fine.
3.  I have set up ACLs for the Solr configs in ZooKeeper, which also seem
to be working as expected.

We have 10 or so collections, each with 5 shards and a
replicationFactor of 2.  When a new node comes up, I would like Solr
to rebalance the replicas, and I would expect some of them to be
migrated to the new node.  I started with 2 nodes, created my 10
collections, and then added data to them.  Success!
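For reference, the replica arithmetic for this layout: 10 collections × 5 shards × 2 replicas = 100 cores, or 50 per node on two nodes, which matches the diagnostics below. The minimal Python sketch that follows (the helper name `balanced_counts` is hypothetical) shows what an even spread would look like once a third node joins. It only counts cores and ignores any placement rules Solr itself may apply.

```python
from typing import List


def balanced_counts(total_cores: int, num_nodes: int) -> List[int]:
    """Spread total_cores as evenly as possible over num_nodes (largest first)."""
    base, extra = divmod(total_cores, num_nodes)
    # The first `extra` nodes each take one additional core.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


collections, shards, replication_factor = 10, 5, 2
total = collections * shards * replication_factor  # 100 cores in total

print(balanced_counts(total, 2))  # [50, 50]
print(balanced_counts(total, 3))  # [34, 33, 33]
```

Starting from 50/50/0, an even split would therefore move roughly 33 replicas onto the new node.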

It's when the third node spins up that I get unexpected results.
As you can see from this diagnostic output, it is not taking any replicas:

/api/cluster/autoscaling/diagnostics

{
  "responseHeader": {
    "status": 0,
    "QTime": 64
  },
  "diagnostics": {
    "sortedNodes": [
      {
        "node": "ip-10-228-2-33.local:8983_solr",
        "cores": 50,
        "freedisk": 14.302078247070312,
        "sysLoadAvg": 56.99999999999999
      },
      {
        "node": "ip-10-228-12-123.local:8983_solr",
        "cores": 50,
        "freedisk": 14.298782348632812,
        "sysLoadAvg": 2
      },
      {
        "node": "ip-10-228-7-27.local:8983_solr",
        "cores": 0,
        "freedisk": 14.729938507080078,
        "sysLoadAvg": 0
      }
    ],
    "violations": []
  },
  "WARNING": "This response format is experimental.  It is likely to change in the future."
}
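Rather than reading the balance off this JSON by eye, the response can be summarized programmatically. The following is a minimal Python sketch using only the standard library; the base URL `https://solr.example.local:8983` and the credentials are hypothetical placeholders, and the live request is left commented out. Note that `violations` is evaluated against cluster-policy rules, and the autoscaling config quoted below defines no `cluster-policy`, so an empty list here presumably says nothing about whether cores are balanced.

```python
import base64
import json
import urllib.request


def build_diagnostics_request(base_url: str, user: str, password: str) -> urllib.request.Request:
    """Build a Basic-Auth GET for the autoscaling diagnostics endpoint."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/cluster/autoscaling/diagnostics",
        headers={"Authorization": "Basic " + token},
    )


def core_spread(diagnostics_json: str) -> dict:
    """Return per-node core counts and the max-min spread from a diagnostics response."""
    nodes = json.loads(diagnostics_json)["diagnostics"]["sortedNodes"]
    cores = {n["node"]: n["cores"] for n in nodes}
    return {"cores": cores, "spread": max(cores.values()) - min(cores.values())}


# Trimmed copy of the response above.
sample = """{"diagnostics": {"sortedNodes": [
  {"node": "ip-10-228-2-33.local:8983_solr", "cores": 50},
  {"node": "ip-10-228-12-123.local:8983_solr", "cores": 50},
  {"node": "ip-10-228-7-27.local:8983_solr", "cores": 0}],
  "violations": []}}"""

print(core_spread(sample)["spread"])  # 50

# Against a live cluster (hypothetical host and credentials). With a private CA,
# pass context=ssl.create_default_context(cafile=...) to urlopen.
# req = build_diagnostics_request("https://solr.example.local:8983", "solr", "secret")
# with urllib.request.urlopen(req) as resp:
#     print(core_spread(resp.read().decode()))
```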

It looks like other people on this mailing list have had similar issues, but no
one else seems to get the Solr error that I posted in my first email.

Thanks.

Aaron


On Thu, Jan 3, 2019 at 11:46 AM Aaron Cline <aaron.cline@gmail.com> wrote:

> Solr Version 7.3.1
> Java Version 1.8.0_151
>
> I'm trying to get SolrCloud to autoscale when a new node is added to the
> cluster and balance the existing replicas across the new node accordingly.
> I'm running into some kind of odd error during the compute_plan action.
> I'm hoping someone here will point me in the right direction.  Please let
> me know if I need to provide more information.
>
> Here is the log of the error from the Solr leader:
>
> 2019-01-03 17:23:10.268 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node19
> 2019-01-03 17:23:10.276 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node9
> 2019-01-03 17:23:10.283 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node13
> 2019-01-03 17:23:10.292 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node3
> 2019-01-03 17:23:10.301 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-gsr-content0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node7
> 2019-01-03 17:23:10.309 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node17
> 2019-01-03 17:23:10.318 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node9
> 2019-01-03 17:23:10.329 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-gsr-content0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node11
> 2019-01-03 17:23:10.337 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-gsr-content0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node3
> 2019-01-03 17:23:10.345 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node13
> 2019-01-03 17:23:10.353 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node17
> 2019-01-03 17:23:10.360 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node13
> 2019-01-03 17:23:10.367 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node20
> 2019-01-03 17:23:10.375 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node20
> 2019-01-03 17:23:10.382 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node5
> 2019-01-03 17:23:10.389 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node5
> 2019-01-03 17:23:10.396 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node9
> 2019-01-03 17:23:10.403 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-gsr-content0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node17
> 2019-01-03 17:23:10.411 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-gsr-content0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node20
> 2019-01-03 17:23:10.418 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node13
> 2019-01-03 17:23:10.424 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node17
> 2019-01-03 17:23:10.431 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node9
> 2019-01-03 17:23:10.438 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node13
> 2019-01-03 17:23:10.445 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node20
> 2019-01-03 17:23:10.452 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-customers0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node5
> 2019-01-03 17:23:10.458 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node17
> 2019-01-03 17:23:10.466 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ComputePlanAction Computed Plan:
> action=MOVEREPLICA&collection=blc-fulfillment-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node20
> 2019-01-03 17:23:10.466 INFO
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.s.c.a.PolicyHelper returnSession, curr-time 1861149605
> sessionWrapper.createTime 1861149397601432, this.sessionWrapper.createTime
> 1861149397601432
> 2019-01-03 17:23:10.466 WARN
> (AutoscalingActionExecutor-7-thread-1-processing-n:ip-10-228-12-123.local:8983_solr)
> [   ] o.a.s.c.a.ScheduledTriggers Exception executing actions
> java.lang.Exception: Error executing action: compute_plan for trigger
> event: {
>   "id":"69ca688d47f24Tbojh9oy9a8xwifvk2wfynxs5k",
>   "source":"node_added_trigger",
>   "eventTime":1861088934395684,
>   "eventType":"NODEADDED",
>   "properties":{
>     "eventTimes":[1861088934395684],
>     "_enqueue_time_":1861148945631973,
>     "nodeNames":["ip-10-228-7-27.local:8983_solr"]}}
>         at
> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:307)
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.solr.common.SolrException: Unexpected exception
> while processing event: {
>   "id":"69ca688d47f24Tbojh9oy9a8xwifvk2wfynxs5k",
>   "source":"node_added_trigger",
>   "eventTime":1861088934395684,
>   "eventType":"NODEADDED",
>   "properties":{
>     "eventTimes":[1861088934395684],
>     "_enqueue_time_":1861148945631973,
>     "nodeNames":["ip-10-228-7-27.local:8983_solr"]}}
>         at
> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:144)
>         at
> org.apache.solr.cloud.autoscaling.ScheduledTriggers.lambda$null$3(ScheduledTriggers.java:304)
>         ... 6 more
> Caused by: java.lang.IllegalArgumentException: Comparison method violates
> its general contract!
>         at java.util.TimSort.mergeLo(TimSort.java:777)
>         at java.util.TimSort.mergeAt(TimSort.java:514)
>         at java.util.TimSort.mergeCollapse(TimSort.java:441)
>         at java.util.TimSort.sort(TimSort.java:245)
>         at java.util.Arrays.sort(Arrays.java:1512)
>         at java.util.ArrayList.sort(ArrayList.java:1460)
>         at
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.tryEachNode(MoveReplicaSuggester.java:46)
>         at
> org.apache.solr.client.solrj.cloud.autoscaling.MoveReplicaSuggester.init(MoveReplicaSuggester.java:34)
>         at
> org.apache.solr.client.solrj.cloud.autoscaling.Suggester.getSuggestion(Suggester.java:129)
>         at
> org.apache.solr.cloud.autoscaling.ComputePlanAction.process(ComputePlanAction.java:98)
>         ... 7 more
>
>
>
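The `Computed Plan` entries above show that ComputePlanAction had already produced 27 MOVEREPLICA operations, all targeting ip-10-228-7-27.local:8983_solr, before the sort in `MoveReplicaSuggester.tryEachNode` threw. Since the exception aborts compute_plan, execute_plan presumably never ran, which would be consistent with the new node still reporting 0 cores in the diagnostics. For longer logs, a tally like the following Python sketch can help; the regex assumes the `action=MOVEREPLICA&...` format seen here rather than any documented log schema, and the sample is three entries copied from the log.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Matches one plan entry, e.g. "action=MOVEREPLICA&collection=...&replica=core_node19".
PLAN_RE = re.compile(r"\baction=MOVEREPLICA&\S+")


def tally_moves(log_text: str) -> Counter:
    """Count MOVEREPLICA plan entries per (collection, targetNode)."""
    counts = Counter()
    for match in PLAN_RE.finditer(log_text):
        params = parse_qs(match.group(0))  # the entry is a URL query string
        counts[(params["collection"][0], params["targetNode"][0])] += 1
    return counts


sample = """
o.a.s.c.a.ComputePlanAction Computed Plan:
action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node19
o.a.s.c.a.ComputePlanAction Computed Plan:
action=MOVEREPLICA&collection=blc-orders0&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node13
o.a.s.c.a.ComputePlanAction Computed Plan:
action=MOVEREPLICA&collection=blc-customers1&targetNode=ip-10-228-7-27.local:8983_solr&inPlaceMove=true&replica=core_node9
"""

for (collection, target), n in sorted(tally_moves(sample).items()):
    print(f"{collection} -> {target}: {n}")
# blc-customers1 -> ip-10-228-7-27.local:8983_solr: 2
# blc-orders0 -> ip-10-228-7-27.local:8983_solr: 1
```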
> Here's the config from the /api/cluster/autoscaling endpoint:
>
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 0
>   },
>   "cluster-preferences": [
>     {
>       "minimize": "cores",
>       "precision": 1
>     },
>     {
>       "maximize": "freedisk"
>     }
>   ],
>   "triggers": {
>     ".auto_add_replicas": {
>       "name": ".auto_add_replicas",
>       "event": "nodeLost",
>       "waitFor": 120,
>       "actions": [
>         {
>           "name": "auto_add_replicas_plan",
>           "class": "solr.AutoAddReplicasPlanAction"
>         },
>         {
>           "name": "execute_plan",
>           "class": "solr.ExecutePlanAction"
>         }
>       ],
>       "enabled": true
>     },
>     "node_added_trigger": {
>       "event": "nodeAdded",
>       "waitFor": 60,
>       "actions": [
>         {
>           "name": "compute_plan",
>           "class": "solr.ComputePlanAction"
>         },
>         {
>           "name": "execute_plan",
>           "class": "solr.ExecutePlanAction"
>         }
>       ],
>       "enabled": true
>     }
>   },
>   "listeners": {
>     ".auto_add_replicas.system": {
>       "trigger": ".auto_add_replicas",
>       "afterAction": [],
>       "stage": [
>         "STARTED",
>         "ABORTED",
>         "SUCCEEDED",
>         "FAILED",
>         "BEFORE_ACTION",
>         "AFTER_ACTION",
>         "IGNORED"
>       ],
>       "class": "org.apache.solr.cloud.autoscaling.SystemLogListener",
>       "beforeAction": []
>     },
>     "node_added_trigger.system": {
>       "trigger": "node_added_trigger",
>       "afterAction": [],
>       "stage": [
>         "STARTED",
>         "ABORTED",
>         "SUCCEEDED",
>         "FAILED",
>         "BEFORE_ACTION",
>         "AFTER_ACTION",
>         "IGNORED"
>       ],
>       "class": "org.apache.solr.cloud.autoscaling.SystemLogListener",
>       "beforeAction": []
>     }
>   },
>   "properties": {},
>   "WARNING": "This response format is experimental.  It is likely to change in the future."
> }
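The root cause in the stack trace above, `IllegalArgumentException: Comparison method violates its general contract!`, is what Java's TimSort may throw when it detects that a comparator is inconsistent, for example not transitive. One common way to end up with such a comparator is tolerance-based equality: if values within some precision are treated as equal, then a ≈ b and b ≈ c can hold while a < c. The cluster-preferences above use a `precision` of 1 on cores, so this pattern is worth keeping in mind, but whether it is what actually breaks MoveReplicaSuggester in Solr 7.3.1 is an assumption that the log does not confirm. The Python sketch below uses a hypothetical comparator, not Solr's code, and searches for transitivity violations.

```python
from itertools import permutations
from typing import List, Tuple


def compare_within_precision(a: int, b: int, precision: int) -> int:
    """Hypothetical comparator: values at most `precision` apart compare as equal."""
    if abs(a - b) <= precision:
        return 0
    return -1 if a < b else 1


def transitivity_violations(values: List[int], precision: int) -> List[Tuple[int, int, int]]:
    """Return ordered triples where a == b and b == c under the comparator, but a != c."""
    violations = []
    for a, b, c in permutations(values, 3):
        if (compare_within_precision(a, b, precision) == 0
                and compare_within_precision(b, c, precision) == 0
                and compare_within_precision(a, c, precision) != 0):
            violations.append((a, b, c))
    return violations


# Core counts 10, 11, 12 with precision 1: 10 ~ 11 and 11 ~ 12, yet 10 < 12.
print(transitivity_violations([10, 11, 12], precision=1))
# [(10, 11, 12), (12, 11, 10)]

# Exact comparison (precision 0) is transitive, so nothing is reported.
print(transitivity_violations([10, 11, 12], precision=0))
# []
```

In Java, a comparator like this passed to `List.sort` may or may not trigger the exception: TimSort only detects some inconsistencies, and only while merging runs, so whether it fails depends on the size and order of the input.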
>
> All I've added beyond the .auto_add_replicas settings is the
>     "node_added_trigger": {
>       "event": "nodeAdded",
>       "waitFor": 60,
>       "actions": [
>         {
>           "name": "compute_plan",
>           "class": "solr.ComputePlanAction"
>         },
>         {
>           "name": "execute_plan",
>           "class": "solr.ExecutePlanAction"
>         }
>       ],
>       "enabled": true
>     }
>
> section, so I'm guessing I'm missing something else, but I'm not
> sure what that might be.
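For reference, a trigger like this is registered by POSTing a `set-trigger` command to the autoscaling API at `/api/cluster/autoscaling`. The minimal Python sketch below builds that command body from the values in the config above and prints it rather than sending it, since the host, credentials, and TLS setup of this cluster are not known here; the helper name `set_trigger_command` is hypothetical. The Solr reference guide writes `waitFor` as a duration string such as `"60s"`, while the API response above reports it back as a number of seconds.

```python
import json
from typing import List, Tuple


def set_trigger_command(name: str, event: str, wait_for: str,
                        actions: List[Tuple[str, str]]) -> dict:
    """Build a `set-trigger` command body for the autoscaling API."""
    return {
        "set-trigger": {
            "name": name,
            "event": event,
            "waitFor": wait_for,
            "enabled": True,
            # Actions run in order: compute a plan, then execute it.
            "actions": [{"name": n, "class": cls} for n, cls in actions],
        }
    }


command = set_trigger_command(
    name="node_added_trigger",
    event="nodeAdded",
    wait_for="60s",
    actions=[
        ("compute_plan", "solr.ComputePlanAction"),
        ("execute_plan", "solr.ExecutePlanAction"),
    ],
)

# POST this body (Content-Type: application/json) to /api/cluster/autoscaling,
# with the same Basic Auth and TLS settings used for other admin requests.
print(json.dumps(command, indent=2))
```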
>
> Thanks for any help.
>
> Aaron
>
