ambari-dev mailing list archives

From "Jonathan Hurley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-13974) Retrieving Failed Service Checks Takes Too Long On Large Clusters
Date Thu, 19 Nov 2015 17:52:11 GMT
Jonathan Hurley created AMBARI-13974:
----------------------------------------

             Summary: Retrieving Failed Service Checks Takes Too Long On Large Clusters
                 Key: AMBARI-13974
                 URL: https://issues.apache.org/jira/browse/AMBARI-13974
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Critical
             Fix For: 2.1.3


*STR:*
* Launch a Rolling Upgrade on a large cluster (500+ nodes)
* Proceed to the Finalize step

*Actual Result:*
Call: 

{code}
/api/v1/clusters/c500/upgrades/69/upgrade_groups?upgrade_items/UpgradeItem/status=COMPLETED&upgrade_items/tasks/Tasks/status.in(FAILED,ABORTED,TIMEDOUT)&upgrade_items/tasks/Tasks/command=SERVICE_CHECK&fields=upgrade_items/tasks/Tasks/command_detail,upgrade_items/tasks/Tasks/status&minimal_response=true
{code}

This call fails with a timeout, so no failed Service Checks are shown to the user.

The root of the problem is how the REST API handles subqueries. For every upgrade group that
matches, it attempts to retrieve every stage and every task, and only then produces a slice of
results by comparing them against the predicates in memory.

This should really go through the JPA layer, since the predicates are simple comparisons on DB fields.
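
To illustrate the difference, here is a minimal sketch of the two approaches. It is not Ambari code: it uses Python's built-in sqlite3 module as a stand-in for the JPA layer, and the `task` table, its columns, and all function names are hypothetical. The first function loads every task and filters in memory, which is roughly what the subquery handling described above amounts to; the second pushes the same comparisons into the query so only matching rows leave the database.

```python
import sqlite3

# Statuses treated as failures, taken from the REST call in this report.
FAILED_STATES = ("FAILED", "ABORTED", "TIMEDOUT")


def build_db(num_items=50, tasks_per_item=20):
    """Create a hypothetical in-memory task table (not the Ambari schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task ("
        "id INTEGER PRIMARY KEY, item_id INTEGER, "
        "command TEXT, status TEXT, command_detail TEXT)"
    )
    rows = []
    for item in range(num_items):
        for n in range(tasks_per_item):
            command = "SERVICE_CHECK" if n == 0 else "EXECUTE"
            # Fail the service check on every tenth upgrade item.
            failed = n == 0 and item % 10 == 0
            status = "FAILED" if failed else "COMPLETED"
            rows.append((item, command, status, f"{command} item-{item}"))
    conn.executemany(
        "INSERT INTO task (item_id, command, status, command_detail) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    return conn


def failed_checks_in_memory(conn):
    """Load every task, then compare in memory. Returns (matches, rows loaded)."""
    all_tasks = conn.execute(
        "SELECT command_detail, status, command FROM task"
    ).fetchall()
    matches = [
        (detail, status)
        for detail, status, command in all_tasks
        if command == "SERVICE_CHECK" and status in FAILED_STATES
    ]
    return matches, len(all_tasks)


def failed_checks_in_db(conn):
    """Let the database do the comparisons. Returns (matches, rows loaded)."""
    placeholders = ",".join("?" * len(FAILED_STATES))
    matches = conn.execute(
        "SELECT command_detail, status FROM task "
        f"WHERE command = ? AND status IN ({placeholders})",
        ("SERVICE_CHECK", *FAILED_STATES),
    ).fetchall()
    return matches, len(matches)
```

Both functions return the same failed service checks, but the in-memory version materializes every task row first. On a 500+ node cluster the number of tasks per upgrade grows with the node count, which is presumably why the call runs long enough to time out.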



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
