metron-dev mailing list archives

From nickwallen <>
Subject [GitHub] incubator-metron issue #436: METRON-671: Refactor existing Ansible deploymen...
Date Tue, 07 Mar 2017 22:07:41 GMT
Github user nickwallen commented on the issue:
    I have been able to launch "Quick Dev" with the deployment report.  Thanks for the fix, @dlyle65535.

    I have been fighting a bit with the AWS deployment.  I ran into two issues.
    (1) On one pass, the setup of Ambari failed but the deployment continued, which
caused it to fail later on in the deployment.  To work around this, I manually logged into the host,
ran the Ambari setup, and then re-ran the deployment, which addressed the problem.
    I am almost certain that I saw this same failure before the work in this PR.
    $ ./
    TASK [ambari_master : Setup ambari server] *************************************
    "Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-8u77-linux-x64.tar.gz",
"Installing JDK to /usr/jdk64/", "Successfully installed JDK to /usr/jdk64/", "Downloading
JCE Policy archive from to
/var/lib/ambari-server/resources/", "", "Successfully downloaded JCE Policy
archive to /var/lib/ambari-server/resources/", "Installing JCE policy...",
"Completing setup...", "Configuring database...", "Enter advanced database configuration [y/n]
(n)? ", "Configuring database...", "Default properties detected. Using built-in database.",
"Configuring ambari database...", "Checking PostgreSQL...", "Running initdb: This may take
up to a minute.", "Initializing database: [  OK  ]", "", "About to start PostgreSQL", "Configuring
local database...", "Connecting to local database...connection timed out...retrying (1)",
"Connecting to local database...connection timed out...retrying (2)", "Connecting to local
database...unable to connect to database", "ERROR: could
not change directory to \"/home/centos\"", "psql: FATAL:  the database system is starting
up", "", "ERROR: Exiting with exit code 2. ", "REASON: Running database init script failed.
Exiting."], "warnings": []}
    $ ./
    TASK [ambari_config : check if ambari-server is up on]
    fatal: []: FAILED! => {"changed":
false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for"}
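    The log above suggests the setup ran while PostgreSQL was still starting up, and that the failure did not stop the play. One possible guard, as an untested sketch only: retry the setup step and let the task fail if it never succeeds. The command line, the `ambari_setup` variable name, and the retry values are assumptions, not taken from the role in this PR.

```yaml
# Hypothetical sketch; not the actual ambari_master task from this PR.
- name: Setup ambari server
  command: ambari-server setup -s   # assumed silent-setup invocation
  register: ambari_setup            # hypothetical variable name
  until: ambari_setup.rc == 0       # retry while the database is still starting up
  retries: 3                        # illustrative values
  delay: 30
```

    With `until`/`retries`, the task is reported as failed once the retries are exhausted, so the failure would surface at the setup step rather than 300 seconds later in `ambari_config`.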
    (2) The second issue was more unexpected.  On 9 of the 10 AWS nodes, the deployment
went smoothly.  At some point during the deployment, Ansible could not talk to one node, but
it continued on anyway.  After the 9 were finished, Ambari listed all 10 nodes, but showed the
failed one in yellow, indicating that it could not get a heartbeat from it.
    After Ansible was done with the 9 nodes, it then seemed to almost start over on the last
node.  It rebuilt the source code, pushed out the RPMs, reinstalled the MPack, etc.
That really confused the cluster, and it has not processed any data.
    I'm sure a little manual effort could fix up the cluster, but the behavior of Ansible
was weird.  Previously, when I worked with the AWS deployment, it would fail if any one node
failed.  Now it seems to retry failed nodes at a later point in time, which has negative
implications when we expect actions like the build, MPack install, etc. to occur only once.
    Not sure what to make of this issue.
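
    For reference, the "fail if any one node fails" and "run only once" behaviors described above map onto standard Ansible keywords. The following is a minimal, untested sketch; the host pattern, task name, command, and path are hypothetical and are not taken from the Metron playbooks.

```yaml
# Hypothetical play; names, command, and path are illustrative only.
- hosts: all
  any_errors_fatal: true   # a task error on any host ends the play for all hosts
  tasks:
    - name: Build once on the control machine   # hypothetical one-time step
      command: mvn clean package -DskipTests
      args:
        chdir: /path/to/incubator-metron        # hypothetical path
      run_once: true                            # execute for a single host only
      delegate_to: localhost
```

    Whether an unreachable host is treated the same way as a failed task can depend on the Ansible version, so that part would need to be verified against the version used for the AWS deployment.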

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.
